Last updated: 2025-11-21 06:56:36 +00:00

PURL-Resolved Callgraph Edges (Nov 2026)

This note captures the required behavior for joining binary callgraphs with SBOM components using purl + symbol digest annotations. It supersedes earlier advisories on this topic; everything needed to ship the feature is here.

1. Goal

Annotate every call edge in richgraph-v1 with:

  • purl of the component that defines the callee, and
  • a stable symbol_digest (hash of normalized signature plus optional instruction fingerprint).

This lets graphs from multiple binaries merge on a shared key and line up with SBOM entries, so reachability analysis can answer “is the vulnerable function reachable in my deployment?” without re-identifying components.

2. Data model additions

  • Node: SymbolNode gains purl and symbol_digest fields. symbol_digest is the sha256 of the normalized signature, which includes the demangled name and parameter types; for stripped code, optionally append a code block hash.
  • Edge: CallEdge gains purl (owner of the callee) and symbol_digest; keep the existing kind/evidence fields. When callee resolution is ambiguous, include candidates[] with ranked purls and set confidence below 1.
  • Provenance: store the analyzer fingerprint (analyzer, version, toolchain_digest) and the graph hash in CAS metadata.
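A minimal Python sketch of these additions follows. Only purl, symbol_digest, kind, evidence, candidates, confidence, and the provenance fields come from this note. The node/edge identifiers, the Candidate type with its score field, and the check that more than one candidate forces confidence below 1 are illustrative assumptions, not the richgraph-v1 schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    """One possible owner of an ambiguously resolved callee (hypothetical shape)."""
    purl: str
    score: float  # hypothetical ranking score; higher means more likely


@dataclass(frozen=True)
class SymbolNode:
    id: str             # hypothetical node identifier
    name: str           # demangled name, used for display
    purl: str           # component that defines the symbol
    symbol_digest: str  # "sha256:<hex>" of the normalized signature


@dataclass
class CallEdge:
    caller: str         # hypothetical: SymbolNode.id of the caller
    callee: str         # hypothetical: SymbolNode.id of the callee
    kind: str           # existing field, e.g. "call"
    purl: str           # owner of the callee
    symbol_digest: str
    evidence: list[str] = field(default_factory=list)
    candidates: list[Candidate] = field(default_factory=list)
    confidence: float = 1.0

    def __post_init__(self) -> None:
        # Keep candidates[] ranked: best candidate first.
        self.candidates.sort(key=lambda c: c.score, reverse=True)
        # Assumed invariant: an ambiguous resolution must not claim full confidence.
        if len(self.candidates) > 1 and self.confidence >= 1.0:
            raise ValueError("ambiguous callee requires confidence < 1")


@dataclass(frozen=True)
class Provenance:
    analyzer: str
    version: str
    toolchain_digest: str
    graph_hash: str
```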

3. Producer rules

  1. Map callee → file → SBOM component. Use import tables (ELF DT_NEEDED plus relocations, PE IAT, Mach-O stubs) or the resolved library path. If there are multiple candidates, emit candidates[] and lower confidence.
  2. Compute the symbol digest. Demangle the name if possible, normalize the signature (lowercase type names, strip addresses), then sha256 the canonical form. For stripped symbols, combine the synthetic name and the code block hash.
  3. Attach to edges. For every call edge, set purl and symbol_digest. If the callee is external but unresolved, emit purl:"pkg:unknown" and also write an Unknowns entry (see the Signals unknowns registry).
  4. Determinism. Sort nodes and edges before hashing; keep evidence arrays sorted (import, reloc, disasm, runtime). Graph hash uses BLAKE3 over canonical JSON.
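The digest computation in rule 2 can be sketched as below. The canonical text format name(t1,t2), the "#" separator before the block hash, and the regex used to strip addresses are hypothetical choices. Demangling is assumed to happen upstream (for example with c++filt), since the Python standard library has no demangler.

```python
from __future__ import annotations

import hashlib
import re

# Hypothetical address pattern: hex literals such as "0x401000" or "@ 0x7f00".
_ADDR = re.compile(r"\s*@?\s*0x[0-9a-fA-F]+")


def normalize_signature(name: str, param_types: list[str]) -> str:
    """Build a canonical 'name(t1,t2,...)' string (the format is an assumption).

    `name` is assumed to be demangled already. Addresses are stripped,
    type names lowercased, and runs of whitespace collapsed to one space.
    """
    clean_name = _ADDR.sub("", name).strip()
    types = [" ".join(_ADDR.sub("", t).lower().split()) for t in param_types]
    return f"{clean_name}({','.join(types)})"


def symbol_digest(name: str, param_types: list[str], block_hash: str | None = None) -> str:
    """sha256 over the canonical form; stripped code appends a code block hash."""
    canonical = normalize_signature(name, param_types)
    if block_hash is not None:
        # Stripped symbol: synthetic name + block hash ("#" separator is hypothetical).
        canonical += "#" + block_hash
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because addresses are removed before hashing, the same function relinked at a different address keeps its digest, while the block hash still separates distinct stripped functions.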

4. Consumer rules

  • Signals: merge edges from many binaries by (purl, symbol_digest), keeping every call-site entry. Store them in call_edges with purl as the join key for SBOM overlays.
  • Policy/VEX: treat a CVE as reachable if any path from an entrypoint hits a symbol_digest that matches an affected function for the CVE's purl.
  • UI/CLI: display purl@version plus demangled name; show site offsets for debugging; show confidence when candidates were present.
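A sketch of the Signals and Policy rules: edges are merged on (purl, symbol_digest) while every call site is retained, then a breadth-first search from the entrypoints reports reachability as soon as it reaches an affected symbol_digest under the CVE's purl. The Sym/Edge tuples and the site string format are hypothetical, and the exact-string purl comparison stands in for whatever version-range matching Policy actually applies.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Iterable, NamedTuple


class Sym(NamedTuple):
    purl: str
    symbol_digest: str


class Edge(NamedTuple):
    caller: Sym
    callee: Sym
    site: str  # hypothetical: "<binary>+<offset>" of the call instruction


def merge_edges(edges: Iterable[Edge]) -> dict[tuple[Sym, Sym], list[str]]:
    """Merge edges from many binaries on (purl, symbol_digest), keeping every distinct site."""
    merged: dict[tuple[Sym, Sym], list[str]] = defaultdict(list)
    for e in edges:
        sites = merged[(e.caller, e.callee)]
        if e.site not in sites:
            sites.append(e.site)
    return {key: sorted(sites) for key, sites in merged.items()}


def is_reachable(merged: dict[tuple[Sym, Sym], list[str]], entrypoints: Iterable[Sym],
                 cve_purl: str, affected_digests: set[str]) -> bool:
    """True if any path from an entrypoint reaches an affected function of cve_purl."""
    adj: dict[Sym, list[Sym]] = defaultdict(list)
    for caller, callee in merged:
        adj[caller].append(callee)
    seen = set(entrypoints)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if node.purl == cve_purl and node.symbol_digest in affected_digests:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```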

5. SBOM join strategy

  1. Use the purl from the component resolver; if it is absent, fall back to build_id plus hash matching and emit purl:"pkg:unknown".
  2. When multiple SBOM components share a purl, keep all matches but prefer those whose file hash equals the binary under analysis.
  3. For runtime traces, attach the same symbol_digest so runtime hits boost confidence on the correct edge.
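Rules 2 and 3 could look roughly like this. SbomComponent, its bom_ref and file_sha256 fields, and the fixed +0.2 runtime boost are assumptions for illustration only; the build_id plus hash fallback from rule 1 is left out, so a missing resolver purl simply yields no matches and "pkg:unknown".

```python
from __future__ import annotations

from dataclasses import dataclass

UNKNOWN = "pkg:unknown"


@dataclass(frozen=True)
class SbomComponent:
    purl: str
    bom_ref: str              # hypothetical SBOM entry reference
    file_sha256: str | None   # hypothetical: hash of the file the entry describes


def rank_components(resolved_purl: str | None, binary_sha256: str,
                    sbom: list[SbomComponent]) -> list[SbomComponent]:
    """Keep every SBOM entry sharing the resolved purl, exact file-hash matches first."""
    if resolved_purl is None:
        return []
    matches = [c for c in sbom if c.purl == resolved_purl]
    # sorted() is stable: SBOM order is preserved within each group; False sorts first.
    return sorted(matches, key=lambda c: c.file_sha256 != binary_sha256)


def edge_purl(ranked: list[SbomComponent]) -> str:
    """purl to attach to the edge; "pkg:unknown" when nothing matched."""
    return ranked[0].purl if ranked else UNKNOWN


def boost_confidence(confidence: float, runtime_hit: bool, step: float = 0.2) -> float:
    """Hypothetical rule: a runtime hit on the same symbol_digest raises confidence, capped at 1."""
    return min(1.0, confidence + step) if runtime_hit else confidence
```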

6. Acceptance tests

  • Imports-only: edge from binary main to pkg:deb/ubuntu/openssl@3.0.2 symbol_digest=sha256:... must appear without running disassembly.
  • Disassembly: direct call to internal function carries purl of the hosting binary's SBOM entry.
  • Ambiguity: when two candidate purls exist, the graph stores both in candidates[] (two entries) and confidence < 1.
  • Graph hash stability: reordering analyzer flags does not change the BLAKE3 graph hash.
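The graph hash stability test exercises the Determinism rule from the producer section. This sketch sorts nodes and edges, orders evidence as import, reloc, disasm, runtime, and hashes compact sorted-key JSON. Assumptions: the id/from/to field names, hashing only nodes and edges (provenance lives in CAS metadata), and json.dumps with sort_keys as an approximation of canonical JSON rather than a full RFC 8785 implementation. It uses the third-party blake3 package when installed and otherwise falls back to BLAKE2b, which is only a stand-in and does not produce the hash the spec requires.

```python
from __future__ import annotations

import hashlib
import json

# Evidence order taken from the Determinism rule; unknown kinds sort last, then alphabetically.
EVIDENCE_ORDER = {"import": 0, "reloc": 1, "disasm": 2, "runtime": 3}


def canonicalize(graph: dict) -> bytes:
    """Order-independent JSON bytes for the nodes and edges of a graph."""
    nodes = sorted(graph["nodes"], key=lambda n: n["id"])
    edges = []
    for edge in graph["edges"]:
        copy = dict(edge)  # do not mutate the caller's graph
        copy["evidence"] = sorted(
            copy.get("evidence", []),
            key=lambda k: (EVIDENCE_ORDER.get(k, len(EVIDENCE_ORDER)), k),
        )
        edges.append(copy)
    edges.sort(key=lambda e: (e["from"], e["to"], e["purl"], e["symbol_digest"]))
    doc = {"nodes": nodes, "edges": edges}
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def graph_hash(graph: dict) -> str:
    data = canonicalize(graph)
    try:
        from blake3 import blake3  # third-party package: pip install blake3
        return "blake3:" + blake3(data).hexdigest()
    except ImportError:
        # Stand-in so the sketch runs without the package; NOT the BLAKE3 value the spec requires.
        return "blake2b:" + hashlib.blake2b(data, digest_size=32).hexdigest()
```

Two graphs that differ only in node, edge, or evidence ordering produce identical canonical bytes and therefore the same hash, which is the property the acceptance test checks.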

7. Deliverables

  • Update richgraph-v1 schema and DTOs (Scanner + Signals).
  • Persist purl/symbol_digest in the Mongo call_edges collection and in CAS manifests.
  • CLI: extend stella reachability upload-callgraph and stella graph explain to surface purl plus digest.
  • Docs: reference this file from Scanner, Signals, and Reachability guides once implemented.