Reachability Evidence Schema (Draft v1, Nov 2026)
Purpose: define the canonical fields for reachability graph nodes/edges, runtime facts, and unknowns so Scanner, Signals, Policy, Replay, CLI/UI, and SbomService stay aligned. This replaces scattered notes in advisories.
1. Core identifiers
- symbol_id: canonical ID for a function/symbol; includes {format, build_id?, file_hash?, section?, addr, length} plus optional code_block_hash. Always deterministic and lowercase.
- code_id: {format, build_id?, file_hash?, start, length, code_block_hash?}; used when symbol names are absent.
- symbol_digest: sha256 of normalized signature (demangled name + params + return type; strip addresses). For stripped code, combine synthetic name + block hash.
- purl: package URL of the owning component (from SBOM resolver); pkg:unknown when unresolved.
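The symbol_digest rule above can be sketched roughly as follows. This is only an illustration: the field separator, the whitespace handling, and the helper names (normalize_signature, symbol_digest, stripped_symbol_digest) are assumptions, not part of the schema.

```python
import hashlib
import re


def normalize_signature(demangled_name, params, return_type):
    """Build a normalized signature string (the '|' layout is an assumption)."""

    def clean(text):
        # Strip hex addresses such as 0x401000, then collapse whitespace.
        text = re.sub(r"0x[0-9a-fA-F]+", "", text)
        return " ".join(text.split())

    cleaned_params = ",".join(clean(p) for p in params)
    return "|".join([clean(return_type), clean(demangled_name), cleaned_params])


def symbol_digest(demangled_name, params, return_type):
    """sha256 over the normalized signature, with an algorithm prefix."""
    normalized = normalize_signature(demangled_name, params, return_type)
    return "sha256:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def stripped_symbol_digest(synthetic_name, code_block_hash):
    """For stripped code: combine a synthetic name with the block hash."""
    payload = synthetic_name + "|" + code_block_hash.lower()
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because addresses and incidental whitespace are removed before hashing, the same function should yield the same digest across builds that only differ in load addresses.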
2. Graph payload (richgraph-v1 additions)
{
"nodes": [
{
"id": "sym:sha256:...",
"symbol_id": "func:ELF:sha256:...",
"code_id": "code:ELF:sha256:...",
"code_block_hash": "sha256:deadbeef...",
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
"build_id": "a1b2c3...",
"lang": "c",
"evidence": ["dwarf", "dynsym"],
"analyzer": { "name": "scanner.native", "version": "1.2.0", "toolchain": "ghidra-11" }
}
],
"edges": [
{
"from": "sym:sha256:caller",
"to": "sym:sha256:callee",
"kind": "direct|plt|indirect|runtime",
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64", // callee owner
"symbol_digest": "sha256:...", // callee digest
"candidates": ["pkg:deb/openssl@3.0.2", "pkg:deb/openssl@3.0.1"],
"confidence": 0.92,
"evidence": ["import", "reloc@GOT"]
}
],
"roots": [
{ "id": "init_array@0x401000", "phase": "load", "source": "DT_INIT_ARRAY" },
{ "id": "main", "phase": "runtime" }
],
"graph_hash": "blake3:..."
}
2.5 Attestation levels (hybrid default)
- Graph DSSE (required): one DSSE envelope over the canonical graph JSON (sorted arrays/keys) with graph_hash = BLAKE3 of body; Rekor publish always (or mirror when offline).
- Edge-bundle DSSE (optional): batches of ≤512 edges, emitted only for high-signal cases (runtime, init_array/TLS roots, contested/third-party edges). Each bundle carries graph_hash, bundle_reason, per-edge reason, symbol_digest, purl, confidence, and optional revoked=true for quarantine. Rekor publish is configurable; CAS storage is mandatory.
- CAS layout additions:
  - Graph body: cas://reachability/graphs/{blake3}
  - Graph DSSE: cas://reachability/graphs/{blake3}.dsse
  - Edge bundle: cas://reachability/edges/{graph_hash}/{bundle_id} (+ .dsse)
- Determinism: bundle ordering by (bundle_reason, edge_id); arrays sorted before hashing.
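One possible reading of the batching and ordering rules is sketched below. It assumes each candidate edge already carries hypothetical edge_id and bundle_reason fields, and that a bundle never mixes reasons; the schema itself does not spell out either point.

```python
from itertools import groupby

MAX_EDGES_PER_BUNDLE = 512  # batch limit from the attestation rules


def build_edge_bundles(edges, graph_hash, max_edges=MAX_EDGES_PER_BUNDLE):
    """Order edges by (bundle_reason, edge_id) and split them into bundles."""
    ordered = sorted(edges, key=lambda e: (e["bundle_reason"], e["edge_id"]))
    bundles = []
    # groupby is safe here because the list is already sorted by bundle_reason.
    for reason, group in groupby(ordered, key=lambda e: e["bundle_reason"]):
        group = list(group)
        for start in range(0, len(group), max_edges):
            bundles.append({
                "graph_hash": graph_hash,
                "bundle_reason": reason,
                "edges": group[start:start + max_edges],
            })
    return bundles
```

Sorting before chunking means the same edge set should always produce the same bundles in the same order, which is what makes the per-bundle DSSE payloads reproducible.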
3. Runtime facts (Signals ingestion)
Fields per NDJSON event:
- symbolId (required), codeId, symbolDigest?, purl?
- hitCount, observedAt, loaderBase, processId, processName, containerId, socketAddress?
- callgraphId or scanId, plus evidenceUri (CAS) if trace stored externally
- Determinism: sort keys when persisting; timestamps UTC ISO-8601.
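A minimal sketch of normalizing one runtime-fact event before it is persisted as an NDJSON line. The function names are hypothetical, and treating timezone-less timestamps as UTC is an assumption; a stricter ingester might reject them instead.

```python
import json
from datetime import datetime, timezone


def to_utc_iso8601(value):
    """Convert an ISO-8601 timestamp string to UTC with a trailing 'Z'."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def normalize_runtime_fact(event):
    """Return a single NDJSON line with sorted keys and a UTC observedAt."""
    if not event.get("symbolId"):
        raise ValueError("symbolId is required")
    normalized = dict(event)  # do not mutate the caller's event
    if "observedAt" in normalized:
        normalized["observedAt"] = to_utc_iso8601(normalized["observedAt"])
    return json.dumps(normalized, sort_keys=True, separators=(",", ":"))
```

For example, an event observed at 2025-11-01T12:00:00+02:00 would be stored with observedAt set to 2025-11-01T10:00:00Z.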
4. Unknowns registry payload
See docs/signals/unknowns-registry.md; reachability producers emit Unknowns when:
- symbol→purl unresolved,
- call edge target unresolved,
- build-id missing for ELF and file hash used instead.
Unknowns must include unknown_type, scope, provenance, confidence.p, and labels.
5. CAS layout
- Graphs: cas://reachability/graphs/{blake3} (canonical JSON, sorted keys/arrays)
- Runtime traces: cas://reachability/runtime/{sha256}
- Unknowns evidence (optional large blobs): cas://unknowns/{sha256}
- Edge bundles: cas://reachability/edges/{graph_hash}/{bundle_id} (JSON + .dsse)
Metadata for each CAS object: { schema: "richgraph-v1", analyzer: {name,version}, createdAtUtc, toolchain_digest }. When analyzer metadata is supplied at ingest (Signals OpenAPI), persist it alongside parsed analyzer fields from the artifact.
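The "canonical JSON, sorted keys/arrays" requirement for graph objects could look roughly like the sketch below. Several parts are assumptions: arrays are ordered by their own canonical serialization, the input is the graph body without its graph_hash field, and the digest function is injected because BLAKE3 is not in the Python standard library. The sha256 helper is only a stand-in so the sketch runs; a real writer would pass a BLAKE3 implementation.

```python
import hashlib
import json


def canonicalize(value):
    """Recursively sort dict keys and array elements (ordering rule is an assumption)."""
    if isinstance(value, dict):
        return {key: canonicalize(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        items = [canonicalize(item) for item in value]
        return sorted(
            items,
            key=lambda item: json.dumps(item, sort_keys=True, separators=(",", ":")),
        )
    return value


def canonical_bytes(value):
    """Compact UTF-8 JSON of the canonicalized value; this is what gets hashed."""
    text = json.dumps(
        canonicalize(value), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def cas_graph_uri(graph, digest_hex):
    """Build the graph CAS URI from a caller-supplied hex digest function."""
    return "cas://reachability/graphs/" + digest_hex(canonical_bytes(graph))


def placeholder_digest(data):
    # NOT BLAKE3: sha256 stand-in so the example runs with the stdlib only.
    return hashlib.sha256(data).hexdigest()
```

Two graphs that differ only in key order or array order then map to the same bytes, and therefore to the same CAS address.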
6. Validation rules
- All edges must carry either purl or candidates[]; never leave both empty.
- If build_id present, symbol_id and code_id must store it; if absent, record build_id_source: "FileHash".
- Evidence arrays sorted; confidence in [0,1].
- code_block_hash (when present) must be lowercase hex with an algorithm prefix (e.g., sha256:) and only accompany stripped/heuristic nodes.
- Roots must include load-time constructors when present.
- When edge_bundles are present, each edge in a bundle must also exist in the graph edge set; revoked=true bundles override graph edges for policy/scoring.
- Graph DSSE is mandatory per scan; edge-bundle DSSEs are optional but must reference graph_hash and bundle_id.
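A few of the rules above are mechanical enough to check in code. The sketch below covers only the purl/candidates, confidence, evidence-ordering, and code_block_hash rules; the function names, error strings, and the exact hash pattern are hypothetical, and the remaining rules (build-id propagation, roots, bundle membership) need graph-level context that is not modeled here.

```python
import re

# Assumed shape: lowercase algorithm prefix, a colon, then lowercase hex.
CODE_BLOCK_HASH_RE = re.compile(r"^[a-z0-9]+:[0-9a-f]+$")


def validate_edge(edge):
    """Return a list of rule violations for one edge (empty list means valid)."""
    errors = []
    if not edge.get("purl") and not edge.get("candidates"):
        errors.append("edge must carry purl or candidates[]")
    confidence = edge.get("confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be in [0,1]")
    evidence = edge.get("evidence", [])
    if evidence != sorted(evidence):
        errors.append("evidence array must be sorted")
    return errors


def validate_node(node):
    """Check the code_block_hash format rule for one node."""
    errors = []
    block_hash = node.get("code_block_hash")
    if block_hash is not None and not CODE_BLOCK_HASH_RE.match(block_hash):
        errors.append("code_block_hash must be lowercase hex with an algorithm prefix")
    return errors
```

Returning a list of violations rather than raising on the first one lets a fixture test assert on every rule a malformed edge breaks.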
7. Acceptance checklist
- Schema reflected in Scanner/Signals DTOs and OpenAPI responses.
- CAS writers enforce canonicalization before hashing.
- Fixtures include: build-id present/absent, init-array roots, purl-resolved imports-only edge, stripped binary with block-hash symbol digest, and an Unknowns case.