# Reachability Evidence Schema (Draft v1, Nov 2026)

Purpose: define the canonical fields for reachability graph nodes/edges, runtime facts, and unknowns so Scanner, Signals, Policy, Replay, CLI/UI, and SbomService stay aligned. This replaces scattered notes in advisories.

## 1. Core identifiers

- `symbol_id`: canonical ID for a function/symbol; includes `{format, build_id?, file_hash?, section?, addr, length}` plus optional `code_block_hash`. Always deterministic and lowercase.
- `code_id`: `{format, build_id?, file_hash?, start, length, code_block_hash?}`; used when symbol names are absent.
- `symbol_digest`: SHA-256 of the normalized signature (demangled name + params + return type, with addresses stripped). For stripped code, combine the synthetic name with the block hash.
- `purl`: package URL of the owning component (from the SBOM resolver); `pkg:unknown` when unresolved.

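The `symbol_digest` rule above could be sketched as follows. This is a minimal illustration, not the Scanner implementation: the `|` field separator, the whitespace collapsing, the `0x...` address pattern, and the function names are all assumptions made for the example, since this schema does not pin down the normalization format.

```python
import hashlib
import re


def normalize_signature(demangled: str, params: list[str], return_type: str) -> str:
    """Build one address-free string from name, params, and return type.

    The '|' separator and whitespace collapsing are hypothetical choices.
    """
    parts = [demangled, ",".join(params), return_type]
    joined = "|".join(" ".join(p.split()) for p in parts)
    # Strip embedded addresses such as 0x401000 so relocation does not change the digest.
    return re.sub(r"0x[0-9a-fA-F]+", "", joined)


def symbol_digest(demangled: str, params: list[str], return_type: str) -> str:
    """sha256 over the normalized signature, with an algorithm prefix."""
    normalized = normalize_signature(demangled, params, return_type)
    return "sha256:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def stripped_symbol_digest(synthetic_name: str, code_block_hash: str) -> str:
    """For stripped code: combine a synthetic name with the block hash."""
    material = f"{synthetic_name}|{code_block_hash}"
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Under these assumptions, two signatures that differ only in spacing or embedded addresses map to the same digest, which is the property the rule is after.
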
## 2. Graph payload (`richgraph-v1` additions)

```jsonc
{
  "nodes": [
    {
      "id": "sym:sha256:...",
      "symbol_id": "func:ELF:sha256:...",
      "code_id": "code:ELF:sha256:...",
      "code_block_hash": "sha256:deadbeef...",
      "purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
      "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
      "build_id": "a1b2c3...",
      "lang": "c",
      "evidence": ["dwarf", "dynsym"],
      "analyzer": { "name": "scanner.native", "version": "1.2.0", "toolchain": "ghidra-11" }
    }
  ],
  "edges": [
    {
      "from": "sym:sha256:caller",
      "to": "sym:sha256:callee",
      "kind": "direct|plt|indirect|runtime",
      "purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64", // callee owner
      "symbol_digest": "sha256:...", // callee digest
      "candidates": ["pkg:deb/openssl@3.0.2", "pkg:deb/openssl@3.0.1"],
      "confidence": 0.92,
      "evidence": ["import", "reloc@GOT"]
    }
  ],
  "roots": [
    { "id": "init_array@0x401000", "phase": "load", "source": "DT_INIT_ARRAY" },
    { "id": "main", "phase": "runtime" }
  ],
  "graph_hash": "blake3:..."
}
```

## 2.5 Attestation levels (hybrid default)

- **Graph DSSE (required):** one DSSE envelope over the canonical graph JSON (sorted arrays/keys) with `graph_hash` = BLAKE3 of the body; always published to Rekor (or to a mirror when offline).
- **Edge-bundle DSSE (optional):** batches of ≤512 edges, emitted only for high-signal cases (`runtime`, `init_array`/TLS roots, contested/third-party edges). Each bundle carries `graph_hash`, `bundle_reason`, per-edge `reason`, `symbol_digest`, `purl`, `confidence`, and optional `revoked=true` for quarantine. Publishing to Rekor is configurable; CAS storage is mandatory.
- CAS layout additions:
  - Graph body: `cas://reachability/graphs/{blake3}`
  - Graph DSSE: `cas://reachability/graphs/{blake3}.dsse`
  - Edge bundle: `cas://reachability/edges/{graph_hash}/{bundle_id}` + `.dsse`
- Determinism: bundle ordering by `(bundle_reason, edge_id)`; arrays sorted before hashing.

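The canonicalization and bundling rules above might look roughly like the sketch below. Several details are assumptions rather than part of this schema: arrays are ordered by their canonical JSON text, the `graph_hash` field is excluded from the hashed body, each input edge carries its own `bundle_reason` and `edge_id`, and the hash constructor is passed in by the caller (in practice a BLAKE3 implementation with a hashlib-style interface would be supplied).

```python
import json

MAX_EDGES_PER_BUNDLE = 512  # upper bound stated in this section


def canonicalize(value):
    """Recursively sort arrays; dict keys are sorted at serialization time."""
    if isinstance(value, dict):
        return {k: canonicalize(v) for k, v in value.items()}
    if isinstance(value, list):
        items = [canonicalize(v) for v in value]
        # Assumption: order array items by their canonical JSON text.
        return sorted(
            items,
            key=lambda v: json.dumps(v, sort_keys=True, separators=(",", ":")),
        )
    return value


def canonical_bytes(graph: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(
        canonicalize(graph), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def graph_hash(graph: dict, new_hasher, algorithm: str = "blake3") -> str:
    """Hash the canonical body; `new_hasher` is any hashlib-style constructor."""
    # Assumption: the hash covers the body without its own `graph_hash` field.
    body = {k: v for k, v in graph.items() if k != "graph_hash"}
    return f"{algorithm}:{new_hasher(canonical_bytes(body)).hexdigest()}"


def make_bundles(edges: list[dict], max_edges: int = MAX_EDGES_PER_BUNDLE) -> list[dict]:
    """Order edges by (bundle_reason, edge_id) and split into bounded bundles."""
    ordered = sorted(edges, key=lambda e: (e["bundle_reason"], e["edge_id"]))
    bundles: list[dict] = []
    for edge in ordered:
        last = bundles[-1] if bundles else None
        if (
            last is None
            or last["bundle_reason"] != edge["bundle_reason"]
            or len(last["edges"]) >= max_edges
        ):
            last = {"bundle_reason": edge["bundle_reason"], "edges": []}
            bundles.append(last)
        last["edges"].append(edge)
    return bundles
```

Because both keys and arrays are ordered before serialization, two producers that emit the same nodes and edges in different orders arrive at the same bytes and therefore the same `graph_hash`.
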
## 3. Runtime facts (Signals ingestion)

Fields per NDJSON event:

- `symbolId` (required), `codeId`, `symbolDigest?`, `purl?`
- `hitCount`, `observedAt`, `loaderBase`, `processId`, `processName`, `containerId`, `socketAddress?`
- `callgraphId` or `scanId`, plus `evidenceUri` (CAS) if trace stored externally
- Determinism: sort keys when persisting; timestamps UTC ISO-8601.

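As an illustration of the determinism rule, a runtime fact could be serialized to one NDJSON line like this. The function name, the compact separators, and the second-level timestamp precision with a `Z` suffix are assumptions for the sketch; only the field names come from the list above.

```python
import json
from datetime import datetime, timezone


def to_ndjson_line(event: dict) -> str:
    """Serialize one runtime fact with sorted keys and a UTC ISO-8601 timestamp."""
    if not event.get("symbolId"):
        raise ValueError("symbolId is required")
    record = dict(event)
    observed = record.get("observedAt")
    if isinstance(observed, datetime):
        if observed.tzinfo is None:
            raise ValueError("observedAt must be timezone-aware")
        # Normalize to UTC; the 'Z' suffix and second precision are assumptions.
        record["observedAt"] = observed.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

A call with `symbolId`, `hitCount`, and a timezone-aware `observedAt` yields a single line whose keys appear in sorted order, so repeated persistence of the same event produces identical bytes.
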
## 4. Unknowns registry payload

See `docs/signals/unknowns-registry.md`; reachability producers emit Unknowns when:
- symbol→purl unresolved,
- call edge target unresolved,
- build-id missing for ELF and file hash used instead.

Unknowns must include `unknown_type`, `scope`, `provenance`, `confidence.p`, and `labels`.

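A producer-side check for the required Unknowns fields might be sketched as below. It assumes `confidence.p` denotes a nested `{"confidence": {"p": ...}}` object; the authoritative shape lives in `docs/signals/unknowns-registry.md`, and the helper name is hypothetical.

```python
REQUIRED_UNKNOWN_FIELDS = ("unknown_type", "scope", "provenance", "confidence.p", "labels")


def missing_unknown_fields(unknown: dict) -> list[str]:
    """Return the dotted paths of required fields absent from an Unknown."""
    missing = []
    for path in REQUIRED_UNKNOWN_FIELDS:
        current = unknown
        for key in path.split("."):
            # Walk nested objects one key at a time (assumed nesting for 'confidence.p').
            if not isinstance(current, dict) or key not in current:
                missing.append(path)
                break
            current = current[key]
    return missing
```

An empty result means all five required fields are present; any returned path names a field the producer still has to populate before emitting the Unknown.
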
## 5. CAS layout

- Graphs: `cas://reachability/graphs/{blake3}` (canonical JSON, sorted keys/arrays)
- Runtime traces: `cas://reachability/runtime/{sha256}`
- Unknowns evidence (optional large blobs): `cas://unknowns/{sha256}`
- Edge bundles: `cas://reachability/edges/{graph_hash}/{bundle_id}` (JSON + `.dsse`)

Metadata for each CAS object: `{ schema: "richgraph-v1", analyzer: {name,version}, createdAtUtc, toolchain_digest }`. When analyzer metadata is supplied at ingest (Signals OpenAPI), persist it alongside parsed analyzer fields from the artifact.

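The layout above can be captured in a few small helpers. This is only a sketch: the function names are hypothetical, and it assumes the `{blake3}`, `{sha256}`, and `{graph_hash}` placeholders are filled with whatever digest string the caller already holds (the schema does not say whether an algorithm prefix is included in the path).

```python
def graph_uri(blake3_digest: str, dsse: bool = False) -> str:
    """CAS location of a graph body, or of its DSSE envelope when dsse=True."""
    suffix = ".dsse" if dsse else ""
    return f"cas://reachability/graphs/{blake3_digest}{suffix}"


def runtime_trace_uri(sha256_digest: str) -> str:
    """CAS location of a stored runtime trace."""
    return f"cas://reachability/runtime/{sha256_digest}"


def unknowns_evidence_uri(sha256_digest: str) -> str:
    """CAS location of an optional large Unknowns evidence blob."""
    return f"cas://unknowns/{sha256_digest}"


def edge_bundle_uri(graph_hash: str, bundle_id: str, dsse: bool = False) -> str:
    """CAS location of an edge bundle JSON, or of its DSSE envelope."""
    suffix = ".dsse" if dsse else ""
    return f"cas://reachability/edges/{graph_hash}/{bundle_id}{suffix}"
```

Centralizing the path templates in one place keeps writers and readers from drifting apart when the layout changes.
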
## 6. Validation rules

- All edges must carry either `purl` or `candidates[]`; never leave both empty.
- If `build_id` is present, `symbol_id` and `code_id` must store it; if absent, record `build_id_source: "FileHash"`.
- Evidence arrays must be sorted; confidence must be in [0,1].
- `code_block_hash` (when present) must be lowercase hex with an algorithm prefix (e.g., `sha256:`) and only accompany stripped/heuristic nodes.
- Roots must include load-time constructors when present.
- When `edge_bundles` are present, each edge in a bundle must also exist in the graph edge set; `revoked=true` bundles override graph edges for policy/scoring.
- Graph DSSE is mandatory per scan; edge-bundle DSSEs are optional but must reference `graph_hash` and `bundle_id`.

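A subset of these rules can be checked mechanically. The sketch below covers only the per-edge and per-node rules (`purl`/`candidates[]`, confidence range, sorted evidence, `code_block_hash` format, and the `build_id_source` fallback); the function names, error strings, and the exact regular expression are assumptions, and the bundle and DSSE rules are left out.

```python
import re

# Assumed shape: lowercase algorithm name, a colon, then lowercase hex.
_PREFIXED_HEX = re.compile(r"^[a-z0-9]+:[0-9a-f]+$")


def validate_edge(edge: dict) -> list[str]:
    """Return rule violations for one graph edge (empty list means valid)."""
    errors = []
    if not edge.get("purl") and not edge.get("candidates"):
        errors.append("edge must carry purl or candidates[]")
    confidence = edge.get("confidence")
    if confidence is not None and not (0.0 <= confidence <= 1.0):
        errors.append("confidence must be in [0,1]")
    evidence = edge.get("evidence", [])
    if evidence != sorted(evidence):
        errors.append("evidence must be sorted")
    return errors


def validate_node(node: dict) -> list[str]:
    """Return rule violations for one graph node (empty list means valid)."""
    errors = []
    block_hash = node.get("code_block_hash")
    if block_hash is not None and not _PREFIXED_HEX.match(block_hash):
        errors.append("code_block_hash must be lowercase hex with an algorithm prefix")
    if not node.get("build_id") and node.get("build_id_source") != "FileHash":
        errors.append('missing build_id requires build_id_source: "FileHash"')
    evidence = node.get("evidence", [])
    if evidence != sorted(evidence):
        errors.append("evidence must be sorted")
    return errors
```

Returning a list of violations rather than raising on the first one lets a fixture report every broken rule in a single pass.
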
## 7. Acceptance checklist

- Schema reflected in Scanner/Signals DTOs and OpenAPI responses.
- CAS writers enforce canonicalization before hashing.
- Fixtures include: build-id present/absent, init-array roots, purl-resolved imports-only edge, stripped binary with block-hash symbol digest, and an Unknowns case.