CONTRACT-RICHGRAPH-V1-015: Reachability Graph Schema
Status: Published
Version: 1.0.0
Published: 2025-12-05
Owners: Scanner Guild, Signals Guild, BE-Base Platform Guild
Unblocks: GRAPH-CAS-401-001, GAP-SYM-007, SCAN-REACH-401-009, SCANNER-NATIVE-401-015, SYMS-SERVER-401-011, SYMS-CLIENT-401-012, SYMS-INGEST-401-013, SIGNALS-RUNTIME-401-002, GAP-REP-004, and 40+ downstream tasks
Overview
This contract defines the canonical richgraph-v1 schema used for function-level reachability analysis, CAS storage, and DSSE attestation. It specifies the data model, hash algorithms, determinism rules, and CAS layout enabling provable reachability claims.
Schema Definition
richgraph-v1 Document Structure
    {
      "schema": "richgraph-v1",
      "analyzer": {
        "name": "scanner.reachability",
        "version": "0.1.0",
        "toolchain_digest": "sha256:..."
      },
      "nodes": [
        {
          "id": "sym:java:base64url...",
          "symbol_id": "sym:java:base64url...",
          "lang": "java",
          "kind": "method",
          "display": "com.example.Foo.bar(String)",
          "code_id": "code:java:base64url...",
          "code_block_hash": "sha256:deadbeef...",
          "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
          "purl": "pkg:maven/com.example/foo@1.0.0",
          "build_id": "gnu-build-id:...",
          "symbol_digest": "sha256:...",
          "evidence": ["import", "disasm"],
          "attributes": {"key": "value"}
        }
      ],
      "edges": [
        {
          "from": "sym:java:...",
          "to": "sym:java:...",
          "kind": "call",
          "purl": "pkg:maven/com.example/bar@2.0.0",
          "symbol_digest": "sha256:...",
          "confidence": 0.9,
          "evidence": ["reloc", "runtime"],
          "candidates": []
        }
      ],
      "roots": [
        {
          "id": "sym:java:...",
          "phase": "runtime",
          "source": "main"
        }
      ]
    }
Node Schema
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique node identifier (typically same as symbol_id) |
| symbol_id | string | Yes | Canonical SymbolID (format: sym:{lang}:{base64url-sha256}) |
| lang | string | Yes | Language: java, dotnet, go, node, rust, python, ruby, php, binary, shell |
| kind | string | Yes | Symbol kind: method, function, class, module, trait, struct |
| display | string | No | Human-readable demangled name |
| code_id | string | No | CodeID for name-less symbols (format: code:{lang}:{base64url-sha256}) |
| code_block_hash | string | No | Hash of the code block for stripped/heuristic nodes (algorithm-prefixed hex) |
| purl | string | No | Package URL of containing package |
| build_id | string | No | GNU build-id, PE GUID, or Mach-O UUID |
| symbol_digest | string | No | SHA-256 of the symbol_id (format: sha256:{hex}) |
| symbol | object | No | Symbol metadata {mangled?, demangled?, source?, confidence?} with source ∈ {DWARF, PDB, SYM, NONE} and confidence in [0,1] |
| evidence | string[] | No | Evidence sources (sorted): import, reloc, disasm, runtime |
| attributes | object | No | Additional key-value metadata (sorted by key) |
Edge Schema
| Field | Type | Required | Description |
|---|---|---|---|
| from | string | Yes | Source node ID |
| to | string | Yes | Target node ID |
| kind | string | Yes | Edge type: call, virtual, indirect, data, init |
| purl | string | No | Package URL of callee |
| symbol_digest | string | No | SHA-256 of callee symbol_id |
| confidence | number | Yes | Confidence [0.0-1.0]: certain=1.0, high=0.9, medium=0.6, low=0.3 |
| evidence | string[] | No | Evidence sources (sorted) |
| candidates | string[] | No | Alternative resolution candidates (sorted) |
Root Schema
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Node ID designated as entry point |
| phase | string | Yes | Execution phase: runtime, load, init, test |
| source | string | No | Entry point source (e.g., main, DT_INIT, .ctors) |
Hash Algorithms
Summary
| Component | Algorithm | Format | Example |
|---|---|---|---|
| graph_hash | BLAKE3-256 | blake3:{hex} | blake3:a1b2c3d4... |
| symbol_digest | SHA-256 | sha256:{hex} | sha256:e5f6a7b8... |
| symbol_id fragment | SHA-256 | base64url-no-pad | sym:java:abc123... |
| code_id fragment | SHA-256 | base64url-no-pad | code:java:xyz789... |
Graph Hash (BLAKE3-256)
The graph hash provides content-addressable identification:
graph_hash = "blake3:" + hex(BLAKE3-256(canonical_json_bytes))
Rationale: BLAKE3 was chosen for:
- Speed (3x+ faster than SHA-256 on modern CPUs)
- Parallelizable for large graphs
- Cryptographic security equivalent to SHA-256
- Consistent with internal content-addressing standard
Symbol Digest (SHA-256)
Symbol digests use SHA-256 for interoperability:
symbol_digest = "sha256:" + hex(SHA-256(utf8(symbol_id)))
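The formula above maps directly onto a standard-library hash call. The following is a minimal Python sketch, not the reference implementation; the function name `symbol_digest` and the placeholder identifier are illustrative assumptions.

```python
import hashlib


def symbol_digest(symbol_id: str) -> str:
    """Return "sha256:" followed by the lowercase hex SHA-256 of the UTF-8 symbol_id."""
    return "sha256:" + hashlib.sha256(symbol_id.encode("utf-8")).hexdigest()


# Placeholder identifier for illustration only; a real SymbolID ends in a
# base64url fragment rather than the word "example".
example = symbol_digest("sym:java:example")
print(example)
```

The digest is taken over the full `symbol_id` string, prefix included, so two nodes with the same SymbolID always share a digest.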
SymbolID and CodeID Fragments
Internal fragments use SHA-256 with base64url encoding:
fragment = base64url_no_pad(SHA-256(utf8(canonical_tuple)))
symbol_id = "sym:{lang}:{fragment}"
code_id = "code:{lang}:{fragment}"
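As an illustration of the fragment rule, here is a small Python sketch. It assumes the canonical tuple is the NUL-joined component list described under SymbolID Construction and that any per-language case normalization has already been applied by the caller; the function names and sample components are hypothetical.

```python
import base64
import hashlib


def fragment(components: list[str]) -> str:
    """SHA-256 over the NUL-joined tuple, encoded as base64url without padding."""
    canonical_tuple = "\0".join(components)
    digest = hashlib.sha256(canonical_tuple.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def symbol_id(lang: str, components: list[str]) -> str:
    return f"sym:{lang}:{fragment(components)}"


def code_id(lang: str, components: list[str]) -> str:
    return f"code:{lang}:{fragment(components)}"


# Hypothetical Go tuple: module, package, receiver, func.
print(symbol_id("go", ["example.com/app", "server", "Handler", "ServeHTTP"]))
```

A 32-byte SHA-256 digest always encodes to 43 base64url characters once the single padding character is dropped, which gives a quick sanity check on any fragment.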
Determinism Rules
All outputs must be reproducible. The Trimmed() operation enforces canonical ordering:
Ordering Rules
- Nodes: Sort by id (ordinal string comparison)
- Edges: Sort by (from, to, kind) in that order (ordinal)
- Roots: Sort by id (ordinal)
- Evidence arrays: Sort alphabetically (ordinal)
- Candidates arrays: Sort alphabetically (ordinal)
- Attributes objects: Sort keys alphabetically (ordinal)
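The ordering rules above can be sketched in Python over a plain dictionary form of the graph. This is an illustrative sketch only: the helper names are assumptions, and Python compares strings by Unicode code point, which agrees with ordinal comparison for ASCII identifiers but can differ from UTF-16 ordinal order for characters outside the Basic Multilingual Plane.

```python
def _sorted_members(item: dict) -> dict:
    """Copy a node or edge with its evidence, candidates, and attributes sorted."""
    out = dict(item)
    for key in ("evidence", "candidates"):
        if key in out:
            out[key] = sorted(out[key])
    if "attributes" in out:
        out["attributes"] = dict(sorted(out["attributes"].items()))
    return out


def canonical_order(graph: dict) -> dict:
    """Return a copy of the graph with nodes, edges, and roots in canonical order."""
    nodes = [_sorted_members(n) for n in graph.get("nodes", [])]
    edges = [_sorted_members(e) for e in graph.get("edges", [])]
    roots = list(graph.get("roots", []))
    return {
        **graph,
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        "edges": sorted(edges, key=lambda e: (e["from"], e["to"], e["kind"])),
        "roots": sorted(roots, key=lambda r: r["id"]),
    }
```

The input is left unmodified, so the same graph can be ordered repeatedly and always yields the same result.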
Normalization Rules
- Trim whitespace: All string values trimmed
- Empty to null: Empty strings become null/omitted
- Confidence clamping: Values clamped to [0.0, 1.0]
- Default values:
  - kind defaults to "call" for edges
  - phase defaults to "runtime" for roots
  - analyzer.name defaults to "scanner.reachability"
  - analyzer.version defaults to "0.1.0"
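A minimal Python sketch of the normalization rules for edges and roots follows. It assumes that "empty to null" means the key is omitted from the output, and the function names are hypothetical rather than taken from the reference implementation.

```python
def _clean(record: dict) -> dict:
    """Trim string values and drop keys whose value is empty or None."""
    out = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if not value:
                continue
        if value is None:
            continue
        out[key] = value
    return out


def normalize_edge(edge: dict) -> dict:
    out = _clean(edge)
    out.setdefault("kind", "call")  # default edge kind
    if "confidence" in out:
        # Clamp to the closed interval [0.0, 1.0].
        out["confidence"] = max(0.0, min(1.0, float(out["confidence"])))
    return out


def normalize_root(root: dict) -> dict:
    out = _clean(root)
    out.setdefault("phase", "runtime")  # default root phase
    return out
```

Because defaults are applied after trimming, a whitespace-only `kind` or `phase` is treated the same as a missing one.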
JSON Serialization
- No indentation (compact JSON)
- Keys sorted alphabetically at all levels
- No trailing whitespace
- UTF-8 encoding
- No BOM
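The serialization rules above can be approximated with Python's standard `json` module, as in the sketch below. The function name is an assumption, and because these rules do not pin down number formatting, the bytes produced here are not guaranteed to match the reference writer for every floating-point value.

```python
import json


def canonical_json_bytes(graph: dict) -> bytes:
    """Compact JSON with keys sorted at every level, encoded as UTF-8 without a BOM."""
    text = json.dumps(
        graph,
        sort_keys=True,          # sort keys at all nesting levels
        separators=(",", ":"),   # no indentation or padding whitespace
        ensure_ascii=False,      # emit non-ASCII characters as UTF-8, not \u escapes
    )
    return text.encode("utf-8")


print(canonical_json_bytes({"schema": "richgraph-v1", "nodes": [], "edges": []}))
```

Python's `str.encode("utf-8")` never prepends a byte order mark, so no extra step is needed for the BOM rule.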
CAS Layout
Graph Storage
cas://reachability/graphs/{blake3} # Graph body (canonical JSON)
cas://reachability/graphs/{blake3}.dsse # DSSE envelope
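For illustration, the two graph paths can be derived from a `graph_hash` string as sketched below in Python. The sketch assumes that the `{blake3}` placeholder stands for the bare lowercase hex digest without the `blake3:` prefix, which the layout above does not state explicitly; the function name is hypothetical.

```python
import re

# A BLAKE3-256 digest is 32 bytes, so 64 lowercase hex characters.
_GRAPH_HASH = re.compile(r"^blake3:([0-9a-f]{64})$")


def graph_cas_paths(graph_hash: str) -> dict:
    """Return the CAS locations of the graph body and its DSSE envelope."""
    match = _GRAPH_HASH.match(graph_hash)
    if match is None:
        raise ValueError(f"not a blake3:{{hex}} graph hash: {graph_hash!r}")
    base = f"cas://reachability/graphs/{match.group(1)}"  # assumption: bare hex in path
    return {"body": base, "dsse": base + ".dsse"}
```

Rejecting anything that is not a well-formed `blake3:` hash keeps legacy `sha256:` graph hashes from being written into the new layout by accident.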
Edge Bundle Storage (Optional)
For runtime hits, init-array roots, and contested edges:
cas://reachability/edges/{graph_hash}/{bundle_id} # Edge bundle body
cas://reachability/edges/{graph_hash}/{bundle_id}.dsse # DSSE envelope
Metadata Storage
{output_root}/reachability_graphs/{analysis_id}/richgraph-v1.json # Graph body
{output_root}/reachability_graphs/{analysis_id}/meta.json # Metadata
meta.json structure:
    {
      "schema": "richgraph-v1",
      "graph_hash": "blake3:...",
      "files": [
        {"path": "...", "hash": "blake3:..."}
      ]
    }
DSSE Integration
Predicate Types
| Predicate | Purpose |
|---|---|
| stella.ops/graph@v1 | Graph-level attestation |
| stella.ops/edgeBundle@v1 | Edge bundle attestation |
Graph DSSE (Mandatory)
Every richgraph-v1 document requires a DSSE envelope:
    {
      "payloadType": "application/vnd.stellaops.graph+json",
      "payload": "<base64(canonical_graph_json)>",
      "signatures": [...]
    }
Subject: cas://reachability/graphs/{blake3}
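The envelope above can be assembled as in the following Python sketch. It builds an unsigned envelope and the DSSE pre-authentication encoding (PAE), which is the byte string a signer actually signs under the DSSE protocol; key handling and the contents of `signatures` are outside this contract and are left empty here. The function names are assumptions.

```python
import base64

PAYLOAD_TYPE = "application/vnd.stellaops.graph+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: "DSSEv1 <len> <type> <len> <payload>"."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(canonical_graph_json: bytes) -> dict:
    """Wrap canonical graph bytes in a DSSE envelope with no signatures yet."""
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(canonical_graph_json).decode("ascii"),
        "signatures": [],  # filled in by the signer; not modelled in this sketch
    }
```

Signing over the PAE bytes rather than the raw payload binds the payload type to the signature, so a graph payload cannot be replayed under a different predicate type.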
Rekor Integration
- Graph DSSE: Always publish to Rekor (or mirror when offline)
- Edge Bundle DSSE: Optional, capped at a configurable limit per graph
SymbolID Construction
Format
sym:{lang}:{base64url_sha256_no_pad}
Per-Language Canonical Tuples
| Language | Tuple Components (NUL-separated) |
|---|---|
| Java | {package}\0{class}\0{method}\0{descriptor} (lowercased) |
| .NET | {assembly}\0{namespace}\0{type}\0{member_signature} |
| Go | {module}\0{package}\0{receiver}\0{func} |
| Node/Deno | {pkg_or_path}\0{export_path}\0{kind} |
| Rust | {crate}\0{module}\0{item}\0{mangled?} |
| Python | {pkg_or_path}\0{module}\0{qualified_name} |
| Ruby | {gem_or_path}\0{module}\0{method} |
| PHP | {composer_pkg}\0{namespace}\0{qualified_name} |
| Binary | {file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?} |
| Shell | {script_rel_path}\0{function_or_cmd} |
| Swift | {module}\0{type}\0{member}\0{mangled?} |
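To make the tuple table concrete, the Python sketch below builds the canonical tuple bytes for three of the listed languages. It assumes that "(lowercased)" for Java applies to the whole joined tuple, and the `TUPLE_FIELDS` mapping and function name are hypothetical; hashing the result into a SymbolID fragment is a separate step.

```python
# Field order per language, copied from the table above (subset only).
TUPLE_FIELDS = {
    "java": ("package", "class", "method", "descriptor"),
    "go": ("module", "package", "receiver", "func"),
    "shell": ("script_rel_path", "function_or_cmd"),
}


def canonical_tuple(lang: str, parts: dict) -> bytes:
    """Join the language's components with NUL separators and return UTF-8 bytes."""
    fields = TUPLE_FIELDS[lang]
    missing = [name for name in fields if name not in parts]
    if missing:
        raise ValueError(f"missing components for {lang}: {missing}")
    joined = "\0".join(parts[name] for name in fields)
    if lang == "java":
        joined = joined.lower()  # assumption: the entire Java tuple is lowercased
    return joined.encode("utf-8")
```

Using NUL as the separator avoids ambiguity, since it cannot appear inside package, class, or path names.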
CodeID Construction
Format
code:{lang}:{base64url_sha256_no_pad}
Use Cases
CodeIDs provide stable identifiers when symbol names are unavailable:
- Stripped binaries: code:binary:{hash} from {format}\0{file_hash}\0{addr}\0{length}\0{section}\0{code_block_hash}
- .NET modules: code:dotnet:{hash} from {assembly}\0{module}\0{mvid}
- Node packages: code:node:{hash} from {package}\0{entry_path}
Implementation Status
Existing Implementation
| Component | Location | Status |
|---|---|---|
| RichGraph model | src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraph.cs | Implemented |
| SymbolId builder | src/Scanner/__Libraries/StellaOps.Scanner.Reachability/SymbolId.cs | Implemented |
| CodeId builder | src/Scanner/__Libraries/StellaOps.Scanner.Reachability/CodeId.cs | Implemented |
| RichGraphWriter | src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraphWriter.cs | Needs BLAKE3 |
| DSSE predicates | src/Signer/StellaOps.Signer/PredicateTypes.cs | Implemented |
Required Changes
| Change | Priority | Notes |
|---|---|---|
| Update RichGraphWriter to use BLAKE3 | P0 | Currently uses SHA256 for graph_hash |
| Add meta.json hash prefix | P1 | Use blake3: prefix |
| CAS adapter for graph storage | P1 | Implement cas://reachability/graphs/{blake3} paths |
Decision Checklist
This contract resolves the following decisions from the 2025-12-02 alignment meeting:
| Decision | Choice | Rationale |
|---|---|---|
| Graph hash algorithm | BLAKE3-256 | Speed + security |
| Symbol digest algorithm | SHA-256 | Interoperability |
| CAS path scheme | cas://reachability/graphs/{blake3} | Content-addressable |
| DSSE required for graphs | Yes (mandatory) | Provenance chain |
| DSSE for edge bundles | Optional (capped) | Rekor volume control |
| JSON canonicalization | Sorted keys, compact | Determinism |
| Hash prefix format | {alg}:{hex} | Explicit algorithm ID |
Validation Rules
Schema Validation
- schema must equal "richgraph-v1"
- nodes array must not be empty
- All node id values must be unique
- All edge from/to must reference existing nodes
- All root id values must reference existing nodes
- confidence must be in range [0.0, 1.0]
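The schema validation rules above could be checked along the following lines. This Python sketch returns a list of human-readable problems rather than raising, checks `confidence` on edges only, and uses a hypothetical function name and message wording.

```python
def validate_graph(graph: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the graph passed."""
    errors = []
    if graph.get("schema") != "richgraph-v1":
        errors.append('schema must equal "richgraph-v1"')

    nodes = graph.get("nodes") or []
    if not nodes:
        errors.append("nodes array must not be empty")
    ids = [node.get("id") for node in nodes]
    if len(ids) != len(set(ids)):
        errors.append("node id values must be unique")
    known = set(ids)

    for index, edge in enumerate(graph.get("edges") or []):
        for end in ("from", "to"):
            if edge.get(end) not in known:
                errors.append(f"edge {index}: {end} does not reference an existing node")
        confidence = edge.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            errors.append(f"edge {index}: confidence must be in range [0.0, 1.0]")

    for index, root in enumerate(graph.get("roots") or []):
        if root.get("id") not in known:
            errors.append(f"root {index}: id does not reference an existing node")
    return errors
```

Collecting every violation in one pass makes it easier to report all problems in a rejected graph at once instead of stopping at the first.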
Hash Validation
- graph_hash must match BLAKE3-256 of canonical JSON
- symbol_digest must match SHA-256 of symbol_id
- SymbolID fragments must match SHA-256 of canonical tuple
Migration Path
From Current Implementation
- RichGraphWriter: Replace ComputeSha256 with ComputeBlake3 for graph hash
- meta.json: Update hash format from sha256: to blake3:
- Existing graphs: Recompute hashes on next scan (no migration needed)
Compatibility
- Symbol digests remain SHA-256 (no change)
- SymbolID format unchanged
- CodeID format unchanged
Reference Implementation
Canonical JSON Writer
    // From RichGraph.cs - Trimmed() enforces canonical ordering
    public RichGraph Trimmed()
    {
        var nodes = Nodes.OrderBy(n => n.Id, StringComparer.Ordinal).ToList();
        var edges = Edges
            .OrderBy(e => e.From, StringComparer.Ordinal)
            .ThenBy(e => e.To, StringComparer.Ordinal)
            .ThenBy(e => e.Kind, StringComparer.Ordinal)
            .ToList();
        var roots = Roots.OrderBy(r => r.Id, StringComparer.Ordinal).ToList();
        return this with { Nodes = nodes, Edges = edges, Roots = roots };
    }
BLAKE3 Graph Hash (Required Update)
    // Replace in RichGraphWriter.cs
    private static string ComputeBlake3(byte[] bytes)
    {
        using var blake3 = Blake3.Hasher.New();
        blake3.Update(bytes);
        var hash = blake3.Finalize();
        return "blake3:" + Convert.ToHexString(hash.AsSpan()).ToLowerInvariant();
    }
Related Contracts
- Sealed Mode - Air-gap operation with CAS
- Mirror Bundle - Offline transport format
- Verification Policy - DSSE verification rules
- Scanner Surface - Surface analysis framework
Changelog
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-05 | Scanner Guild | Initial contract from alignment meeting |