# CONTRACT-RICHGRAPH-V1-015: Reachability Graph Schema

> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-05
> **Owners:** Scanner Guild, Signals Guild, BE-Base Platform Guild
> **Unblocks:** GRAPH-CAS-401-001, GAP-SYM-007, SCAN-REACH-401-009, SCANNER-NATIVE-401-015, SYMS-SERVER-401-011, SYMS-CLIENT-401-012, SYMS-INGEST-401-013, SIGNALS-RUNTIME-401-002, GAP-REP-004, and 40+ downstream tasks

## Overview

This contract defines the canonical `richgraph-v1` schema used for function-level reachability analysis, CAS storage, and DSSE attestation. It specifies the data model, hash algorithms, determinism rules, and CAS layout enabling provable reachability claims.

---

## Schema Definition

### richgraph-v1 Document Structure

```json
{
  "schema": "richgraph-v1",
  "analyzer": {
    "name": "scanner.reachability",
    "version": "0.1.0",
    "toolchain_digest": "sha256:..."
  },
  "nodes": [
    {
      "id": "sym:java:base64url...",
      "symbol_id": "sym:java:base64url...",
      "lang": "java",
      "kind": "method",
      "display": "com.example.Foo.bar(String)",
      "code_id": "code:java:base64url...",
      "code_block_hash": "sha256:deadbeef...",
      "symbol": {
        "mangled": "_Z15ssl3_read_bytes",
        "demangled": "ssl3_read_bytes",
        "source": "DWARF",
        "confidence": 0.98
      },
      "purl": "pkg:maven/com.example/foo@1.0.0",
      "build_id": "gnu-build-id:...",
      "symbol_digest": "sha256:...",
      "evidence": ["import", "disasm"],
      "attributes": {"key": "value"}
    }
  ],
  "edges": [
    {
      "from": "sym:java:...",
      "to": "sym:java:...",
      "kind": "call",
      "purl": "pkg:maven/com.example/bar@2.0.0",
      "symbol_digest": "sha256:...",
      "confidence": 0.9,
      "evidence": ["reloc", "runtime"],
      "candidates": []
    }
  ],
  "roots": [
    {
      "id": "sym:java:...",
      "phase": "runtime",
      "source": "main"
    }
  ]
}
```

### Node Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Unique node identifier (typically same as `symbol_id`) |
| `symbol_id` | string | Yes | Canonical SymbolID (format: `sym:{lang}:{base64url-sha256}`) |
| `lang` | string | Yes | Language: `java`, `dotnet`, `go`, `node`, `rust`, `python`, `ruby`, `php`, `binary`, `shell` |
| `kind` | string | Yes | Symbol kind: `method`, `function`, `class`, `module`, `trait`, `struct` |
| `display` | string | No | Human-readable demangled name |
| `code_id` | string | No | CodeID for name-less symbols (format: `code:{lang}:{base64url-sha256}`) |
| `code_block_hash` | string | No | Hash of the code block for stripped/heuristic nodes (algorithm-prefixed hex) |
| `purl` | string | No | Package URL of containing package |
| `build_id` | string | No | GNU build-id, PE GUID, or Mach-O UUID |
| `symbol_digest` | string | No | SHA-256 of the symbol_id (format: `sha256:{hex}`) |
| `symbol` | object | No | Symbol metadata `{mangled?, demangled?, source?, confidence?}` with `source ∈ {DWARF,PDB,SYM,NONE}` and confidence in [0,1] |
| `evidence` | string[] | No | Evidence sources (sorted): `import`, `reloc`, `disasm`, `runtime` |
| `attributes` | object | No | Additional key-value metadata (sorted by key) |

### Edge Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `from` | string | Yes | Source node ID |
| `to` | string | Yes | Target node ID |
| `kind` | string | Yes | Edge type: `call`, `virtual`, `indirect`, `data`, `init` |
| `purl` | string | No | Package URL of callee |
| `symbol_digest` | string | No | SHA-256 of callee symbol_id |
| `confidence` | number | Yes | Confidence [0.0-1.0]: `certain`=1.0, `high`=0.9, `medium`=0.6, `low`=0.3 |
| `evidence` | string[] | No | Evidence sources (sorted) |
| `candidates` | string[] | No | Alternative resolution candidates (sorted) |

### Root Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Node ID designated as entry point |
| `phase` | string | Yes | Execution phase: `runtime`, `load`, `init`, `test` |
| `source` | string | No | Entry point source (e.g., `main`, `DT_INIT`, `.ctors`) |

---

## Hash Algorithms

### Summary

| Component | Algorithm | Format | Example |
|-----------|-----------|--------|---------|
| **graph_hash** | BLAKE3-256 | `blake3:{hex}` | `blake3:a1b2c3d4...` |
| **symbol_digest** | SHA-256 | `sha256:{hex}` | `sha256:e5f6a7b8...` |
| **symbol_id fragment** | SHA-256 | base64url-no-pad | `sym:java:abc123...` |
| **code_id fragment** | SHA-256 | base64url-no-pad | `code:java:xyz789...` |

### Graph Hash (BLAKE3-256)

The graph hash provides content-addressable identification:

```
graph_hash = "blake3:" + hex(BLAKE3-256(canonical_json_bytes))
```

**Rationale:** BLAKE3 chosen for:

- Speed (3x+ faster than SHA-256 on modern CPUs)
- Parallelizable for large graphs
- Cryptographic security equivalent to SHA-256
- Consistent with internal content-addressing standard

### Symbol Digest (SHA-256)

Symbol digests use SHA-256 for interoperability:

```
symbol_digest = "sha256:" + hex(SHA-256(utf8(symbol_id)))
```

### SymbolID and CodeID Fragments

Internal fragments use SHA-256 with base64url encoding:

```
fragment = base64url_no_pad(SHA-256(utf8(canonical_tuple)))
symbol_id = "sym:{lang}:{fragment}"
code_id = "code:{lang}:{fragment}"
```

---

## Determinism Rules

All outputs must be reproducible. The `Trimmed()` operation enforces canonical ordering:

### Ordering Rules

1. **Nodes:** Sort by `id` (ordinal string comparison)
2. **Edges:** Sort by `(from, to, kind)` in that order (ordinal)
3. **Roots:** Sort by `id` (ordinal)
4. **Evidence arrays:** Sort alphabetically (ordinal)
5. **Candidates arrays:** Sort alphabetically (ordinal)
6. **Attributes objects:** Sort keys alphabetically (ordinal)

### Normalization Rules

1. **Trim whitespace:** All string values trimmed
2. **Empty to null:** Empty strings become null/omitted
3. **Confidence clamping:** Values clamped to [0.0, 1.0]
4. **Default values:**
   - `kind` defaults to `"call"` for edges
   - `phase` defaults to `"runtime"` for roots
   - `analyzer.name` defaults to `"scanner.reachability"`
   - `analyzer.version` defaults to `"0.1.0"`

### JSON Serialization

- No indentation (compact JSON)
- Keys sorted alphabetically at all levels
- No trailing whitespace
- UTF-8 encoding
- No BOM

---

## CAS Layout

### Graph Storage

```
cas://reachability/graphs/{blake3}        # Graph body (canonical JSON)
cas://reachability/graphs/{blake3}.dsse   # DSSE envelope
```

### Edge Bundle Storage (Optional)

For runtime hits, init-array roots, and contested edges:

```
cas://reachability/edges/{graph_hash}/{bundle_id}        # Edge bundle body
cas://reachability/edges/{graph_hash}/{bundle_id}.dsse   # DSSE envelope
```

### Metadata Storage

```
{output_root}/reachability_graphs/{analysis_id}/richgraph-v1.json   # Graph body
{output_root}/reachability_graphs/{analysis_id}/meta.json           # Metadata
```

**meta.json structure:**

```json
{
  "schema": "richgraph-v1",
  "graph_hash": "blake3:...",
  "files": [
    {"path": "...", "hash": "blake3:..."}
  ]
}
```

---

## DSSE Integration

### Predicate Types

| Predicate | Purpose |
|-----------|---------|
| `stella.ops/graph@v1` | Graph-level attestation |
| `stella.ops/edgeBundle@v1` | Edge bundle attestation |

### Graph DSSE (Mandatory)

Every richgraph-v1 document requires a DSSE envelope:

```json
{
  "payloadType": "application/vnd.stellaops.graph+json",
  "payload": "",
  "signatures": [...]
}
```

**Subject:** `cas://reachability/graphs/{blake3}`

### Rekor Integration

- **Graph DSSE:** Always publish to Rekor (or mirror when offline)
- **Edge Bundle DSSE:** Optional, capped at configurable limit per graph

---

## SymbolID Construction

### Format

```
sym:{lang}:{base64url_sha256_no_pad}
```

### Per-Language Canonical Tuples

| Language | Tuple Components (NUL-separated) |
|----------|----------------------------------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` (lowercased) |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` |
| Go | `{module}\0{package}\0{receiver}\0{func}` |
| Node/Deno | `{pkg_or_path}\0{export_path}\0{kind}` |
| Rust | `{crate}\0{module}\0{item}\0{mangled?}` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` |
| Ruby | `{gem_or_path}\0{module}\0{method}` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` |
| Shell | `{script_rel_path}\0{function_or_cmd}` |
| Swift | `{module}\0{type}\0{member}\0{mangled?}` |

---

## CodeID Construction

### Format

```
code:{lang}:{base64url_sha256_no_pad}
```

### Use Cases

CodeIDs provide stable identifiers when symbol names are unavailable:

- **Stripped binaries:** `code:binary:{hash}` from `{format}\0{file_hash}\0{addr}\0{length}\0{section}\0{code_block_hash}`
- **.NET modules:** `code:dotnet:{hash}` from `{assembly}\0{module}\0{mvid}`
- **Node packages:** `code:node:{hash}` from `{package}\0{entry_path}`

---

## Implementation Status

### Existing Implementation

| Component | Location | Status |
|-----------|----------|--------|
| RichGraph model | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraph.cs` | Implemented |
| SymbolId builder | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/SymbolId.cs` | Implemented |
| CodeId builder | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/CodeId.cs` | Implemented |
| RichGraphWriter | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraphWriter.cs` | **Needs BLAKE3** |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |

### Required Changes

| Change | Priority | Notes |
|--------|----------|-------|
| Update RichGraphWriter to use BLAKE3 | P0 | Currently uses SHA256 for graph_hash |
| Add `meta.json` hash prefix | P1 | Use `blake3:` prefix |
| CAS adapter for graph storage | P1 | Implement `cas://reachability/graphs/{blake3}` paths |

---

## Decision Checklist

This contract resolves the following decisions from the 2025-12-02 alignment meeting:

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Graph hash algorithm | BLAKE3-256 | Speed + security |
| Symbol digest algorithm | SHA-256 | Interoperability |
| CAS path scheme | `cas://reachability/graphs/{blake3}` | Content-addressable |
| DSSE required for graphs | Yes (mandatory) | Provenance chain |
| DSSE for edge bundles | Optional (capped) | Rekor volume control |
| JSON canonicalization | Sorted keys, compact | Determinism |
| Hash prefix format | `{alg}:{hex}` | Explicit algorithm ID |

---

## Validation Rules

### Schema Validation

1. `schema` must equal `"richgraph-v1"`
2. `nodes` array must not be empty
3. All node `id` values must be unique
4. All edge `from`/`to` must reference existing nodes
5. All root `id` values must reference existing nodes
6. `confidence` must be in range [0.0, 1.0]

### Hash Validation

1. `graph_hash` must match BLAKE3-256 of canonical JSON
2. `symbol_digest` must match SHA-256 of `symbol_id`
3. SymbolID fragments must match SHA-256 of canonical tuple

---

## Migration Path

### From Current Implementation

1. **RichGraphWriter:** Replace `ComputeSha256` with `ComputeBlake3` for graph hash
2. **meta.json:** Update hash format from `sha256:` to `blake3:`
3. **Existing graphs:** Recompute hashes on next scan (no migration needed)

### Compatibility

- Symbol digests remain SHA-256 (no change)
- SymbolID format unchanged
- CodeID format unchanged

---

## Reference Implementation

### Canonical JSON Writer

```csharp
// From RichGraph.cs - Trimmed() enforces canonical ordering
public RichGraph Trimmed()
{
    var nodes = Nodes.OrderBy(n => n.Id, StringComparer.Ordinal).ToList();
    var edges = Edges
        .OrderBy(e => e.From, StringComparer.Ordinal)
        .ThenBy(e => e.To, StringComparer.Ordinal)
        .ThenBy(e => e.Kind, StringComparer.Ordinal)
        .ToList();
    var roots = Roots.OrderBy(r => r.Id, StringComparer.Ordinal).ToList();
    return this with { Nodes = nodes, Edges = edges, Roots = roots };
}
```

### BLAKE3 Graph Hash (Required Update)

```csharp
// Replace in RichGraphWriter.cs
private static string ComputeBlake3(byte[] bytes)
{
    using var blake3 = Blake3.Hasher.New();
    blake3.Update(bytes);
    var hash = blake3.Finalize();
    return "blake3:" + Convert.ToHexString(hash.AsSpan()).ToLowerInvariant();
}
```

---

## Related Contracts

- [Sealed Mode](./sealed-mode.md) - Air-gap operation with CAS
- [Mirror Bundle](./mirror-bundle.md) - Offline transport format
- [Verification Policy](./verification-policy.md) - DSSE verification rules
- [Scanner Surface](./scanner-surface.md) - Surface analysis framework

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-05 | Scanner Guild | Initial contract from alignment meeting |
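
## Appendix: Identifier Construction Sketch

The SymbolID, CodeID, and symbol digest formulas under Hash Algorithms can be illustrated with a short sketch. This is a minimal example, not the code in `SymbolId.cs` or `CodeId.cs`: the class `RichGraphIdSketch` and its method names are hypothetical. It assumes the caller passes tuple components that are already normalized (for example, lowercased for Java) and in the per-language order listed under SymbolID Construction. It also assumes .NET 5 or later, for `SHA256.HashData` and `Convert.ToHexString`.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustration only; not the SymbolId.cs / CodeId.cs implementation.
public static class RichGraphIdSketch
{
    // fragment = base64url_no_pad(SHA-256(utf8(canonical_tuple)))
    // Assumption: components are already normalized and in per-language order.
    public static string Fragment(params string[] components)
    {
        // Components are NUL-separated, as in the canonical tuple tables.
        byte[] tuple = Encoding.UTF8.GetBytes(string.Join('\0', components));
        byte[] hash = SHA256.HashData(tuple);

        // Standard base64, then switch to the URL-safe alphabet and drop padding.
        return Convert.ToBase64String(hash)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // symbol_id = "sym:{lang}:{fragment}"
    public static string SymbolId(string lang, params string[] components) =>
        $"sym:{lang}:{Fragment(components)}";

    // code_id = "code:{lang}:{fragment}"
    public static string CodeId(string lang, params string[] components) =>
        $"code:{lang}:{Fragment(components)}";

    // symbol_digest = "sha256:" + hex(SHA-256(utf8(symbol_id))), lowercase hex
    public static string SymbolDigest(string symbolId)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(symbolId));
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Because SHA-256 always produces 32 bytes, every fragment is 43 base64url characters and every digest is 64 hex characters. A call such as `RichGraphIdSketch.SymbolId("java", package, cls, method, descriptor)` therefore returns a 52-character string, and `SymbolDigest` returns a 71-character one. Identical tuples always yield identical IDs, so a validator can recompute `symbol_digest` from `symbol_id` and compare the two.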