
CONTRACT-RICHGRAPH-V1-015: Reachability Graph Schema

Status: Published
Version: 1.0.0
Published: 2025-12-05
Owners: Scanner Guild, Signals Guild, BE-Base Platform Guild
Unblocks: GRAPH-CAS-401-001, GAP-SYM-007, SCAN-REACH-401-009, SCANNER-NATIVE-401-015, SYMS-SERVER-401-011, SYMS-CLIENT-401-012, SYMS-INGEST-401-013, SIGNALS-RUNTIME-401-002, GAP-REP-004, and 40+ downstream tasks

Overview

This contract defines the canonical richgraph-v1 schema used for function-level reachability analysis, CAS storage, and DSSE attestation. It specifies the data model, hash algorithms, determinism rules, and CAS layout enabling provable reachability claims.


Schema Definition

richgraph-v1 Document Structure

{
  "schema": "richgraph-v1",
  "analyzer": {
    "name": "scanner.reachability",
    "version": "0.1.0",
    "toolchain_digest": "sha256:..."
  },
  "nodes": [
    {
      "id": "sym:java:base64url...",
      "symbol_id": "sym:java:base64url...",
      "lang": "java",
      "kind": "method",
      "display": "com.example.Foo.bar(String)",
      "code_id": "code:java:base64url...",
      "code_block_hash": "sha256:deadbeef...",
      "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
      "purl": "pkg:maven/com.example/foo@1.0.0",
      "build_id": "gnu-build-id:...",
      "symbol_digest": "sha256:...",
      "evidence": ["import", "disasm"],
      "attributes": {"key": "value"}
    }
  ],
  "edges": [
    {
      "from": "sym:java:...",
      "to": "sym:java:...",
      "kind": "call",
      "purl": "pkg:maven/com.example/bar@2.0.0",
      "symbol_digest": "sha256:...",
      "confidence": 0.9,
      "evidence": ["reloc", "runtime"],
      "candidates": []
    }
  ],
  "roots": [
    {
      "id": "sym:java:...",
      "phase": "runtime",
      "source": "main"
    }
  ]
}

Node Schema

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique node identifier (typically same as symbol_id) |
| symbol_id | string | Yes | Canonical SymbolID (format: sym:{lang}:{base64url-sha256}) |
| lang | string | Yes | Language: java, dotnet, go, node, rust, python, ruby, php, binary, shell |
| kind | string | Yes | Symbol kind: method, function, class, module, trait, struct |
| display | string | No | Human-readable demangled name |
| code_id | string | No | CodeID for name-less symbols (format: code:{lang}:{base64url-sha256}) |
| code_block_hash | string | No | Hash of the code block for stripped/heuristic nodes (algorithm-prefixed hex) |
| purl | string | No | Package URL of containing package |
| build_id | string | No | GNU build-id, PE GUID, or Mach-O UUID |
| symbol_digest | string | No | SHA-256 of the symbol_id (format: sha256:{hex}) |
| symbol | object | No | Symbol metadata {mangled?, demangled?, source?, confidence?} with source ∈ {DWARF,PDB,SYM,NONE} and confidence in [0,1] |
| evidence | string[] | No | Evidence sources (sorted): import, reloc, disasm, runtime |
| attributes | object | No | Additional key-value metadata (sorted by key) |

Edge Schema

| Field | Type | Required | Description |
|---|---|---|---|
| from | string | Yes | Source node ID |
| to | string | Yes | Target node ID |
| kind | string | Yes | Edge type: call, virtual, indirect, data, init |
| purl | string | No | Package URL of callee |
| symbol_digest | string | No | SHA-256 of callee symbol_id |
| confidence | number | Yes | Confidence [0.0-1.0]: certain=1.0, high=0.9, medium=0.6, low=0.3 |
| evidence | string[] | No | Evidence sources (sorted) |
| candidates | string[] | No | Alternative resolution candidates (sorted) |

Root Schema

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Node ID designated as entry point |
| phase | string | Yes | Execution phase: runtime, load, init, test |
| source | string | No | Entry point source (e.g., main, DT_INIT, .ctors) |

Hash Algorithms

Summary

| Component | Algorithm | Format | Example |
|---|---|---|---|
| graph_hash | BLAKE3-256 | blake3:{hex} | blake3:a1b2c3d4... |
| symbol_digest | SHA-256 | sha256:{hex} | sha256:e5f6a7b8... |
| symbol_id fragment | SHA-256 | base64url-no-pad | sym:java:abc123... |
| code_id fragment | SHA-256 | base64url-no-pad | code:java:xyz789... |

Graph Hash (BLAKE3-256)

The graph hash provides content-addressable identification:

graph_hash = "blake3:" + hex(BLAKE3-256(canonical_json_bytes))

Rationale: BLAKE3 was chosen for:

  • Speed (3x+ faster than SHA-256 on modern CPUs)
  • Parallelizable for large graphs
  • Cryptographic security equivalent to SHA-256
  • Consistent with internal content-addressing standard

Symbol Digest (SHA-256)

Symbol digests use SHA-256 for interoperability:

symbol_digest = "sha256:" + hex(SHA-256(utf8(symbol_id)))
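The formula above maps directly onto a standard-library hash call. The following is a minimal Python sketch, not the project's implementation; the function name and the sample input are illustrative assumptions, and lowercase hex is assumed because the examples in this contract use it.

```python
import hashlib


def symbol_digest(symbol_id: str) -> str:
    """Return "sha256:" + lowercase hex of SHA-256 over the UTF-8 bytes of symbol_id."""
    return "sha256:" + hashlib.sha256(symbol_id.encode("utf-8")).hexdigest()


# "sym:java:example" is a placeholder, not a real SymbolID.
print(symbol_digest("sym:java:example"))
```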

SymbolID and CodeID Fragments

Internal fragments use SHA-256 with base64url encoding:

fragment = base64url_no_pad(SHA-256(utf8(canonical_tuple)))
symbol_id = "sym:{lang}:{fragment}"
code_id = "code:{lang}:{fragment}"
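As an illustration of the three lines above, here is a small Python sketch. It assumes the canonical tuple is built by joining components with NUL, as the per-language tuple table in this contract states. The helper names and the sample Go components are hypothetical, and any per-language casing rules are left to the caller.

```python
import base64
import hashlib


def fragment(*components: str) -> str:
    """base64url (no padding) of SHA-256 over the NUL-joined, UTF-8 encoded tuple."""
    canonical_tuple = "\0".join(components)
    digest = hashlib.sha256(canonical_tuple.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def symbol_id(lang: str, *components: str) -> str:
    return f"sym:{lang}:{fragment(*components)}"


def code_id(lang: str, *components: str) -> str:
    return f"code:{lang}:{fragment(*components)}"


# Hypothetical Go tuple: {module}\0{package}\0{receiver}\0{func}
print(symbol_id("go", "example.com/mod", "pkg", "Server", "Handle"))
```

A SHA-256 digest is 32 bytes, so the unpadded fragment is always 43 characters long.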

Determinism Rules

All outputs must be reproducible. The Trimmed() operation enforces canonical ordering:

Ordering Rules

  1. Nodes: Sort by id (ordinal string comparison)
  2. Edges: Sort by (from, to, kind) in that order (ordinal)
  3. Roots: Sort by id (ordinal)
  4. Evidence arrays: Sort alphabetically (ordinal)
  5. Candidates arrays: Sort alphabetically (ordinal)
  6. Attributes objects: Sort keys alphabetically (ordinal)
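The six ordering rules can be sketched in Python over a plain dictionary form of the graph. This is an illustrative sketch, not the Trimmed() implementation; the function name is an assumption. Python compares strings by Unicode code point, which matches ordinal comparison for ASCII identifiers but can differ from UTF-16 ordinal order for characters outside the Basic Multilingual Plane.

```python
def canonical_order(graph: dict) -> dict:
    """Return a copy of graph with nodes, edges, roots and nested lists sorted."""

    def order_item(item: dict) -> dict:
        out = dict(item)
        # Rules 4 and 5: evidence and candidates arrays are sorted.
        for key in ("evidence", "candidates"):
            if key in out:
                out[key] = sorted(out[key])
        # Rule 6: attribute keys are sorted.
        if "attributes" in out:
            out["attributes"] = dict(sorted(out["attributes"].items()))
        return out

    return {
        **graph,
        # Rule 1: nodes by id.
        "nodes": sorted((order_item(n) for n in graph.get("nodes", [])),
                        key=lambda n: n["id"]),
        # Rule 2: edges by (from, to, kind).
        "edges": sorted((order_item(e) for e in graph.get("edges", [])),
                        key=lambda e: (e["from"], e["to"], e["kind"])),
        # Rule 3: roots by id.
        "roots": sorted(graph.get("roots", []), key=lambda r: r["id"]),
    }
```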

Normalization Rules

  1. Trim whitespace: All string values trimmed
  2. Empty to null: Empty strings become null/omitted
  3. Confidence clamping: Values clamped to [0.0, 1.0]
  4. Default values:
    • kind defaults to "call" for edges
    • phase defaults to "runtime" for roots
    • analyzer.name defaults to "scanner.reachability"
    • analyzer.version defaults to "0.1.0"
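A possible reading of the normalization rules for edges and roots, sketched in Python. The helper names are hypothetical, and "empty to null" is interpreted here as omitting the key; the analyzer defaults would follow the same setdefault pattern.

```python
def clean_strings(item: dict) -> dict:
    """Trim every string value and drop keys whose value becomes empty."""
    out = {}
    for key, value in item.items():
        if isinstance(value, str):
            value = value.strip()
            if not value:
                continue  # empty string is omitted
        out[key] = value
    return out


def normalize_edge(edge: dict) -> dict:
    out = clean_strings(edge)
    out.setdefault("kind", "call")  # default edge kind
    if "confidence" in out:
        # Clamp to [0.0, 1.0].
        out["confidence"] = min(1.0, max(0.0, float(out["confidence"])))
    return out


def normalize_root(root: dict) -> dict:
    out = clean_strings(root)
    out.setdefault("phase", "runtime")  # default root phase
    return out
```

Because blank strings are removed before defaults are applied, an edge whose kind is only whitespace ends up with kind "call".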

JSON Serialization

  • No indentation (compact JSON)
  • Keys sorted alphabetically at all levels
  • No trailing whitespace
  • UTF-8 encoding
  • No BOM
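In Python, these serialization rules correspond roughly to the following json.dumps settings. This is a sketch under assumptions: the contract does not say whether non-ASCII characters are escaped or how floating-point numbers are formatted, so ensure_ascii=False and Python's default number formatting are choices made here for illustration only.

```python
import json


def canonical_json_bytes(document: dict) -> bytes:
    """Compact JSON with keys sorted at every level, encoded as UTF-8 without a BOM."""
    text = json.dumps(
        document,
        sort_keys=True,          # keys sorted at all levels
        separators=(",", ":"),   # no indentation or padding whitespace
        ensure_ascii=False,      # assumption: emit raw UTF-8 rather than \u escapes
    )
    return text.encode("utf-8")  # the "utf-8" codec does not write a BOM


print(canonical_json_bytes({"schema": "richgraph-v1", "nodes": []}))
```

Note that sort_keys reorders object keys only; array order (nodes, edges, roots, evidence) must already be canonical before serialization.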

CAS Layout

Graph Storage

cas://reachability/graphs/{blake3}          # Graph body (canonical JSON)
cas://reachability/graphs/{blake3}.dsse     # DSSE envelope

Edge Bundle Storage (Optional)

For runtime hits, init-array roots, and contested edges:

cas://reachability/edges/{graph_hash}/{bundle_id}       # Edge bundle body
cas://reachability/edges/{graph_hash}/{bundle_id}.dsse  # DSSE envelope
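The path templates above can be filled in with simple string formatting. In this Python sketch the function names are hypothetical, and the values are substituted exactly as the caller supplies them, because the contract does not state whether the placeholders carry the blake3: prefix.

```python
def graph_paths(blake3_value: str) -> dict:
    """CAS locations of a graph body and its DSSE envelope."""
    base = f"cas://reachability/graphs/{blake3_value}"
    return {"body": base, "dsse": base + ".dsse"}


def edge_bundle_paths(graph_hash: str, bundle_id: str) -> dict:
    """CAS locations of an optional edge bundle and its DSSE envelope."""
    base = f"cas://reachability/edges/{graph_hash}/{bundle_id}"
    return {"body": base, "dsse": base + ".dsse"}


# "a1b2c3d4" and "bundle-1" are placeholder values.
print(graph_paths("a1b2c3d4"))
print(edge_bundle_paths("a1b2c3d4", "bundle-1"))
```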

Metadata Storage

{output_root}/reachability_graphs/{analysis_id}/richgraph-v1.json  # Graph body
{output_root}/reachability_graphs/{analysis_id}/meta.json          # Metadata

meta.json structure:

{
  "schema": "richgraph-v1",
  "graph_hash": "blake3:...",
  "files": [
    {"path": "...", "hash": "blake3:..."}
  ]
}

DSSE Integration

Predicate Types

| Predicate | Purpose |
|---|---|
| stella.ops/graph@v1 | Graph-level attestation |
| stella.ops/edgeBundle@v1 | Edge bundle attestation |

Graph DSSE (Mandatory)

Every richgraph-v1 document requires a DSSE envelope:

{
  "payloadType": "application/vnd.stellaops.graph+json",
  "payload": "<base64(canonical_graph_json)>",
  "signatures": [...]
}

Subject: cas://reachability/graphs/{blake3}
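To show how the envelope fields relate to the canonical graph bytes, here is a Python sketch that builds an unsigned envelope and the byte string a signer would sign. The pae function follows the Pre-Authentication Encoding defined by the DSSE specification; the function names are assumptions, and key handling and the signature itself are deliberately left out.

```python
import base64

PAYLOAD_TYPE = "application/vnd.stellaops.graph+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE Pre-Authentication Encoding: "DSSEv1 <len> <type> <len> <payload>"."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(canonical_graph_json: bytes) -> dict:
    """Envelope with a base64 payload and an empty signatures list."""
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(canonical_graph_json).decode("ascii"),
        "signatures": [],  # filled in by the signer; not shown here
    }


body = b'{"schema":"richgraph-v1"}'  # stand-in for real canonical graph bytes
envelope = unsigned_envelope(body)
to_sign = pae(envelope["payloadType"], body)
```

The signature covers the PAE bytes rather than the base64 text, so any change to the canonical JSON invalidates the envelope.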

Rekor Integration

  • Graph DSSE: Always publish to Rekor (or mirror when offline)
  • Edge Bundle DSSE: Optional, capped at configurable limit per graph

SymbolID Construction

Format

sym:{lang}:{base64url_sha256_no_pad}

Per-Language Canonical Tuples

| Language | Tuple Components (NUL-separated) |
|---|---|
| Java | {package}\0{class}\0{method}\0{descriptor} (lowercased) |
| .NET | {assembly}\0{namespace}\0{type}\0{member_signature} |
| Go | {module}\0{package}\0{receiver}\0{func} |
| Node/Deno | {pkg_or_path}\0{export_path}\0{kind} |
| Rust | {crate}\0{module}\0{item}\0{mangled?} |
| Python | {pkg_or_path}\0{module}\0{qualified_name} |
| Ruby | {gem_or_path}\0{module}\0{method} |
| PHP | {composer_pkg}\0{namespace}\0{qualified_name} |
| Binary | {file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?} |
| Shell | {script_rel_path}\0{function_or_cmd} |
| Swift | {module}\0{type}\0{member}\0{mangled?} |

CodeID Construction

Format

code:{lang}:{base64url_sha256_no_pad}

Use Cases

CodeIDs provide stable identifiers when symbol names are unavailable:

  • Stripped binaries: code:binary:{hash} from {format}\0{file_hash}\0{addr}\0{length}\0{section}\0{code_block_hash}
  • .NET modules: code:dotnet:{hash} from {assembly}\0{module}\0{mvid}
  • Node packages: code:node:{hash} from {package}\0{entry_path}

Implementation Status

Existing Implementation

| Component | Location | Status |
|---|---|---|
| RichGraph model | src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraph.cs | Implemented |
| SymbolId builder | src/Scanner/__Libraries/StellaOps.Scanner.Reachability/SymbolId.cs | Implemented |
| CodeId builder | src/Scanner/__Libraries/StellaOps.Scanner.Reachability/CodeId.cs | Implemented |
| RichGraphWriter | src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraphWriter.cs | Needs BLAKE3 |
| DSSE predicates | src/Signer/StellaOps.Signer/PredicateTypes.cs | Implemented |

Required Changes

| Change | Priority | Notes |
|---|---|---|
| Update RichGraphWriter to use BLAKE3 | P0 | Currently uses SHA256 for graph_hash |
| Add meta.json hash prefix | P1 | Use blake3: prefix |
| CAS adapter for graph storage | P1 | Implement cas://reachability/graphs/{blake3} paths |

Decision Checklist

This contract resolves the following decisions from the 2025-12-02 alignment meeting:

| Decision | Choice | Rationale |
|---|---|---|
| Graph hash algorithm | BLAKE3-256 | Speed + security |
| Symbol digest algorithm | SHA-256 | Interoperability |
| CAS path scheme | cas://reachability/graphs/{blake3} | Content-addressable |
| DSSE required for graphs | Yes (mandatory) | Provenance chain |
| DSSE for edge bundles | Optional (capped) | Rekor volume control |
| JSON canonicalization | Sorted keys, compact | Determinism |
| Hash prefix format | {alg}:{hex} | Explicit algorithm ID |

Validation Rules

Schema Validation

  1. schema must equal "richgraph-v1"
  2. nodes array must not be empty
  3. All node id values must be unique
  4. All edge from/to must reference existing nodes
  5. All root id values must reference existing nodes
  6. confidence must be in range [0.0, 1.0]
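The six schema rules can be checked in a single pass over a parsed document. The Python sketch below is illustrative only; the function name and error messages are assumptions, and it checks confidence on edges only, leaving the optional node-level symbol.confidence aside.

```python
def validate_schema(graph: dict) -> list:
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    if graph.get("schema") != "richgraph-v1":
        errors.append('schema must equal "richgraph-v1"')
    nodes = graph.get("nodes") or []
    if not nodes:
        errors.append("nodes must not be empty")
    ids = [node.get("id") for node in nodes]
    if len(ids) != len(set(ids)):
        errors.append("node id values must be unique")
    known = set(ids)
    for i, edge in enumerate(graph.get("edges") or []):
        for end in ("from", "to"):
            if edge.get(end) not in known:
                errors.append(f"edges[{i}].{end} references an unknown node")
        confidence = edge.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            errors.append(f"edges[{i}].confidence must be in [0.0, 1.0]")
    for i, root in enumerate(graph.get("roots") or []):
        if root.get("id") not in known:
            errors.append(f"roots[{i}].id references an unknown node")
    return errors
```

Returning every violation rather than stopping at the first makes the result easier to surface in CI output.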

Hash Validation

  1. graph_hash must match BLAKE3-256 of canonical JSON
  2. symbol_digest must match SHA-256 of symbol_id
  3. SymbolID fragments must match SHA-256 of canonical tuple

Migration Path

From Current Implementation

  1. RichGraphWriter: Replace ComputeSha256 with ComputeBlake3 for graph hash
  2. meta.json: Update hash format from sha256: to blake3:
  3. Existing graphs: Recompute hashes on next scan (no migration needed)

Compatibility

  • Symbol digests remain SHA-256 (no change)
  • SymbolID format unchanged
  • CodeID format unchanged

Reference Implementation

Canonical JSON Writer

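// From RichGraph.cs - Trimmed() enforces canonical ordering.
// Requires: using System; using System.Linq;
public RichGraph Trimmed()
{
    // Nodes: ordinal sort by Id.
    var nodes = Nodes.OrderBy(n => n.Id, StringComparer.Ordinal).ToList();
    // Edges: ordinal sort by (From, To, Kind), in that order.
    var edges = Edges
        .OrderBy(e => e.From, StringComparer.Ordinal)
        .ThenBy(e => e.To, StringComparer.Ordinal)
        .ThenBy(e => e.Kind, StringComparer.Ordinal)
        .ToList();
    // Roots: ordinal sort by Id.
    var roots = Roots.OrderBy(r => r.Id, StringComparer.Ordinal).ToList();
    // Return a copy with the sorted collections; the original is not mutated.
    return this with { Nodes = nodes, Edges = edges, Roots = roots };
}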

BLAKE3 Graph Hash (Required Update)

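// Replace in RichGraphWriter.cs
// Assumes the Blake3 NuGet package, which provides Blake3.Hasher.
// Requires: using System;
private static string ComputeBlake3(byte[] bytes)
{
    using var blake3 = Blake3.Hasher.New();
    blake3.Update(bytes);
    var hash = blake3.Finalize();
    // Lowercase hex with the explicit algorithm prefix ({alg}:{hex}).
    return "blake3:" + Convert.ToHexString(hash.AsSpan()).ToLowerInvariant();
}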


Changelog

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-05 | Scanner Guild | Initial contract from alignment meeting |