# CONTRACT-RICHGRAPH-V1-015: Reachability Graph Schema

> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-05
> **Owners:** Scanner Guild, Signals Guild, BE-Base Platform Guild
> **Unblocks:** GRAPH-CAS-401-001, GAP-SYM-007, SCAN-REACH-401-009, SCANNER-NATIVE-401-015, SYMS-SERVER-401-011, SYMS-CLIENT-401-012, SYMS-INGEST-401-013, SIGNALS-RUNTIME-401-002, GAP-REP-004, and 40+ downstream tasks

## Overview

This contract defines the canonical `richgraph-v1` schema used for function-level reachability analysis, content-addressable storage (CAS), and DSSE attestation. It specifies the data model, hash algorithms, determinism rules, and CAS layout that make reachability claims provable.

---

## Schema Definition

### richgraph-v1 Document Structure

```json
{
  "schema": "richgraph-v1",
  "analyzer": {
    "name": "scanner.reachability",
    "version": "0.1.0",
    "toolchain_digest": "sha256:..."
  },
  "nodes": [
    {
      "id": "sym:java:base64url...",
      "symbol_id": "sym:java:base64url...",
      "lang": "java",
      "kind": "method",
      "display": "com.example.Foo.bar(String)",
      "code_id": "code:java:base64url...",
      "code_block_hash": "sha256:deadbeef...",
      "symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
      "purl": "pkg:maven/com.example/foo@1.0.0",
      "build_id": "gnu-build-id:...",
      "symbol_digest": "sha256:...",
      "evidence": ["import", "disasm"],
      "attributes": {"key": "value"}
    }
  ],
  "edges": [
    {
      "from": "sym:java:...",
      "to": "sym:java:...",
      "kind": "call",
      "purl": "pkg:maven/com.example/bar@2.0.0",
      "symbol_digest": "sha256:...",
      "confidence": 0.9,
      "evidence": ["reloc", "runtime"],
      "candidates": []
    }
  ],
  "roots": [
    {
      "id": "sym:java:...",
      "phase": "runtime",
      "source": "main"
    }
  ]
}
```

### Node Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Unique node identifier (typically same as `symbol_id`) |
| `symbol_id` | string | Yes | Canonical SymbolID (format: `sym:{lang}:{base64url-sha256}`) |
| `lang` | string | Yes | Language: `java`, `dotnet`, `go`, `node`, `rust`, `python`, `ruby`, `php`, `binary`, `shell` |
| `kind` | string | Yes | Symbol kind: `method`, `function`, `class`, `module`, `trait`, `struct` |
| `display` | string | No | Human-readable demangled name |
| `code_id` | string | No | CodeID for name-less symbols (format: `code:{lang}:{base64url-sha256}`) |
| `code_block_hash` | string | No | Hash of the code block for stripped/heuristic nodes (algorithm-prefixed hex) |
| `purl` | string | No | Package URL of containing package |
| `build_id` | string | No | GNU build-id, PE GUID, or Mach-O UUID |
| `symbol_digest` | string | No | SHA-256 of the symbol_id (format: `sha256:{hex}`) |
| `symbol` | object | No | Symbol metadata `{mangled?, demangled?, source?, confidence?}` with `source ∈ {DWARF,PDB,SYM,NONE}` and confidence in [0,1] |
| `evidence` | string[] | No | Evidence sources (sorted): `import`, `reloc`, `disasm`, `runtime` |
| `attributes` | object | No | Additional key-value metadata (sorted by key) |

### Edge Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `from` | string | Yes | Source node ID |
| `to` | string | Yes | Target node ID |
| `kind` | string | Yes | Edge type: `call`, `virtual`, `indirect`, `data`, `init` |
| `purl` | string | No | Package URL of callee |
| `symbol_digest` | string | No | SHA-256 of callee symbol_id |
| `confidence` | number | Yes | Confidence [0.0-1.0]: `certain`=1.0, `high`=0.9, `medium`=0.6, `low`=0.3 |
| `evidence` | string[] | No | Evidence sources (sorted) |
| `candidates` | string[] | No | Alternative resolution candidates (sorted) |

### Root Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Node ID designated as entry point |
| `phase` | string | Yes | Execution phase: `runtime`, `load`, `init`, `test` |
| `source` | string | No | Entry point source (e.g., `main`, `DT_INIT`, `.ctors`) |

---

## Hash Algorithms

### Summary

| Component | Algorithm | Format | Example |
|-----------|-----------|--------|---------|
| **graph_hash** | BLAKE3-256 | `blake3:{hex}` | `blake3:a1b2c3d4...` |
| **symbol_digest** | SHA-256 | `sha256:{hex}` | `sha256:e5f6a7b8...` |
| **symbol_id fragment** | SHA-256 | base64url-no-pad | `sym:java:abc123...` |
| **code_id fragment** | SHA-256 | base64url-no-pad | `code:java:xyz789...` |

### Graph Hash (BLAKE3-256)

The graph hash provides content-addressable identification:

```
graph_hash = "blake3:" + hex(BLAKE3-256(canonical_json_bytes))
```

**Rationale:** BLAKE3 chosen for:
- Speed (3x+ faster than SHA-256 on modern CPUs)
- Parallelizable for large graphs
- Cryptographic security equivalent to SHA-256
- Consistent with internal content-addressing standard

### Symbol Digest (SHA-256)

Symbol digests use SHA-256 for interoperability:

```
symbol_digest = "sha256:" + hex(SHA-256(utf8(symbol_id)))
```
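
The formula above can be sketched in C# as follows. This is a minimal illustration, not the code in the repository: the class name `SymbolDigestSketch` is hypothetical, and lowercase hex is an assumption carried over from the `blake3:` helper in the reference implementation. It relies on `SHA256.HashData` and `Convert.ToHexString`, which require .NET 5 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; the name is illustrative and not part of the contract.
public static class SymbolDigestSketch
{
    // symbol_digest = "sha256:" + hex(SHA-256(utf8(symbol_id)))
    public static string Compute(string symbolId)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(symbolId));
        // Assumption: lowercase hex, matching the blake3: helper shown later.
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Passing a node's `symbol_id` to `Compute` should yield the value expected in that node's `symbol_digest` field.
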

### SymbolID and CodeID Fragments

Internal fragments use SHA-256 with base64url encoding:

```
fragment = base64url_no_pad(SHA-256(utf8(canonical_tuple)))
symbol_id = "sym:{lang}:{fragment}"
code_id = "code:{lang}:{fragment}"
```
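
As a rough sketch of the fragment computation, the C# below derives the base64url form from the standard Base64 alphabet by dropping `=` padding and substituting `-` for `+` and `_` for `/`. The class and method names are hypothetical; the actual builders live in `SymbolId.cs` and `CodeId.cs` and may differ.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; names are illustrative only.
public static class FragmentSketch
{
    // fragment = base64url_no_pad(SHA-256(utf8(canonical_tuple)))
    public static string Fragment(string canonicalTuple)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalTuple));
        return Convert.ToBase64String(hash)
            .TrimEnd('=')        // no padding
            .Replace('+', '-')   // URL-safe alphabet
            .Replace('/', '_');
    }

    public static string SymbolId(string lang, string canonicalTuple)
        => $"sym:{lang}:{Fragment(canonicalTuple)}";

    public static string CodeId(string lang, string canonicalTuple)
        => $"code:{lang}:{Fragment(canonicalTuple)}";
}
```

A SHA-256 digest is 32 bytes, so every fragment produced this way is 43 characters long.
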

---

## Determinism Rules

All outputs must be reproducible. The `Trimmed()` operation enforces canonical ordering:

### Ordering Rules

1. **Nodes:** Sort by `id` (ordinal string comparison)
2. **Edges:** Sort by `(from, to, kind)` in that order (ordinal)
3. **Roots:** Sort by `id` (ordinal)
4. **Evidence arrays:** Sort alphabetically (ordinal)
5. **Candidates arrays:** Sort alphabetically (ordinal)
6. **Attributes objects:** Sort keys alphabetically (ordinal)

### Normalization Rules

1. **Trim whitespace:** All string values trimmed
2. **Empty to null:** Empty strings become null/omitted
3. **Confidence clamping:** Values clamped to [0.0, 1.0]
4. **Default values:**
   - `kind` defaults to `"call"` for edges
   - `phase` defaults to `"runtime"` for roots
   - `analyzer.name` defaults to `"scanner.reachability"`
   - `analyzer.version` defaults to `"0.1.0"`
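
The normalization rules above might be expressed as small helpers like the ones below. This is a sketch under assumptions: the class and method names are hypothetical, and it assumes defaults are applied after trimming, so a whitespace-only `kind` or `phase` also falls back to its default.

```csharp
#nullable enable
using System;

// Hypothetical helpers; the real normalization lives in the RichGraph model.
public static class NormalizationSketch
{
    // Rules 1 and 2: trim, then map empty strings to null.
    public static string? NormalizeString(string? value)
    {
        if (value is null) return null;
        string trimmed = value.Trim();
        return trimmed.Length == 0 ? null : trimmed;
    }

    // Rule 3: clamp confidence into [0.0, 1.0].
    public static double ClampConfidence(double confidence)
        => Math.Clamp(confidence, 0.0, 1.0);

    // Rule 4: defaults applied when the normalized value is missing.
    public static string EdgeKind(string? kind) => NormalizeString(kind) ?? "call";
    public static string RootPhase(string? phase) => NormalizeString(phase) ?? "runtime";
    public static string AnalyzerName(string? name) => NormalizeString(name) ?? "scanner.reachability";
    public static string AnalyzerVersion(string? version) => NormalizeString(version) ?? "0.1.0";
}
```
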

### JSON Serialization

- No indentation (compact JSON)
- Keys sorted alphabetically at all levels
- No trailing whitespace
- UTF-8 encoding
- No BOM
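
One way to produce bytes that satisfy these serialization rules is to re-emit a parsed document through `Utf8JsonWriter` with object keys sorted ordinally, as sketched below. The class name is hypothetical and this is not necessarily how `RichGraphWriter` does it. Two assumptions to note: array order is left untouched (arrays are expected to be sorted beforehand per the ordering rules), and string escaping follows the writer's default encoder, which the contract does not specify.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical helper illustrating compact, key-sorted UTF-8 output without a BOM.
public static class CanonicalJsonSketch
{
    public static byte[] Canonicalize(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        using var buffer = new MemoryStream();
        // Default writer options: no indentation, raw UTF-8 bytes, no BOM.
        using (var writer = new Utf8JsonWriter(buffer))
        {
            Write(doc.RootElement, writer);
        }
        return buffer.ToArray();
    }

    private static void Write(JsonElement element, Utf8JsonWriter writer)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                writer.WriteStartObject();
                // Sort keys with ordinal comparison at every nesting level.
                foreach (JsonProperty prop in element.EnumerateObject()
                             .OrderBy(x => x.Name, StringComparer.Ordinal))
                {
                    writer.WritePropertyName(prop.Name);
                    Write(prop.Value, writer);
                }
                writer.WriteEndObject();
                break;
            case JsonValueKind.Array:
                writer.WriteStartArray();
                foreach (JsonElement item in element.EnumerateArray())
                {
                    Write(item, writer);
                }
                writer.WriteEndArray();
                break;
            default:
                // Strings, numbers, booleans, and null are written as-is.
                element.WriteTo(writer);
                break;
        }
    }
}
```

The returned byte array is the input that the graph hash would be computed over.
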

---

## CAS Layout

### Graph Storage

```
cas://reachability/graphs/{blake3}         # Graph body (canonical JSON)
cas://reachability/graphs/{blake3}.dsse    # DSSE envelope
```

### Edge Bundle Storage (Optional)

For runtime hits, init-array roots, and contested edges:

```
cas://reachability/edges/{graph_hash}/{bundle_id}         # Edge bundle body
cas://reachability/edges/{graph_hash}/{bundle_id}.dsse    # DSSE envelope
```

### Metadata Storage

```
{output_root}/reachability_graphs/{analysis_id}/richgraph-v1.json    # Graph body
{output_root}/reachability_graphs/{analysis_id}/meta.json            # Metadata
```

**meta.json structure:**
```json
{
  "schema": "richgraph-v1",
  "graph_hash": "blake3:...",
  "files": [
    {"path": "...", "hash": "blake3:..."}
  ]
}
```

---

## DSSE Integration

### Predicate Types

| Predicate | Purpose |
|-----------|---------|
| `stella.ops/graph@v1` | Graph-level attestation |
| `stella.ops/edgeBundle@v1` | Edge bundle attestation |

### Graph DSSE (Mandatory)

Every richgraph-v1 document requires a DSSE envelope:

```json
{
  "payloadType": "application/vnd.stellaops.graph+json",
  "payload": "<base64(canonical_graph_json)>",
  "signatures": [...]
}
```

**Subject:** `cas://reachability/graphs/{blake3}`
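
To illustrate the envelope shape, the sketch below assembles an unsigned envelope around the canonical graph bytes. It is only an outline: the class name is hypothetical, signing is out of scope so `signatures` is left empty, and the relaxed encoder is chosen purely so that the `+` in the payload type is not emitted as a `\u002B` escape.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical helper; real envelopes are produced by the signer service.
public static class GraphEnvelopeSketch
{
    public const string PayloadType = "application/vnd.stellaops.graph+json";

    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        // Keeps '+' literal in the output instead of the default \u002B escape.
        Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
    };

    public static string BuildUnsigned(byte[] canonicalGraphJson)
    {
        var envelope = new
        {
            payloadType = PayloadType,
            payload = Convert.ToBase64String(canonicalGraphJson),
            // Assumption: signatures are attached later by a signing step.
            signatures = Array.Empty<object>()
        };
        return JsonSerializer.Serialize(envelope, Options);
    }
}
```
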

### Rekor Integration

- **Graph DSSE:** Always publish to Rekor (or mirror when offline)
- **Edge Bundle DSSE:** Optional, capped at configurable limit per graph

---

## SymbolID Construction

### Format

```
sym:{lang}:{base64url_sha256_no_pad}
```

### Per-Language Canonical Tuples

| Language | Tuple Components (NUL-separated) |
|----------|----------------------------------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` (lowercased) |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` |
| Go | `{module}\0{package}\0{receiver}\0{func}` |
| Node/Deno | `{pkg_or_path}\0{export_path}\0{kind}` |
| Rust | `{crate}\0{module}\0{item}\0{mangled?}` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` |
| Ruby | `{gem_or_path}\0{module}\0{method}` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` |
| Shell | `{script_rel_path}\0{function_or_cmd}` |
| Swift | `{module}\0{type}\0{member}\0{mangled?}` |
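
As a worked illustration of the Java row, the sketch below joins the four components with NUL separators, lowercases the result, and derives a SymbolID from it. The names are hypothetical, and it assumes that "lowercased" applies to the whole joined tuple using invariant-culture rules; the authoritative behavior is whatever `SymbolId.cs` implements.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for the Java tuple only; other languages use other components.
public static class JavaSymbolIdSketch
{
    // {package}\0{class}\0{method}\0{descriptor}, lowercased (assumed: whole tuple).
    public static string CanonicalTuple(string package, string className, string method, string descriptor)
        => string.Join("\0", package, className, method, descriptor).ToLowerInvariant();

    // sym:java:{base64url_sha256_no_pad} over the UTF-8 bytes of the tuple.
    public static string SymbolId(string package, string className, string method, string descriptor)
    {
        string tuple = CanonicalTuple(package, className, method, descriptor);
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(tuple));
        string fragment = Convert.ToBase64String(hash)
            .TrimEnd('=').Replace('+', '-').Replace('/', '_');
        return "sym:java:" + fragment;
    }
}
```

Because the tuple is lowercased before hashing, `com.example.Foo` and `com.example.foo` map to the same SymbolID under this sketch.
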

---

## CodeID Construction

### Format

```
code:{lang}:{base64url_sha256_no_pad}
```

### Use Cases

CodeIDs provide stable identifiers when symbol names are unavailable:

- **Stripped binaries:** `code:binary:{hash}` from `{format}\0{file_hash}\0{addr}\0{length}\0{section}\0{code_block_hash}`
- **.NET modules:** `code:dotnet:{hash}` from `{assembly}\0{module}\0{mvid}`
- **Node packages:** `code:node:{hash}` from `{package}\0{entry_path}`

---

## Implementation Status

### Existing Implementation

| Component | Location | Status |
|-----------|----------|--------|
| RichGraph model | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraph.cs` | Implemented |
| SymbolId builder | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/SymbolId.cs` | Implemented |
| CodeId builder | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/CodeId.cs` | Implemented |
| RichGraphWriter | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraphWriter.cs` | **Needs BLAKE3** |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |

### Required Changes

| Change | Priority | Notes |
|--------|----------|-------|
| Update RichGraphWriter to use BLAKE3 | P0 | Currently uses SHA256 for graph_hash |
| Add `meta.json` hash prefix | P1 | Use `blake3:` prefix |
| CAS adapter for graph storage | P1 | Implement `cas://reachability/graphs/{blake3}` paths |

---

## Decision Checklist

This contract resolves the following decisions from the 2025-12-02 alignment meeting:

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Graph hash algorithm | BLAKE3-256 | Speed + security |
| Symbol digest algorithm | SHA-256 | Interoperability |
| CAS path scheme | `cas://reachability/graphs/{blake3}` | Content-addressable |
| DSSE required for graphs | Yes (mandatory) | Provenance chain |
| DSSE for edge bundles | Optional (capped) | Rekor volume control |
| JSON canonicalization | Sorted keys, compact | Determinism |
| Hash prefix format | `{alg}:{hex}` | Explicit algorithm ID |

---

## Validation Rules

### Schema Validation

1. `schema` must equal `"richgraph-v1"`
2. `nodes` array must not be empty
3. All node `id` values must be unique
4. All edge `from`/`to` must reference existing nodes
5. All root `id` values must reference existing nodes
6. `confidence` must be in range [0.0, 1.0]
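
The six checks above could be implemented along the following lines. This is a sketch over deliberately minimal, hypothetical record types (`SketchNode`, `SketchEdge`, `SketchRoot`, `SketchGraph`) rather than the real `RichGraph` model, and it collects error messages instead of throwing so that all violations are reported in one pass.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal types carrying only the fields the checks need.
public sealed record SketchNode(string Id);
public sealed record SketchEdge(string From, string To, double Confidence);
public sealed record SketchRoot(string Id);
public sealed record SketchGraph(
    string Schema,
    IReadOnlyList<SketchNode> Nodes,
    IReadOnlyList<SketchEdge> Edges,
    IReadOnlyList<SketchRoot> Roots);

public static class SchemaValidationSketch
{
    public static List<string> Validate(SketchGraph graph)
    {
        var errors = new List<string>();

        // Rule 1: schema identifier.
        if (graph.Schema != "richgraph-v1") errors.Add("schema must equal \"richgraph-v1\"");
        // Rule 2: at least one node.
        if (graph.Nodes.Count == 0) errors.Add("nodes must not be empty");

        // Rule 3: unique node ids (ordinal comparison).
        var ids = new HashSet<string>(StringComparer.Ordinal);
        foreach (SketchNode node in graph.Nodes)
        {
            if (!ids.Add(node.Id)) errors.Add($"duplicate node id: {node.Id}");
        }

        foreach (SketchEdge edge in graph.Edges)
        {
            // Rule 4: both endpoints must exist.
            if (!ids.Contains(edge.From)) errors.Add($"edge.from references unknown node: {edge.From}");
            if (!ids.Contains(edge.To)) errors.Add($"edge.to references unknown node: {edge.To}");
            // Rule 6: confidence in [0.0, 1.0]; the negated form also rejects NaN.
            if (!(edge.Confidence >= 0.0 && edge.Confidence <= 1.0))
                errors.Add($"confidence out of range: {edge.Confidence}");
        }

        // Rule 5: roots must reference existing nodes.
        foreach (SketchRoot root in graph.Roots)
        {
            if (!ids.Contains(root.Id)) errors.Add($"root references unknown node: {root.Id}");
        }

        return errors;
    }
}
```

An empty result means the graph passed all six checks.
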

### Hash Validation

1. `graph_hash` must match BLAKE3-256 of canonical JSON
2. `symbol_digest` must match SHA-256 of `symbol_id`
3. SymbolID fragments must match SHA-256 of canonical tuple

---

## Migration Path

### From Current Implementation

1. **RichGraphWriter:** Replace `ComputeSha256` with `ComputeBlake3` for graph hash
2. **meta.json:** Update hash format from `sha256:` to `blake3:`
3. **Existing graphs:** Recompute hashes on next scan (no migration needed)

### Compatibility

- Symbol digests remain SHA-256 (no change)
- SymbolID format unchanged
- CodeID format unchanged

---

## Reference Implementation

### Canonical JSON Writer

```csharp
// From RichGraph.cs - Trimmed() enforces canonical ordering
public RichGraph Trimmed()
{
    var nodes = Nodes.OrderBy(n => n.Id, StringComparer.Ordinal).ToList();
    var edges = Edges
        .OrderBy(e => e.From, StringComparer.Ordinal)
        .ThenBy(e => e.To, StringComparer.Ordinal)
        .ThenBy(e => e.Kind, StringComparer.Ordinal)
        .ToList();
    var roots = Roots.OrderBy(r => r.Id, StringComparer.Ordinal).ToList();
    return this with { Nodes = nodes, Edges = edges, Roots = roots };
}
```

### BLAKE3 Graph Hash (Required Update)

```csharp
// Replace in RichGraphWriter.cs
private static string ComputeBlake3(byte[] bytes)
{
    using var blake3 = Blake3.Hasher.New();
    blake3.Update(bytes);
    var hash = blake3.Finalize();
    return "blake3:" + Convert.ToHexString(hash.AsSpan()).ToLowerInvariant();
}
```

---

## Related Contracts

- [Sealed Mode](./sealed-mode.md) - Air-gap operation with CAS
- [Mirror Bundle](./mirror-bundle.md) - Offline transport format
- [Verification Policy](./verification-policy.md) - DSSE verification rules
- [Scanner Surface](./scanner-surface.md) - Surface analysis framework

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-05 | Scanner Guild | Initial contract from alignment meeting |