# CONTRACT-RICHGRAPH-V1-015: Reachability Graph Schema
> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-05
> **Owners:** Scanner Guild, Signals Guild, BE-Base Platform Guild
> **Unblocks:** GRAPH-CAS-401-001, GAP-SYM-007, SCAN-REACH-401-009, SCANNER-NATIVE-401-015, SYMS-SERVER-401-011, SYMS-CLIENT-401-012, SYMS-INGEST-401-013, SIGNALS-RUNTIME-401-002, GAP-REP-004, and 40+ downstream tasks
## Overview
This contract defines the canonical `richgraph-v1` schema used for function-level reachability analysis, content-addressable storage (CAS), and Dead Simple Signing Envelope (DSSE) attestation. It specifies the data model, hash algorithms, determinism rules, and CAS layout that together enable provable reachability claims.

---
## Schema Definition
### richgraph-v1 Document Structure
```json
{
  "schema": "richgraph-v1",
  "analyzer": {
    "name": "scanner.reachability",
    "version": "0.1.0",
    "toolchain_digest": "sha256:..."
  },
  "nodes": [
    {
      "id": "sym:java:base64url...",
      "symbol_id": "sym:java:base64url...",
      "lang": "java",
      "kind": "method",
      "display": "com.example.Foo.bar(String)",
      "code_id": "code:java:base64url...",
      "purl": "pkg:maven/com.example/foo@1.0.0",
      "build_id": "gnu-build-id:...",
      "symbol_digest": "sha256:...",
      "evidence": ["import", "disasm"],
      "attributes": {"key": "value"}
    }
  ],
  "edges": [
    {
      "from": "sym:java:...",
      "to": "sym:java:...",
      "kind": "call",
      "purl": "pkg:maven/com.example/bar@2.0.0",
      "symbol_digest": "sha256:...",
      "confidence": 0.9,
      "evidence": ["reloc", "runtime"],
      "candidates": []
    }
  ],
  "roots": [
    {
      "id": "sym:java:...",
      "phase": "runtime",
      "source": "main"
    }
  ]
}
```
### Node Schema
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Unique node identifier (typically same as `symbol_id`) |
| `symbol_id` | string | Yes | Canonical SymbolID (format: `sym:{lang}:{base64url-sha256}`) |
| `lang` | string | Yes | Language: `java`, `dotnet`, `go`, `node`, `rust`, `python`, `ruby`, `php`, `binary`, `shell` |
| `kind` | string | Yes | Symbol kind: `method`, `function`, `class`, `module`, `trait`, `struct` |
| `display` | string | No | Human-readable demangled name |
| `code_id` | string | No | CodeID for name-less symbols (format: `code:{lang}:{base64url-sha256}`) |
| `purl` | string | No | Package URL of containing package |
| `build_id` | string | No | GNU build-id, PE GUID, or Mach-O UUID |
| `symbol_digest` | string | No | SHA-256 of the symbol_id (format: `sha256:{hex}`) |
| `evidence` | string[] | No | Evidence sources (sorted): `import`, `reloc`, `disasm`, `runtime` |
| `attributes` | object | No | Additional key-value metadata (sorted by key) |
### Edge Schema
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `from` | string | Yes | Source node ID |
| `to` | string | Yes | Target node ID |
| `kind` | string | Yes | Edge type: `call`, `virtual`, `indirect`, `data`, `init` |
| `purl` | string | No | Package URL of callee |
| `symbol_digest` | string | No | SHA-256 of callee symbol_id |
| `confidence` | number | Yes | Confidence in the range [0.0, 1.0]; named levels map to `certain`=1.0, `high`=0.9, `medium`=0.6, `low`=0.3 |
| `evidence` | string[] | No | Evidence sources (sorted) |
| `candidates` | string[] | No | Alternative resolution candidates (sorted) |
### Root Schema
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Node ID designated as entry point |
| `phase` | string | Yes | Execution phase: `runtime`, `load`, `init`, `test` |
| `source` | string | No | Entry point source (e.g., `main`, `DT_INIT`, `.ctors`) |
---
## Hash Algorithms
### Summary
| Component | Algorithm | Format | Example |
|-----------|-----------|--------|---------|
| **graph_hash** | BLAKE3-256 | `blake3:{hex}` | `blake3:a1b2c3d4...` |
| **symbol_digest** | SHA-256 | `sha256:{hex}` | `sha256:e5f6a7b8...` |
| **symbol_id fragment** | SHA-256 | base64url-no-pad | `sym:java:abc123...` |
| **code_id fragment** | SHA-256 | base64url-no-pad | `code:java:xyz789...` |
### Graph Hash (BLAKE3-256)
The graph hash provides content-addressable identification:
```
graph_hash = "blake3:" + hex(BLAKE3-256(canonical_json_bytes))
```
**Rationale:** BLAKE3 was chosen for:
- Speed (3x+ faster than SHA-256 on modern CPUs)
- Parallelizable for large graphs
- Cryptographic security equivalent to SHA-256
- Consistent with internal content-addressing standard
### Symbol Digest (SHA-256)
Symbol digests use SHA-256 for interoperability:
```
symbol_digest = "sha256:" + hex(SHA-256(utf8(symbol_id)))
```
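As an illustration of the formula above, here is a minimal Python sketch using only the standard library. The function name `symbol_digest` and the sample input are assumptions made for this example; they are not taken from the reference implementation.

```python
import hashlib


def symbol_digest(symbol_id: str) -> str:
    """Return "sha256:" plus the lowercase hex SHA-256 of the UTF-8 symbol_id."""
    return "sha256:" + hashlib.sha256(symbol_id.encode("utf-8")).hexdigest()


# Placeholder input for illustration only; real SymbolIDs carry a base64url fragment.
example_digest = symbol_digest("sym:java:example-fragment")
print(example_digest)
```

`hashlib` emits lowercase hex, which matches the lowercasing applied in the C# BLAKE3 snippet later in this contract.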
### SymbolID and CodeID Fragments
Internal fragments use SHA-256 with base64url encoding:
```
fragment = base64url_no_pad(SHA-256(utf8(canonical_tuple)))
symbol_id = "sym:{lang}:{fragment}"
code_id = "code:{lang}:{fragment}"
```
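The fragment computation can be sketched in Python as follows. This is a hedged illustration: the helper names (`fragment`, `symbol_id`, `code_id`) are hypothetical, and the canonical tuple is treated as an already-assembled string.

```python
import base64
import hashlib


def fragment(canonical_tuple: str) -> str:
    """SHA-256 of the UTF-8 tuple, encoded as base64url with padding removed."""
    digest = hashlib.sha256(canonical_tuple.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def symbol_id(lang: str, canonical_tuple: str) -> str:
    return f"sym:{lang}:{fragment(canonical_tuple)}"


def code_id(lang: str, canonical_tuple: str) -> str:
    return f"code:{lang}:{fragment(canonical_tuple)}"


# Hypothetical tuple; components are NUL-separated as described under SymbolID Construction.
example_id = symbol_id("go", "example.com/mod\0pkg\0Recv\0Func")
print(example_id)
```

A 32-byte SHA-256 digest always yields a 43-character fragment once the single `=` pad is stripped, so every ID for a given language has a fixed length.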
---
## Determinism Rules
All outputs must be reproducible. The `Trimmed()` operation enforces canonical ordering:
### Ordering Rules
1. **Nodes:** Sort by `id` (ordinal string comparison)
2. **Edges:** Sort by `(from, to, kind)` in that order (ordinal)
3. **Roots:** Sort by `id` (ordinal)
4. **Evidence arrays:** Sort by ordinal string comparison
5. **Candidates arrays:** Sort by ordinal string comparison
6. **Attributes objects:** Sort keys by ordinal string comparison
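The six ordering rules can be sketched in Python as below. This is an illustration under assumptions, not a port of `Trimmed()`: the function name `canonical_order` is hypothetical, graphs are plain dictionaries, and Python compares strings by code point, which agrees with ordinal comparison for ASCII identifiers but may differ from .NET's UTF-16 ordinal order for characters outside the Basic Multilingual Plane.

```python
def canonical_order(graph: dict) -> dict:
    """Return a copy of `graph` with the six ordering rules applied."""

    def order_node(node: dict) -> dict:
        out = dict(node)
        if "evidence" in out:
            out["evidence"] = sorted(out["evidence"])
        if "attributes" in out:
            out["attributes"] = dict(sorted(out["attributes"].items()))
        return out

    def order_edge(edge: dict) -> dict:
        out = dict(edge)
        for field in ("evidence", "candidates"):
            if field in out:
                out[field] = sorted(out[field])
        return out

    ordered = dict(graph)
    ordered["nodes"] = sorted(
        (order_node(n) for n in graph.get("nodes", [])), key=lambda n: n["id"]
    )
    ordered["edges"] = sorted(
        (order_edge(e) for e in graph.get("edges", [])),
        key=lambda e: (e["from"], e["to"], e["kind"]),
    )
    ordered["roots"] = sorted(graph.get("roots", []), key=lambda r: r["id"])
    return ordered


# Made-up IDs, shortened for readability.
example = canonical_order({
    "nodes": [
        {"id": "sym:java:b", "evidence": ["reloc", "disasm"], "attributes": {"z": "1", "a": "2"}},
        {"id": "sym:java:a"},
    ],
    "edges": [
        {"from": "sym:java:b", "to": "sym:java:a", "kind": "call", "candidates": ["y", "x"]},
        {"from": "sym:java:a", "to": "sym:java:b", "kind": "virtual"},
        {"from": "sym:java:a", "to": "sym:java:b", "kind": "call"},
    ],
    "roots": [{"id": "sym:java:b"}, {"id": "sym:java:a"}],
})
print([n["id"] for n in example["nodes"]])
```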
### Normalization Rules
1. **Trim whitespace:** All string values trimmed
2. **Empty to null:** Empty strings become null/omitted
3. **Confidence clamping:** Values clamped to [0.0, 1.0]
4. **Default values:**
   - `kind` defaults to `"call"` for edges
   - `phase` defaults to `"runtime"` for roots
   - `analyzer.name` defaults to `"scanner.reachability"`
   - `analyzer.version` defaults to `"0.1.0"`
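Applied to a single edge, the normalization rules might look like the following Python sketch. The function name `normalize_edge` is hypothetical, and the sketch assumes that "null/omitted" can be satisfied by dropping the key; roots and the analyzer block would follow the same pattern with their own defaults.

```python
def normalize_edge(edge: dict) -> dict:
    """Trim strings, drop empty ones, default `kind`, and clamp `confidence`."""
    out = {}
    for key, value in edge.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                continue  # empty strings are omitted
        out[key] = value

    out.setdefault("kind", "call")  # default edge kind
    if "confidence" in out:
        out["confidence"] = min(1.0, max(0.0, float(out["confidence"])))
    return out


# Made-up IDs; `kind` is blank and `confidence` is out of range on purpose.
normalized = normalize_edge(
    {"from": " sym:java:a ", "to": "sym:java:b", "kind": "  ", "confidence": 1.7}
)
print(normalized)
```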
### JSON Serialization
- No indentation (compact JSON)
- Keys sorted by ordinal string comparison at all levels
- No trailing whitespace
- UTF-8 encoding
- No BOM
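A minimal Python sketch of these serialization rules follows. The function name `canonical_json_bytes` is an assumption for this example. The contract does not state how numbers are formatted (for instance `1.0` versus `1`), so byte-for-byte agreement with the C# writer is not guaranteed by this sketch.

```python
import json


def canonical_json_bytes(document: dict) -> bytes:
    """Compact JSON with sorted keys at every level, encoded as UTF-8 without a BOM."""
    text = json.dumps(
        document,
        sort_keys=True,          # keys sorted at all nesting levels
        separators=(",", ":"),   # no indentation or padding whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8, not \u escapes
    )
    return text.encode("utf-8")  # str.encode("utf-8") never emits a BOM


example_bytes = canonical_json_bytes({"b": 1, "a": {"d": 2, "c": "é"}})
print(example_bytes.decode("utf-8"))
```

These bytes are the input to the graph hash, so any difference in key order or whitespace changes the resulting `graph_hash`.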
---
## CAS Layout
### Graph Storage
```
cas://reachability/graphs/{blake3} # Graph body (canonical JSON)
cas://reachability/graphs/{blake3}.dsse # DSSE envelope
```
### Edge Bundle Storage (Optional)
For runtime hits, init-array roots, and contested edges:
```
cas://reachability/edges/{graph_hash}/{bundle_id} # Edge bundle body
cas://reachability/edges/{graph_hash}/{bundle_id}.dsse # DSSE envelope
```
### Metadata Storage
```
{output_root}/reachability_graphs/{analysis_id}/richgraph-v1.json # Graph body
{output_root}/reachability_graphs/{analysis_id}/meta.json # Metadata
```
**meta.json structure:**
```json
{
  "schema": "richgraph-v1",
  "graph_hash": "blake3:...",
  "files": [
    {"path": "...", "hash": "blake3:..."}
  ]
}
```
---
## DSSE Integration
### Predicate Types
| Predicate | Purpose |
|-----------|---------|
| `stella.ops/graph@v1` | Graph-level attestation |
| `stella.ops/edgeBundle@v1` | Edge bundle attestation |
### Graph DSSE (Mandatory)
Every richgraph-v1 document requires a DSSE envelope:
```json
{
  "payloadType": "application/vnd.stellaops.graph+json",
  "payload": "<base64(canonical_graph_json)>",
  "signatures": [...]
}
```
**Subject:** `cas://reachability/graphs/{blake3}`
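To make the envelope shape concrete, the sketch below builds an unsigned envelope and the pre-authentication encoding (PAE) that the DSSE specification defines as the byte string to sign. The helper names are hypothetical, and signing and key handling are out of scope here, so `signatures` is left empty.

```python
import base64

PAYLOAD_TYPE = "application/vnd.stellaops.graph+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE PAE: "DSSEv1" SP len(type) SP type SP len(payload) SP payload."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(canonical_graph_json: bytes) -> dict:
    """Envelope with a standard-base64 payload and no signatures yet."""
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(canonical_graph_json).decode("ascii"),
        "signatures": [],
    }


# A tiny stand-in body; a real payload is the full canonical graph JSON.
body = b'{"schema":"richgraph-v1"}'
envelope = unsigned_envelope(body)
to_sign = pae(envelope["payloadType"], body)
print(envelope["payload"])
```

A signer would sign `to_sign`, not the raw payload, which binds the payload type to the signature.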
### Rekor Integration
- **Graph DSSE:** Always publish to Rekor (or mirror when offline)
- **Edge Bundle DSSE:** Optional, capped at configurable limit per graph
---
## SymbolID Construction
### Format
```
sym:{lang}:{base64url_sha256_no_pad}
```
### Per-Language Canonical Tuples
| Language | Tuple Components (NUL-separated) |
|----------|----------------------------------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` (lowercased) |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` |
| Go | `{module}\0{package}\0{receiver}\0{func}` |
| Node/Deno | `{pkg_or_path}\0{export_path}\0{kind}` |
| Rust | `{crate}\0{module}\0{item}\0{mangled?}` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` |
| Ruby | `{gem_or_path}\0{module}\0{method}` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` |
| Shell | `{script_rel_path}\0{function_or_cmd}` |
| Swift | `{module}\0{type}\0{member}\0{mangled?}` |
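As a worked example of one row, the Java tuple could be assembled as in the Python sketch below. Two points are assumptions rather than statements from this contract: that "(lowercased)" applies to the whole joined tuple, and that the descriptor is a JVM-style method descriptor. The function name `java_symbol_id` is hypothetical.

```python
import base64
import hashlib


def java_symbol_id(package: str, cls: str, method: str, descriptor: str) -> str:
    """Join the four components with NUL, lowercase, hash, and base64url-encode."""
    # Assumption: lowercasing covers every component, including the descriptor.
    canonical_tuple = "\0".join([package, cls, method, descriptor]).lower()
    digest = hashlib.sha256(canonical_tuple.encode("utf-8")).digest()
    fragment = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sym:java:{fragment}"


# Components loosely based on the `display` value in the document structure example.
java_id = java_symbol_id("com.example", "Foo", "bar", "(Ljava/lang/String;)V")
print(java_id)
```

Under this reading, inputs that differ only in letter case map to the same SymbolID.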
---
## CodeID Construction
### Format
```
code:{lang}:{base64url_sha256_no_pad}
```
### Use Cases
CodeIDs provide stable identifiers when symbol names are unavailable:
- **Stripped binaries:** `code:binary:{hash}` from `{format}\0{file_hash}\0{addr}\0{length}\0{section}\0{code_block_hash}`
- **.NET modules:** `code:dotnet:{hash}` from `{assembly}\0{module}\0{mvid}`
- **Node packages:** `code:node:{hash}` from `{package}\0{entry_path}`
---
## Implementation Status
### Existing Implementation
| Component | Location | Status |
|-----------|----------|--------|
| RichGraph model | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraph.cs` | Implemented |
| SymbolId builder | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/SymbolId.cs` | Implemented |
| CodeId builder | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/CodeId.cs` | Implemented |
| RichGraphWriter | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability/RichGraphWriter.cs` | **Needs BLAKE3** |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |
### Required Changes
| Change | Priority | Notes |
|--------|----------|-------|
| Update RichGraphWriter to use BLAKE3 | P0 | Currently uses SHA256 for graph_hash |
| Add `meta.json` hash prefix | P1 | Use `blake3:` prefix |
| CAS adapter for graph storage | P1 | Implement `cas://reachability/graphs/{blake3}` paths |
---
## Decision Checklist
This contract resolves the following decisions from the 2025-12-02 alignment meeting:
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Graph hash algorithm | BLAKE3-256 | Speed + security |
| Symbol digest algorithm | SHA-256 | Interoperability |
| CAS path scheme | `cas://reachability/graphs/{blake3}` | Content-addressable |
| DSSE required for graphs | Yes (mandatory) | Provenance chain |
| DSSE for edge bundles | Optional (capped) | Rekor volume control |
| JSON canonicalization | Sorted keys, compact | Determinism |
| Hash prefix format | `{alg}:{hex}` | Explicit algorithm ID |
---
## Validation Rules
### Schema Validation
1. `schema` must equal `"richgraph-v1"`
2. `nodes` array must not be empty
3. All node `id` values must be unique
4. All edge `from`/`to` must reference existing nodes
5. All root `id` values must reference existing nodes
6. `confidence` must be in range [0.0, 1.0]
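The six schema checks can be expressed as a small validator. The Python sketch below is illustrative only; the function name `validate_schema` and the error messages are assumptions, and graphs are treated as plain dictionaries.

```python
def validate_schema(graph: dict) -> list:
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    if graph.get("schema") != "richgraph-v1":
        errors.append("schema must equal 'richgraph-v1'")

    nodes = graph.get("nodes") or []
    if not nodes:
        errors.append("nodes must not be empty")

    node_ids = [node.get("id") for node in nodes]
    known = set(node_ids)
    if len(known) != len(node_ids):
        errors.append("node ids must be unique")

    for index, edge in enumerate(graph.get("edges") or []):
        for end in ("from", "to"):
            if edge.get(end) not in known:
                errors.append(f"edges[{index}].{end} references an unknown node")
        confidence = edge.get("confidence")
        is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        if not is_number or not 0.0 <= confidence <= 1.0:
            errors.append(f"edges[{index}].confidence must be in [0.0, 1.0]")

    for index, root in enumerate(graph.get("roots") or []):
        if root.get("id") not in known:
            errors.append(f"roots[{index}].id references an unknown node")
    return errors


# Made-up, shortened IDs.
example_graph = {
    "schema": "richgraph-v1",
    "nodes": [{"id": "sym:java:a"}, {"id": "sym:java:b"}],
    "edges": [{"from": "sym:java:a", "to": "sym:java:b", "kind": "call", "confidence": 0.9}],
    "roots": [{"id": "sym:java:a", "phase": "runtime"}],
}
print(validate_schema(example_graph))
```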
### Hash Validation
1. `graph_hash` must match BLAKE3-256 of canonical JSON
2. `symbol_digest` must match SHA-256 of `symbol_id`
3. SymbolID fragments must match SHA-256 of canonical tuple
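Rule 2 can be checked without any external dependency. The Python sketch below (function name hypothetical) returns the IDs of nodes whose declared `symbol_digest` does not match the SHA-256 of their `symbol_id`. Nodes without a `symbol_digest` are skipped, since the field is optional.

```python
import hashlib


def mismatched_symbol_digests(nodes: list) -> list:
    """Return IDs of nodes whose symbol_digest disagrees with SHA-256(symbol_id)."""
    mismatched = []
    for node in nodes:
        declared = node.get("symbol_digest")
        if declared is None:
            continue  # optional field, nothing to verify
        expected = "sha256:" + hashlib.sha256(node["symbol_id"].encode("utf-8")).hexdigest()
        if declared != expected:
            mismatched.append(node["id"])
    return mismatched


# Placeholder symbol_id values; only the first node carries a correct digest.
good_digest = "sha256:" + hashlib.sha256(b"sym:java:one").hexdigest()
result = mismatched_symbol_digests([
    {"id": "n1", "symbol_id": "sym:java:one", "symbol_digest": good_digest},
    {"id": "n2", "symbol_id": "sym:java:two", "symbol_digest": "sha256:" + "0" * 64},
    {"id": "n3", "symbol_id": "sym:java:three"},
])
print(result)
```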
---
## Migration Path
### From Current Implementation
1. **RichGraphWriter:** Replace `ComputeSha256` with `ComputeBlake3` for graph hash
2. **meta.json:** Update hash format from `sha256:` to `blake3:`
3. **Existing graphs:** Recompute hashes on next scan (no migration needed)
### Compatibility
- Symbol digests remain SHA-256 (no change)
- SymbolID format unchanged
- CodeID format unchanged
---
## Reference Implementation
### Canonical JSON Writer
```csharp
// From RichGraph.cs - Trimmed() enforces canonical ordering
public RichGraph Trimmed()
{
    var nodes = Nodes.OrderBy(n => n.Id, StringComparer.Ordinal).ToList();
    var edges = Edges
        .OrderBy(e => e.From, StringComparer.Ordinal)
        .ThenBy(e => e.To, StringComparer.Ordinal)
        .ThenBy(e => e.Kind, StringComparer.Ordinal)
        .ToList();
    var roots = Roots.OrderBy(r => r.Id, StringComparer.Ordinal).ToList();
    return this with { Nodes = nodes, Edges = edges, Roots = roots };
}
```
### BLAKE3 Graph Hash (Required Update)
```csharp
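// Replace in RichGraphWriter.cs
// Assumes the Blake3 NuGet package (Blake3.NET), which provides Blake3.Hasher.
private static string ComputeBlake3(byte[] bytes)
{
    using var blake3 = Blake3.Hasher.New();
    blake3.Update(bytes);
    var hash = blake3.Finalize();
    // Lowercase hex keeps the digest in the {alg}:{hex} form used throughout this contract.
    return "blake3:" + Convert.ToHexString(hash.AsSpan()).ToLowerInvariant();
}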
```
---
## Related Contracts
- [Sealed Mode](./sealed-mode.md) - Air-gap operation with CAS
- [Mirror Bundle](./mirror-bundle.md) - Offline transport format
- [Verification Policy](./verification-policy.md) - DSSE verification rules
- [Scanner Surface](./scanner-surface.md) - Surface analysis framework
---
## Changelog
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-05 | Scanner Guild | Initial contract from alignment meeting |