# Runtime + Static Reachability Union Schema (v0.1, 2025-11-23)

## Goals
- Provide a single, deterministic graph shape that merges static lifter output and runtime traces across languages.
- Keep SymbolID stable across hosts (path/location independent) so CAS lookups are reproducible and cacheable.
- Make outputs offline-friendly: line-delimited JSON, UTF-8, sorted, with explicit content hashes.

## File layout (CAS)
- Namespace root: `reachability_graphs/<analysis_id>/` (`analysis_id` is a caller-supplied UUID or hash).
- Files (NDJSON unless noted; UTF-8, newline-terminated, sorted as noted):
  - `nodes.ndjson` (sorted by `symbol_id`)
  - `edges.ndjson` (sorted by `from`, then `to`, then `edge_type`)
  - `facts_runtime.ndjson` (optional; sorted by `symbol_id`)
  - `meta.json` (single JSON object: schema version, produced_by, timestamps, tool versions, hashes)
- Hashing: the SHA-256 of each file is recorded in `meta.json` under `files[]` with `path`, `sha256`, and `records`.
- Compression/packaging is left to the CAS store; files must be valid uncompressed NDJSON first.
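
The hashing rule above can be sketched as follows. This is a minimal illustration, not the reference writer: the helper names `file_entry` and `build_files` are hypothetical, the hex encoding of the digest is an assumption (the spec elides the `sha256` value), and `records` is assumed to be the count of newline-terminated lines.

```python
import hashlib
from pathlib import Path

# File names come from the layout above; facts_runtime.ndjson is optional.
NDJSON_FILES = ("nodes.ndjson", "edges.ndjson", "facts_runtime.ndjson")


def file_entry(root: Path, name: str) -> dict:
    """Build one files[] entry from the exact bytes on disk."""
    data = (root / name).read_bytes()
    return {
        "path": name,
        # Assumption: lowercase hex digest; the spec does not fix the encoding.
        "sha256": hashlib.sha256(data).hexdigest(),
        # Assumption: every record ends with "\n", so newlines equal records.
        "records": data.count(b"\n"),
    }


def build_files(root: Path, names=NDJSON_FILES) -> list:
    """Return files[] entries for the files that exist under root."""
    return [file_entry(root, name) for name in names if (root / name).exists()]
```

Hashing the raw bytes, rather than re-serializing parsed JSON, keeps the recorded digest tied to what is actually stored in the CAS.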

## SymbolID (language-agnostic envelope)
```
symbol_id = "sym:" + <lang> + ":" + <stable-fragment>
```
- `lang`: `java|dotnet|go|node|deno|rust|swift|shell|binary`
- `stable-fragment`: SHA-256 digest, encoded as base64url without padding, of the canonical tuple per language:
  - **java**: (`package`, `class`, `method`, `descriptor`) lowercased, descriptor in JVM format.
  - **dotnet**: (`assembly_name`, `namespace`, `type`, `member_signature`) using ECMA-335 signature string.
  - **node/deno**: (`pkg_name_or_path`, `export_path`, `kind`) where `export_path` is slash-joined ESM/CJS path; `pkg_name_or_path` uses npm name or normalized absolute path with drive stripped.
  - **go**: (`module_path`, `package_path`, `receiver`, `func`), with receiver empty for functions.
  - **rust**: (`crate`, `module_path`, `item_name`, `mangled`)
  - **swift**: (`module`, `type`, `member`, `swift-mangled`)
  - **shell**: (`script_relpath`, `function_or_cmd`)
  - **binary**: (`binary_build_id`, `section`, `symbol_name`)
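
A minimal sketch of the envelope above, under one loud assumption: the spec does not say how the tuple is serialized before hashing, so this example joins the NFC-normalized fields with a NUL byte. The function name `symbol_id` and the example Go tuple are hypothetical.

```python
import base64
import hashlib
import unicodedata

LANGS = {"java", "dotnet", "go", "node", "deno", "rust", "swift", "shell", "binary"}


def symbol_id(lang: str, fields: tuple) -> str:
    """Return "sym:<lang>:<stable-fragment>" for a canonical tuple."""
    if lang not in LANGS:
        raise ValueError(f"unknown lang: {lang}")
    # Assumption: NUL-joined, NFC-normalized UTF-8 is the hash input.
    canonical = "\0".join(unicodedata.normalize("NFC", f) for f in fields)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # base64url without padding: strip the trailing "=".
    fragment = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sym:{lang}:{fragment}"


# Go function (empty receiver); module and package paths are made up.
example = symbol_id("go", ("example.com/mod", "example.com/mod/pkg", "", "Run"))
```

A 32-byte SHA-256 digest always yields a 43-character fragment in this encoding, which gives consumers a cheap sanity check on incoming IDs.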

## nodes.ndjson
Each line is one JSON object (pretty-printed here for readability):
```
{
  "symbol_id": "sym:dotnet:...",
  "lang": "dotnet",
  "kind": "function|method|type|module|package|binary",
  "display": "Human readable name",
  "source": {
    "file": "relative/or/pkg/path",
    "line": 123,
    "col": 1,
    "digest": "sha256:<hex>"
  },
  "attributes": {
    "visibility": "public|internal|private",
    "async": true,
    "static": false,
    "generic_arity": 2
  }
}
```
Fields are optional when not applicable; omit them rather than emitting null. Additional language-specific fields are allowed inside `attributes` (e.g., `jvm_descriptor`, `dotnet_signature`).

## edges.ndjson
Each line is one JSON object (pretty-printed here for readability), static or runtime-derived; see `source`:
```
{
  "from": "sym:...",
  "to": "sym:...",
  "edge_type": "call|import|inherits|loads|dynamic|reflects|dlopen|ffi|wasm|spawn",
  "confidence": "certain|high|medium|low",
  "source": {
    "origin": "static|runtime",
    "provenance": "jvm-bytecode|il|ts-ast|ssa|ebpf|etw|jfr|hook",
    "evidence": "file:path:line"
  }
}
```
- Ordering: primary `from`, secondary `to`, tertiary `edge_type`.
- Duplicate edges with different provenance are allowed; consumers deduplicate by (`from`, `to`, `edge_type`, `provenance`).
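
The ordering and deduplication rules above might be implemented along these lines. The helper names are hypothetical, and using `provenance` as a fourth sort key is an assumption added here so that duplicates land in a reproducible order; the spec only defines the first three keys.

```python
def edge_key(edge: dict) -> tuple:
    """Dedup key from the spec: (from, to, edge_type, provenance)."""
    source = edge.get("source", {})
    return (edge["from"], edge["to"], edge["edge_type"], source.get("provenance", ""))


def normalize_edges(edges: list, dedupe: bool = False) -> list:
    """Sort edges deterministically and optionally drop exact-key duplicates."""
    # Python compares str by code point, which matches UTF-8 byte order.
    ordered = sorted(edges, key=edge_key)
    if not dedupe:
        return ordered
    unique, seen = [], set()
    for edge in ordered:
        key = edge_key(edge)
        if key not in seen:
            seen.add(key)
            unique.append(edge)
    return unique
```

With `dedupe=False` the function only sorts, which corresponds to a writer running with `dedupe_edges` disabled and leaving deduplication to consumers.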

## facts_runtime.ndjson (optional)
Runtime-only observations attached to symbols, one JSON object per line (pretty-printed here for readability):
```
{
  "symbol_id": "sym:...",
  "samples": {
    "call_count": 14,
    "first_seen_utc": "2025-11-22T18:21:12Z",
    "last_seen_utc": "2025-11-22T18:23:01Z"
  },
  "env": {
    "pid": 1234,
    "image": "sha256:...",
    "entrypoint": "main",
    "tags": ["sealed","offline"]
  }
}
```
Records are sorted by `symbol_id`. Time fields must be UTC ISO-8601 with a `Z` suffix.
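
As an illustration of the time-format rule, the sketch below folds a list of observation timestamps into a `samples` object. The names `utc_z` and `summarize` are hypothetical, and second-level precision is assumed because the example record above shows no fractional seconds.

```python
from datetime import datetime, timezone


def utc_z(ts: datetime) -> str:
    """Format an aware datetime as UTC ISO-8601 with a trailing Z."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def summarize(symbol_id: str, observed: list) -> dict:
    """Build a facts_runtime record from one symbol's call timestamps."""
    if not observed:
        raise ValueError("at least one observation is required")
    return {
        "symbol_id": symbol_id,
        "samples": {
            "call_count": len(observed),
            "first_seen_utc": utc_z(min(observed)),
            "last_seen_utc": utc_z(max(observed)),
        },
    }
```

Rejecting naive datetimes avoids silently stamping host-local time as UTC, which would break reproducibility across hosts.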

## meta.json
```
{
  "schema": "reachability-union@0.1",
  "generated_at": "2025-11-23T00:00:00Z",
  "produced_by": {
    "tool": "StellaOps.Scanner.Worker",
    "version": "0.1.0",
    "analyzers": ["dotnet-11.1.0","jvm-8.0.0","node-6.2.0"]
  },
  "files": [
    {"path":"nodes.ndjson","sha256":"...","records":1234},
    {"path":"edges.ndjson","sha256":"...","records":4567},
    {"path":"facts_runtime.ndjson","sha256":"...","records":89}
  ],
  "options": {
    "dedupe_edges": false,
    "include_runtime": true
  }
}
```

## Determinism rules
- Sort order as noted; no nulls; omit empty objects/arrays.
- All strings are UTF-8 NFC; booleans are lower-case; `edge_type` values are limited to the enumeration above.
- Hash inputs use the exact serialized bytes (no trailing spaces, newline `\n` only).
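
One way to satisfy these rules when serializing a record is sketched below. The helper names are hypothetical. Sorting object keys and using compact separators are assumptions made here to get byte-stable output; the spec itself does not prescribe key order or whitespace beyond "no trailing spaces".

```python
import json
import unicodedata


def _is_empty(value) -> bool:
    """True for null and for empty objects/arrays, which the rules say to omit."""
    return value is None or value == {} or value == []


def clean(value):
    """Recursively NFC-normalize strings and drop null or empty members."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, dict):
        cleaned = {}
        for key, member in value.items():
            member = clean(member)
            if not _is_empty(member):
                cleaned[unicodedata.normalize("NFC", key)] = member
        return cleaned
    if isinstance(value, list):
        return [clean(item) for item in value if item is not None]
    return value


def to_ndjson_line(record: dict) -> bytes:
    """Serialize one record to UTF-8 bytes terminated by a single newline."""
    text = json.dumps(
        clean(record),
        ensure_ascii=False,     # keep non-ASCII as UTF-8, not \u escapes
        sort_keys=True,         # assumption: sorted keys for stable bytes
        separators=(",", ":"),  # assumption: compact form, no stray spaces
    )
    return text.encode("utf-8") + b"\n"
```

Because the SHA-256 in `meta.json` is taken over these exact bytes, any writer that picks different key ordering or spacing would produce different hashes for the same logical graph.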

## Validation
- A JSON Schema (draft 2020-12) is planned at `docs/reachability/runtime-static-union-schema.json` (to be generated from this spec; allowed values match the enumerations above).
- Minimal required fields: `symbol_id`, `lang`, `kind` (nodes); `from`, `to`, `edge_type`, `source.origin` (edges).
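
Until a generated schema is in place, the minimal required-field check can be approximated by hand. The sketch below is illustrative only: `missing_fields` is a hypothetical helper, and the dotted-path notation simply mirrors how `source.origin` is written above.

```python
NODE_REQUIRED = ("symbol_id", "lang", "kind")
EDGE_REQUIRED = ("from", "to", "edge_type", "source.origin")


def missing_fields(record: dict, required: tuple) -> list:
    """Return the required dotted paths that are absent from record."""
    missing = []
    for path in required:
        current = record
        for part in path.split("."):
            if not isinstance(current, dict) or part not in current:
                missing.append(path)
                break
            current = current[part]
    return missing


# An edge without source.origin fails the minimal check.
problems = missing_fields(
    {"from": "sym:go:a", "to": "sym:go:b", "edge_type": "call", "source": {}},
    EDGE_REQUIRED,
)
```

This only checks presence; enumeration values such as `edge_type` and `lang` would still need the full schema or an equivalent lookup.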

## Integration guidance
- Static lifters must emit SymbolIDs using the language rules; runtime probes must map call targets to the same SymbolID space (via demangled names + package/module resolution).
- CAS writers store each file under the namespace path and return the root manifest path for downstream consumers (Signals, Replay, Policy).
- Consumers should treat runtime edges as additive; when both origins exist, prefer `origin=runtime` for exploitability scoring but keep static edges for coverage.
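
The last point can be read as two views over the same edge set: the full set for coverage, and a per-edge preferred record for scoring. The sketch below shows one possible consumer-side interpretation; `scoring_view` is a hypothetical name, and grouping by (`from`, `to`, `edge_type`) is an assumption about what "both origins exist" means.

```python
from collections import defaultdict


def scoring_view(edges: list) -> list:
    """Pick one edge per (from, to, edge_type), preferring origin=runtime."""
    grouped = defaultdict(list)
    for edge in edges:
        grouped[(edge["from"], edge["to"], edge["edge_type"])].append(edge)
    preferred = []
    for key in sorted(grouped):
        candidates = grouped[key]
        runtime = [e for e in candidates if e["source"]["origin"] == "runtime"]
        # Fall back to the static edge when nothing was observed at runtime.
        preferred.append((runtime or candidates)[0])
    return preferred
```

The input list is left untouched, so the static edges remain available for coverage reporting alongside the scoring view.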