5.1 KiB
5.1 KiB
Runtime + Static Reachability Union Schema (v0.1, 2025-11-23)
Goals
- Provide a single, deterministic graph shape that merges static lifter output and runtime traces across languages.
- Keep SymbolID stable across hosts (path/location independent) so CAS lookups are reproducible and cacheable.
- Make outputs offline-friendly: line-delimited JSON, UTF-8, sorted, with explicit content hashes.
File layout (CAS)
- Namespace root:
reachability_graphs/<analysis_id>/(analysis_id is caller-supplied UUID or hash). - Files (all NDJSON, UTF-8, newline terminated, sorted as noted):
nodes.ndjson(sorted bysymbol_id)edges.ndjson(sorted byfromthentothenedge_type)facts_runtime.ndjson(sorted bysymbol_id, optional)meta.json(single JSON object; schema version, produced_by, timestamps, tool versions, hashes)
- Hashing: SHA-256 of each file recorded in
meta.jsonunderfiles[]withpath,sha256,records. - Compression/packaging is left to the CAS store; files must be valid uncompressed NDJSON first.
SymbolID (language-agnostic envelope)
symbol_id = "sym:" + <lang> + ":" + <stable-fragment>
lang:java|dotnet|go|node|deno|rust|swift|shell|binarystable-fragment: SHA-256(base64url-no-pad) of the canonical tuple per language:- java: (
package,class,method,descriptor) lowercased, descriptor in JVM format. - dotnet: (
assembly_name,namespace,type,member_signature) using ECMA-335 signature string. - node/deno: (
pkg_name_or_path,export_path,kind) whereexport_pathis slash-joined ESM/CJS path;pkg_name_or_pathuses npm name or normalized absolute path with drive stripped. - go: (
module_path,package_path,receiver,func), with receiver empty for functions. - rust: (
crate,module_path,item_name,mangled) - swift: (
module,type,member,swift-mangled) - shell: (
script_relpath,function_or_cmd) - binary: (
binary_build_id,section,symbol_name)
- java: (
nodes.ndjson
Each line:
{
"symbol_id": "sym:lang:...",
"lang": "dotnet",
"kind": "function|method|type|module|package|binary",
"display": "Human readable name",
"source": {
"file": "relative/or/pkg/path",
"line": 123,
"col": 1,
"digest": "sha256:<hex>"
},
"attributes": {
"visibility": "public|internal|private",
"async": true,
"static": false,
"generic_arity": 2
}
}
Fields are optional when not applicable; omit rather than null. Additional language-specific fields allowed inside attributes (e.g., jvm_descriptor, dotnet_signature).
edges.ndjson
Each line (static or runtime-derived; see source):
{
"from": "sym:...",
"to": "sym:...",
"edge_type": "call|import|inherits|loads|dynamic|reflects|dlopen|ffi|wasm|spawn",
"confidence": "certain|high|medium|low",
"source": {
"origin": "static|runtime",
"provenance": "jvm-bytecode|il|ts-ast|ssa|ebpf|etw|jfr|hook",
"evidence": "file:path:line"
}
}
- Ordering: primary
from, secondaryto, tertiaryedge_type. - Duplicate edges with different provenance are allowed; consumers deduplicate by (
from,to,edge_type,provenance).
facts_runtime.ndjson (optional)
Runtime-only observations attached to symbols:
{
"symbol_id": "sym:...",
"samples": {
"call_count": 14,
"first_seen_utc": "2025-11-22T18:21:12Z",
"last_seen_utc": "2025-11-22T18:23:01Z"
},
"env": {
"pid": 1234,
"image": "sha256:...",
"entrypoint": "main",
"tags": ["sealed","offline"]
}
}
Sorting by symbol_id. Time fields must be UTC ISO-8601 with Z.
meta.json
{
"schema": "reachability-union@0.1",
"generated_at": "2025-11-23T00:00:00Z",
"produced_by": {
"tool": "StellaOps.Scanner.Worker",
"version": "0.1.0",
"analyzers": ["dotnet-11.1.0","jvm-8.0.0","node-6.2.0"]
},
"files": [
{"path":"nodes.ndjson","sha256":"...","records":1234},
{"path":"edges.ndjson","sha256":"...","records":4567},
{"path":"facts_runtime.ndjson","sha256":"...","records":89}
],
"options": {
"dedupe_edges": false,
"include_runtime": true
}
}
Determinism rules
- Sort order as noted; no nulls; omit empty objects/arrays.
- All strings UTF-8 NFC; booleans lower-case; edge_type enumerated list above.
- Hash inputs use exact serialized bytes (no trailing spaces, newline
\nonly).
Validation
- JSON Schema draft 2020-12 available at
docs/reachability/runtime-static-union-schema.json(to be generated from this spec; allowable values match enumerations above). - Minimal required fields:
symbol_id,lang,kind(nodes);from,to,edge_type,source.origin(edges).
Integration guidance
- Static lifters must emit SymbolIDs using the language rules; runtime probes must map call targets to the same SymbolID space (via demangled names + package/module resolution).
- CAS writers store each file under the namespace path and return the root manifest path for downstream consumers (Signals, Replay, Policy).
- Consumers should treat runtime edges as additive; when both origins exist, prefer
origin=runtimefor exploitability scoring but keep static edges for coverage.