# Runtime + Static Reachability Union Schema (v0.1, 2025-11-23) ## Goals - Provide a single, deterministic graph shape that merges static lifter output and runtime traces across languages. - Keep SymbolID stable across hosts (path/location independent) so CAS lookups are reproducible and cacheable. - Make outputs offline-friendly: line-delimited JSON, UTF-8, sorted, with explicit content hashes. ## File layout (CAS) - Namespace root: `reachability_graphs//` (analysis_id is caller-supplied UUID or hash). - Files (all NDJSON, UTF-8, newline terminated, sorted as noted): - `nodes.ndjson` (sorted by `symbol_id`) - `edges.ndjson` (sorted by `from` then `to` then `edge_type`) - `facts_runtime.ndjson` (sorted by `symbol_id`, optional) - `meta.json` (single JSON object; schema version, produced_by, timestamps, tool versions, hashes) - Hashing: SHA-256 of each file recorded in `meta.json` under `files[]` with `path`, `sha256`, `records`. - Compression/packaging is left to the CAS store; files must be valid uncompressed NDJSON first. ## SymbolID (language-agnostic envelope) ``` symbol_id = "sym:" + + ":" + ``` - `lang`: `java|dotnet|go|node|deno|rust|swift|shell|binary` - `stable-fragment`: SHA-256(base64url-no-pad) of the canonical tuple per language: - **java**: (`package`, `class`, `method`, `descriptor`) lowercased, descriptor in JVM format. - **dotnet**: (`assembly_name`, `namespace`, `type`, `member_signature`) using ECMA-335 signature string. - **node/deno**: (`pkg_name_or_path`, `export_path`, `kind`) where `export_path` is slash-joined ESM/CJS path; `pkg_name_or_path` uses npm name or normalized absolute path with drive stripped. - **go**: (`module_path`, `package_path`, `receiver`, `func`), with receiver empty for functions. - **rust**: (`crate`, `module_path`, `item_name`, `mangled`) - **swift**: (`module`, `type`, `member`, `swift-mangled`) - **shell**: (`script_relpath`, `function_or_cmd`) - **binary**: (`binary_build_id`, `section`, `symbol_name`) ## nodes.ndjson Each line: ``` { "symbol_id": "sym:lang:...", "lang": "dotnet", "kind": "function|method|type|module|package|binary", "display": "Human readable name", "source": { "file": "relative/or/pkg/path", "line": 123, "col": 1, "digest": "sha256:" }, "attributes": { "visibility": "public|internal|private", "async": true, "static": false, "generic_arity": 2 } } ``` Fields are optional when not applicable; omit rather than null. Additional language-specific fields allowed inside `attributes` (e.g., `jvm_descriptor`, `dotnet_signature`). ## edges.ndjson Each line (static or runtime-derived; see `source`): ``` { "from": "sym:...", "to": "sym:...", "edge_type": "call|import|inherits|loads|dynamic|reflects|dlopen|ffi|wasm|spawn", "confidence": "certain|high|medium|low", "source": { "origin": "static|runtime", "provenance": "jvm-bytecode|il|ts-ast|ssa|ebpf|etw|jfr|hook", "evidence": "file:path:line" } } ``` - Ordering: primary `from`, secondary `to`, tertiary `edge_type`. - Duplicate edges with different provenance are allowed; consumers deduplicate by (`from`,`to`,`edge_type`,`provenance`). ## facts_runtime.ndjson (optional) Runtime-only observations attached to symbols: ``` { "symbol_id": "sym:...", "samples": { "call_count": 14, "first_seen_utc": "2025-11-22T18:21:12Z", "last_seen_utc": "2025-11-22T18:23:01Z" }, "env": { "pid": 1234, "image": "sha256:...", "entrypoint": "main", "tags": ["sealed","offline"] } } ``` Sorting by `symbol_id`. Time fields must be UTC ISO-8601 with `Z`. ## meta.json ``` { "schema": "reachability-union@0.1", "generated_at": "2025-11-23T00:00:00Z", "produced_by": { "tool": "StellaOps.Scanner.Worker", "version": "0.1.0", "analyzers": ["dotnet-11.1.0","jvm-8.0.0","node-6.2.0"] }, "files": [ {"path":"nodes.ndjson","sha256":"...","records":1234}, {"path":"edges.ndjson","sha256":"...","records":4567}, {"path":"facts_runtime.ndjson","sha256":"...","records":89} ], "options": { "dedupe_edges": false, "include_runtime": true } } ``` ## Determinism rules - Sort order as noted; no nulls; omit empty objects/arrays. - All strings UTF-8 NFC; booleans lower-case; edge_type enumerated list above. - Hash inputs use exact serialized bytes (no trailing spaces, newline `\n` only). ## Validation - JSON Schema draft 2020-12 available at `docs/reachability/runtime-static-union-schema.json` (to be generated from this spec; allowable values match enumerations above). - Minimal required fields: `symbol_id`, `lang`, `kind` (nodes); `from`, `to`, `edge_type`, `source.origin` (edges). ## Integration guidance - Static lifters must emit SymbolIDs using the language rules; runtime probes must map call targets to the same SymbolID space (via demangled names + package/module resolution). - CAS writers store each file under the namespace path and return the root manifest path for downstream consumers (Signals, Replay, Policy). - Consumers should treat runtime edges as additive; when both origins exist, prefer `origin=runtime` for exploitability scoring but keep static edges for coverage.