Files
git.stella-ops.org/docs/reachability/runtime-static-union-schema.md
2025-11-24 00:34:20 +02:00

5.1 KiB

Runtime + Static Reachability Union Schema (v0.1, 2025-11-23)

Goals

  • Provide a single, deterministic graph shape that merges static lifter output and runtime traces across languages.
  • Keep SymbolID stable across hosts (path/location independent) so CAS lookups are reproducible and cacheable.
  • Make outputs offline-friendly: line-delimited JSON, UTF-8, sorted, with explicit content hashes.

File layout (CAS)

  • Namespace root: reachability_graphs/<analysis_id>/ (analysis_id is caller-supplied UUID or hash).
  • Files (all NDJSON, UTF-8, newline terminated, sorted as noted):
    • nodes.ndjson (sorted by symbol_id)
    • edges.ndjson (sorted by from then to then edge_type)
    • facts_runtime.ndjson (sorted by symbol_id, optional)
    • meta.json (single JSON object; schema version, produced_by, timestamps, tool versions, hashes)
  • Hashing: SHA-256 of each file recorded in meta.json under files[] with path, sha256, records.
  • Compression/packaging is left to the CAS store; files must be valid uncompressed NDJSON first.

SymbolID (language-agnostic envelope)

symbol_id = "sym:" + <lang> + ":" + <stable-fragment>
  • lang: java|dotnet|go|node|deno|rust|swift|shell|binary
  • stable-fragment: SHA-256(base64url-no-pad) of the canonical tuple per language:
    • java: (package, class, method, descriptor) lowercased, descriptor in JVM format.
    • dotnet: (assembly_name, namespace, type, member_signature) using ECMA-335 signature string.
    • node/deno: (pkg_name_or_path, export_path, kind) where export_path is slash-joined ESM/CJS path; pkg_name_or_path uses npm name or normalized absolute path with drive stripped.
    • go: (module_path, package_path, receiver, func), with receiver empty for functions.
    • rust: (crate, module_path, item_name, mangled)
    • swift: (module, type, member, swift-mangled)
    • shell: (script_relpath, function_or_cmd)
    • binary: (binary_build_id, section, symbol_name)

nodes.ndjson

Each line:

{
  "symbol_id": "sym:lang:...",
  "lang": "dotnet",
  "kind": "function|method|type|module|package|binary",
  "display": "Human readable name",
  "source": {
    "file": "relative/or/pkg/path",
    "line": 123,
    "col": 1,
    "digest": "sha256:<hex>"
  },
  "attributes": {
    "visibility": "public|internal|private",
    "async": true,
    "static": false,
    "generic_arity": 2
  }
}

Fields are optional when not applicable; omit rather than null. Additional language-specific fields allowed inside attributes (e.g., jvm_descriptor, dotnet_signature).

edges.ndjson

Each line (static or runtime-derived; see source):

{
  "from": "sym:...",
  "to": "sym:...",
  "edge_type": "call|import|inherits|loads|dynamic|reflects|dlopen|ffi|wasm|spawn",
  "confidence": "certain|high|medium|low",
  "source": {
    "origin": "static|runtime",
    "provenance": "jvm-bytecode|il|ts-ast|ssa|ebpf|etw|jfr|hook",
    "evidence": "file:path:line"  
  }
}
  • Ordering: primary from, secondary to, tertiary edge_type.
  • Duplicate edges with different provenance are allowed; consumers deduplicate by (from,to,edge_type,provenance).

facts_runtime.ndjson (optional)

Runtime-only observations attached to symbols:

{
  "symbol_id": "sym:...",
  "samples": {
    "call_count": 14,
    "first_seen_utc": "2025-11-22T18:21:12Z",
    "last_seen_utc": "2025-11-22T18:23:01Z"
  },
  "env": {
    "pid": 1234,
    "image": "sha256:...",
    "entrypoint": "main",
    "tags": ["sealed","offline"]
  }
}

Sorting by symbol_id. Time fields must be UTC ISO-8601 with Z.

meta.json

{
  "schema": "reachability-union@0.1",
  "generated_at": "2025-11-23T00:00:00Z",
  "produced_by": {
    "tool": "StellaOps.Scanner.Worker",
    "version": "0.1.0",
    "analyzers": ["dotnet-11.1.0","jvm-8.0.0","node-6.2.0"]
  },
  "files": [
    {"path":"nodes.ndjson","sha256":"...","records":1234},
    {"path":"edges.ndjson","sha256":"...","records":4567},
    {"path":"facts_runtime.ndjson","sha256":"...","records":89}
  ],
  "options": {
    "dedupe_edges": false,
    "include_runtime": true
  }
}

Determinism rules

  • Sort order as noted; no nulls; omit empty objects/arrays.
  • All strings UTF-8 NFC; booleans lower-case; edge_type enumerated list above.
  • Hash inputs use exact serialized bytes (no trailing spaces, newline \n only).

Validation

  • JSON Schema draft 2020-12 available at docs/reachability/runtime-static-union-schema.json (to be generated from this spec; allowable values match enumerations above).
  • Minimal required fields: symbol_id, lang, kind (nodes); from, to, edge_type, source.origin (edges).

Integration guidance

  • Static lifters must emit SymbolIDs using the language rules; runtime probes must map call targets to the same SymbolID space (via demangled names + package/module resolution).
  • CAS writers store each file under the namespace path and return the root manifest path for downstream consumers (Signals, Replay, Policy).
  • Consumers should treat runtime edges as additive; when both origins exist, prefer origin=runtime for exploitability scoring but keep static edges for coverage.