work work ... haaaard work
docs/reachability/runtime-static-union-schema.md
@@ -0,0 +1,129 @@
# Runtime + Static Reachability Union Schema (v0.1, 2025-11-23)

## Goals

- Provide a single, deterministic graph shape that merges static lifter output and runtime traces across languages.
- Keep SymbolID stable across hosts (path/location independent) so CAS lookups are reproducible and cacheable.
- Make outputs offline-friendly: line-delimited JSON, UTF-8, sorted, with explicit content hashes.

## File layout (CAS)

- Namespace root: `reachability_graphs/<analysis_id>/` (`analysis_id` is a caller-supplied UUID or hash).
- Files (NDJSON unless noted; UTF-8, newline terminated, sorted as noted):
  - `nodes.ndjson` (sorted by `symbol_id`)
  - `edges.ndjson` (sorted by `from` then `to` then `edge_type`)
  - `facts_runtime.ndjson` (sorted by `symbol_id`, optional)
  - `meta.json` (single JSON object; schema version, produced_by, timestamps, tool versions, hashes)
- Hashing: SHA-256 of each file recorded in `meta.json` under `files[]` with `path`, `sha256`, `records`.
- Compression/packaging is left to the CAS store; files must be valid uncompressed NDJSON first.
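The hashing rule above can be sketched as follows. This is a minimal illustration, not the scanner's implementation: the helper name `manifest_entry` is hypothetical, and the hex encoding of the digest is an assumption (the text only shows `"sha256":"..."`).

```python
import hashlib
from pathlib import Path


def manifest_entry(path: Path) -> dict:
    """Build one `files[]` entry: path, SHA-256 of the exact bytes, record count."""
    data = path.read_bytes()
    # The layout requires newline-terminated files, so reject anything else.
    if data and not data.endswith(b"\n"):
        raise ValueError(f"{path.name} is not newline terminated")
    return {
        "path": path.name,
        # ASSUMPTION: lowercase hex digest; the spec does not state the encoding.
        "sha256": hashlib.sha256(data).hexdigest(),
        # One record per newline-terminated line.
        "records": data.count(b"\n"),
    }
```

Hashing the raw bytes, rather than re-serialized JSON, keeps the recorded digest tied to exactly what is stored in the CAS.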

## SymbolID (language-agnostic envelope)

```
symbol_id = "sym:" + <lang> + ":" + <stable-fragment>
```

- `lang`: `java|dotnet|go|node|deno|rust|swift|shell|binary`
- `stable-fragment`: SHA-256 digest, base64url-encoded without padding, of the canonical tuple per language:
  - **java**: (`package`, `class`, `method`, `descriptor`) lowercased, descriptor in JVM format.
  - **dotnet**: (`assembly_name`, `namespace`, `type`, `member_signature`) using ECMA-335 signature string.
  - **node/deno**: (`pkg_name_or_path`, `export_path`, `kind`) where `export_path` is slash-joined ESM/CJS path; `pkg_name_or_path` uses npm name or normalized absolute path with drive stripped.
  - **go**: (`module_path`, `package_path`, `receiver`, `func`), with receiver empty for functions.
  - **rust**: (`crate`, `module_path`, `item_name`, `mangled`)
  - **swift**: (`module`, `type`, `member`, `swift-mangled`)
  - **shell**: (`script_relpath`, `function_or_cmd`)
  - **binary**: (`binary_build_id`, `section`, `symbol_name`)
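As an illustration of the envelope, here is a minimal sketch for the Java case. Two points are assumptions, because the text above does not settle them: the tuple fields are joined with a NUL byte before hashing, and all four fields (including the descriptor) are lowercased. The function names are hypothetical.

```python
import base64
import hashlib


def stable_fragment(parts: tuple[str, ...]) -> str:
    """SHA-256 of the canonical tuple, base64url-encoded without padding."""
    # ASSUMPTION: fields are joined with NUL; the spec does not define
    # how the tuple is serialized before hashing.
    canonical = "\0".join(parts).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def java_symbol_id(package: str, cls: str, method: str, descriptor: str) -> str:
    """Build a SymbolID for a Java method from its canonical tuple."""
    # ASSUMPTION: "lowercased" applies to every field of the tuple.
    parts = tuple(p.lower() for p in (package, cls, method, descriptor))
    return "sym:java:" + stable_fragment(parts)
```

Because no host path enters the tuple, two scanners on different machines should derive the same ID for the same method, provided they agree on the serialization.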

## nodes.ndjson

Each line:

```
{
  "symbol_id": "sym:lang:...",
  "lang": "dotnet",
  "kind": "function|method|type|module|package|binary",
  "display": "Human readable name",
  "source": {
    "file": "relative/or/pkg/path",
    "line": 123,
    "col": 1,
    "digest": "sha256:<hex>"
  },
  "attributes": {
    "visibility": "public|internal|private",
    "async": true,
    "static": false,
    "generic_arity": 2
  }
}
```

Fields are optional when not applicable; omit them rather than emitting `null`. Additional language-specific fields are allowed inside `attributes` (e.g., `jvm_descriptor`, `dotnet_signature`).
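The "omit rather than null" rule can be applied mechanically before a node is written. The sketch below is one way to do it; `prune` and `node_line` are hypothetical names, and the compact separators are an assumption about the serialized form. It also drops empty objects and arrays, which the determinism rules ask for.

```python
import json


def _is_empty(value) -> bool:
    # None, {} and [] are dropped; 0, False and "" are kept.
    return value is None or value == {} or value == []


def prune(value):
    """Recursively drop None values and empty dicts/lists."""
    if isinstance(value, dict):
        cleaned = {k: prune(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if not _is_empty(v)}
    if isinstance(value, list):
        return [v for v in (prune(item) for item in value) if not _is_empty(v)]
    return value


def node_line(node: dict) -> str:
    """Serialize one node as a single NDJSON line (without the newline)."""
    # ASSUMPTION: compact separators; key order follows insertion order.
    return json.dumps(prune(node), ensure_ascii=False, separators=(",", ":"))
```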

## edges.ndjson

Each line (static or runtime-derived; see `source`):

```
{
  "from": "sym:...",
  "to": "sym:...",
  "edge_type": "call|import|inherits|loads|dynamic|reflects|dlopen|ffi|wasm|spawn",
  "confidence": "certain|high|medium|low",
  "source": {
    "origin": "static|runtime",
    "provenance": "jvm-bytecode|il|ts-ast|ssa|ebpf|etw|jfr|hook",
    "evidence": "file:path:line"
  }
}
```

- Ordering: primary `from`, secondary `to`, tertiary `edge_type`.
- Duplicate edges with different provenance are allowed; consumers deduplicate by (`from`,`to`,`edge_type`,`provenance`).
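The ordering and deduplication rules above might look like this on the consumer side. This is a sketch with hypothetical function names; it assumes plain code-point string comparison, since the text does not name a collation.

```python
def edge_sort_key(edge: dict) -> tuple:
    """Primary `from`, secondary `to`, tertiary `edge_type`."""
    return (edge["from"], edge["to"], edge["edge_type"])


def dedupe_key(edge: dict) -> tuple:
    """Identity used by consumers: the sort key plus provenance."""
    provenance = edge.get("source", {}).get("provenance", "")
    return (edge["from"], edge["to"], edge["edge_type"], provenance)


def sort_edges(edges: list[dict]) -> list[dict]:
    # sorted() is stable, so edges with equal keys keep their input order.
    return sorted(edges, key=edge_sort_key)


def dedupe_edges(edges: list[dict]) -> list[dict]:
    """Keep the first edge seen for each (from, to, edge_type, provenance)."""
    seen = set()
    unique = []
    for edge in edges:
        key = dedupe_key(edge)
        if key not in seen:
            seen.add(key)
            unique.append(edge)
    return unique
```

The text does not say how ties among edges that differ only in provenance are ordered; a writer that needs byte-identical output across runs would have to pick a tie-break of its own.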

## facts_runtime.ndjson (optional)

Runtime-only observations attached to symbols:

```
{
  "symbol_id": "sym:...",
  "samples": {
    "call_count": 14,
    "first_seen_utc": "2025-11-22T18:21:12Z",
    "last_seen_utc": "2025-11-22T18:23:01Z"
  },
  "env": {
    "pid": 1234,
    "image": "sha256:...",
    "entrypoint": "main",
    "tags": ["sealed","offline"]
  }
}
```

Records are sorted by `symbol_id`. Time fields must be UTC ISO-8601 with `Z`.
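A small sketch of the time-field rule, with hypothetical helper names. It assumes whole-second precision, matching the example above; the text does not say whether fractional seconds are permitted.

```python
from datetime import datetime, timezone

# ASSUMPTION: second precision, as in the sample record.
UTC_Z_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_utc_z(moment: datetime) -> str:
    """Format an aware datetime as UTC ISO-8601 with a trailing `Z`."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: a timezone is required")
    return moment.astimezone(timezone.utc).strftime(UTC_Z_FORMAT)


def parse_utc_z(text: str) -> datetime:
    """Parse a `...Z` timestamp; reject offsets such as `+00:00`."""
    if not text.endswith("Z"):
        raise ValueError(f"timestamp must end with 'Z': {text!r}")
    return datetime.strptime(text, UTC_Z_FORMAT).replace(tzinfo=timezone.utc)
```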

## meta.json

```
{
  "schema": "reachability-union@0.1",
  "generated_at": "2025-11-23T00:00:00Z",
  "produced_by": {
    "tool": "StellaOps.Scanner.Worker",
    "version": "0.1.0",
    "analyzers": ["dotnet-11.1.0","jvm-8.0.0","node-6.2.0"]
  },
  "files": [
    {"path":"nodes.ndjson","sha256":"...","records":1234},
    {"path":"edges.ndjson","sha256":"...","records":4567},
    {"path":"facts_runtime.ndjson","sha256":"...","records":89}
  ],
  "options": {
    "dedupe_edges": false,
    "include_runtime": true
  }
}
```

## Determinism rules

- Sort order as noted; no nulls; omit empty objects/arrays.
- All strings are UTF-8 and NFC-normalized; booleans are lower-case; `edge_type` values come from the enumerated list above.
- Hash inputs use exact serialized bytes (no trailing spaces, newline `\n` only).
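To make the byte-level rules concrete, here is a minimal sketch of a serializer that NFC-normalizes strings and emits `\n`-terminated UTF-8 lines. The function names are hypothetical, and two choices are assumptions not stated in the rules: object keys are sorted, and separators are compact.

```python
import json
import unicodedata


def nfc(value):
    """Recursively NFC-normalize every string, including dict keys."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, dict):
        return {nfc(k): nfc(v) for k, v in value.items()}
    if isinstance(value, list):
        return [nfc(v) for v in value]
    return value


def serialize_ndjson(records: list[dict]) -> bytes:
    """Return the exact bytes to store and hash: one JSON object per line."""
    lines = [
        # ASSUMPTION: sorted keys and compact separators, so no stray spaces.
        json.dumps(nfc(r), ensure_ascii=False, sort_keys=True, separators=(",", ":"))
        for r in records
    ]
    # Every record ends with "\n" only; no "\r\n" and no trailing spaces.
    return "".join(line + "\n" for line in lines).encode("utf-8")
```

With this in place, a decomposed `e` plus combining accent and the precomposed `é` serialize to the same bytes, so their hashes match.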

## Validation

- A JSON Schema (draft 2020-12) is to be generated from this spec at `docs/reachability/runtime-static-union-schema.json`; allowable values match the enumerations above.
- Minimal required fields: `symbol_id`, `lang`, `kind` (nodes); `from`, `to`, `edge_type`, `source.origin` (edges).
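Until the JSON Schema exists, the minimal required fields can be checked with a few lines of code. This is a sketch only; the constant and function names are hypothetical, and it checks presence, not allowed values.

```python
NODE_REQUIRED = ("symbol_id", "lang", "kind")
EDGE_REQUIRED = ("from", "to", "edge_type", "source.origin")


def missing_fields(record: dict, required: tuple[str, ...]) -> list[str]:
    """Return the required (possibly dotted) field paths absent from record."""
    missing = []
    for dotted in required:
        current = record
        # Walk nested objects for dotted paths such as "source.origin".
        for part in dotted.split("."):
            if not isinstance(current, dict) or part not in current:
                missing.append(dotted)
                break
            current = current[part]
    return missing
```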

## Integration guidance

- Static lifters must emit SymbolIDs using the language rules; runtime probes must map call targets to the same SymbolID space (via demangled names + package/module resolution).
- CAS writers store each file under the namespace path and return the root manifest path for downstream consumers (Signals, Replay, Policy).
- Consumers should treat runtime edges as additive; when both origins exist, prefer `origin=runtime` for exploitability scoring but keep static edges for coverage.
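The consumer guidance above could be sketched as follows: keep the union of both origins, and pick the runtime edge when one is needed for scoring. The function names are hypothetical, and grouping by (`from`, `to`, `edge_type`) is an assumption about what "both origins exist" means.

```python
from collections import defaultdict


def merge_edges(static_edges: list[dict], runtime_edges: list[dict]) -> dict:
    """Union both origins, grouped by (from, to, edge_type)."""
    merged = defaultdict(list)
    for edge in list(static_edges) + list(runtime_edges):
        merged[(edge["from"], edge["to"], edge["edge_type"])].append(edge)
    return dict(merged)


def edge_for_scoring(group: list[dict]) -> dict:
    """Prefer a runtime-origin edge; otherwise fall back to the first edge."""
    for edge in group:
        if edge["source"]["origin"] == "runtime":
            return edge
    return group[0]
```

Nothing is discarded in `merge_edges`, so static-only links remain available for coverage reporting even though they are never preferred for scoring.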