# Edge Explainability Schema

_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._

This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.

---

## 1. Overview

Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:

- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
- **Confidence score:** Certainty of the edge's existence
- **Evidence sources:** Detectors and rules that contributed to edge discovery
- **Provenance:** Analyzer version, detection timestamp, and input artifacts

---

## 2. Gap Resolutions

### EG1: Reason Enum Governance

**Standard reason codes:**

| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |

**Governance rules:**

1. New reason codes require RFC + review by Scanner Guild
2. Deprecated codes remain valid for 2 major versions
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
4. Codes are case-insensitive, normalized to lowercase

**Code registry:**

```json
{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {
      "code": "bytecode-invoke",
      "category": "static",
      "description": "Bytecode invocation instruction",
      "languages": ["java", "dotnet"],
      "confidence_range": [0.9, 1.0],
      "deprecated": false
    }
  ]
}
```

### EG2: Canonical Edge Schema with Hash Rules

**Edge schema:**

```json
{
  "edge_id": "edge:sha256:{hex}",
  "from": "sym:java:...",
  "to": "sym:java:...",
  "kind": "call",
  "reason": "bytecode-invoke",
  "confidence": 0.95,
  "evidence": [
    {
      "source": "detector:java-bytecode-analyzer",
      "rule_id": "invoke-virtual",
      "rule_version": "1.0.0",
      "location": {
        "file": "com/example/Foo.class",
        "offset": 1234,
        "instruction": "invokevirtual #42"
      },
      "timestamp": "2025-12-13T10:00:00Z"
    }
  ],
  "attributes": {
    "virtual": true,
    "polymorphic_targets": 3
  }
}
```

**Hash computation:**

```
edge_id = "edge:" + sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason
  })
)
```

**Canonicalization:**

1. Use only `from`, `to`, `kind`, `reason` for hash (not confidence or evidence)
2. Sort JSON keys alphabetically
3. No whitespace, UTF-8 encoding
4. Hash is lowercase hex with `sha256:` prefix

### EG3: Evidence Limits/Redaction

**Evidence limits:**

| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |

**Redaction rules:**

| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |

**Truncation behavior:**

```json
{
  "evidence_truncated": true,
  "evidence_count": 15,
  "evidence_shown": 10,
  "full_evidence_uri": "cas://edges/evidence/sha256:..."
}
```

### EG4: Confidence Rubric

**Confidence scale:**

| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |

**Confidence computation:**

```
edge.confidence = base_confidence(reason)
                * evidence_boost(evidence_count)
                * target_resolution_factor
```

**Base confidence by reason:**

| Reason | Base Confidence |
|--------|-----------------|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |

### EG5: Detector/Rule Provenance

**Provenance schema:**

```json
{
  "provenance": {
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "digest": "sha256:..."
    },
    "detector": {
      "name": "java-bytecode-analyzer",
      "version": "2.0.0",
      "rule_set": "default"
    },
    "rule": {
      "id": "invoke-virtual",
      "version": "1.0.0",
      "description": "Detect invokevirtual bytecode instructions"
    },
    "input_artifacts": [
      {"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
    ],
    "detected_at": "2025-12-13T10:00:00Z"
  }
}
```

**Provenance requirements:**

1. All edges must include analyzer provenance
2. Detector/rule provenance required for non-runtime edges
3. Input artifact digests enable reproducibility
4. Detection timestamp uses UTC ISO-8601

### EG6: API/CLI Parity

**API endpoints:**

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |

**CLI commands:**

```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...

# Get edge details
stella edge show --id edge:sha256:...

# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke

# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```

**Output parity:**

- API and CLI return identical JSON structure
- CLI supports `--json` for machine-readable output
- Both support filtering by reason, confidence, from/to

### EG7: Deterministic Fixtures

**Fixture location:**

```
tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json

datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt
```

**Fixture requirements:**

1. Each reason code has at least one fixture
2. Fixtures include expected `edge_id` hash
3. Golden outputs frozen after review
4. CI verifies hash stability

### EG8: Propagation into Explanation Graphs/VEX

**Explanation graph inclusion:**

```json
{
  "explanation": {
    "path": [
      {
        "node": "sym:java:main...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:handler...",
          "reason": "bytecode-invoke",
          "confidence": 0.98
        }
      },
      {
        "node": "sym:java:handler...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:log4j...",
          "reason": "bytecode-invoke",
          "confidence": 0.95
        }
      }
    ],
    "aggregate_path_confidence": 0.93
  }
}
```

**VEX evidence format:**

```json
{
  "stellaops:reachability": {
    "path_edges": [
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
    ],
    "weakest_edge": {
      "edge_id": "edge:sha256:...",
      "reason": "bytecode-invoke",
      "confidence": 0.95
    },
    "aggregate_confidence": 0.93
  }
}
```

### EG9: Localization Guidance

**Localizable elements:**

| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |

**Message catalog structure:**

```json
{
  "locale": "en-US",
  "messages": {
    "edge.reason.bytecode-invoke": "Bytecode method call",
    "edge.reason.plt-stub": "PLT/GOT library call",
    "edge.confidence.high": "High confidence ({0:P0})",
    "edge.evidence.location": "Detected at offset {offset} in {file}"
  }
}
```

**Supported locales:**

- `en-US` (default)
- Additional locales via contribution

### EG10: Backfill Plan

**Backfill strategy:**

1. **Phase 1:** Add reason codes to new edges (no backfill needed)
2. **Phase 2:** Run detector upgrade on graphs without reason codes
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata

**Migration script:**

```bash
stella edge backfill --graph blake3:... --dry-run

# Output:
#   Graph: blake3:a1b2c3d4...
#   Edges without reason: 1234
#   Edges to update: 1234
#   Dry run - no changes made.

# Execute:
stella edge backfill --graph blake3:... --execute
```

**Backfill metadata:**

```json
{
  "backfill": {
    "status": "complete",
    "original_analyzer_version": "1.0.0",
    "backfill_analyzer_version": "1.2.0",
    "backfilled_at": "2025-12-13T10:00:00Z",
    "edges_updated": 1234
  }
}
```

---

## 3. Related Documentation

- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Explainability Schema](./explainability-schema.md) - Explanation format
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE

---

_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._
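
As an illustration of the EG2 hash rules, the following Python sketch derives an `edge_id` from the four identity fields. It is not the Scanner implementation: the function names `canonical_json` and `compute_edge_id` are hypothetical, and the use of `ensure_ascii=False` (emitting raw UTF-8 rather than `\u` escapes) is an assumption, since the canonicalization rules only state "UTF-8 encoding".

```python
import hashlib
import json


def canonical_json(fields: dict) -> bytes:
    """Serialize with alphabetically sorted keys and no whitespace, as UTF-8."""
    # Assumption: non-ASCII characters are emitted as raw UTF-8, not \u escapes.
    text = json.dumps(fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_edge_id(edge: dict) -> str:
    """Hash only from/to/kind/reason; confidence and evidence are excluded."""
    identity = {key: edge[key] for key in ("from", "to", "kind", "reason")}
    digest = hashlib.sha256(canonical_json(identity)).hexdigest()  # lowercase hex
    return f"edge:sha256:{digest}"


if __name__ == "__main__":
    # Placeholder symbol IDs, for illustration only.
    edge = {
        "from": "sym:java:A",
        "to": "sym:java:B",
        "kind": "call",
        "reason": "bytecode-invoke",
        "confidence": 0.95,
    }
    print(compute_edge_id(edge))
```

Because confidence and evidence stay out of the hash input, re-running an analyzer that only refines the confidence score leaves the `edge_id` unchanged, which is what the EG7 hash-stability check relies on.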
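
The EG4 confidence scale and the EG8 path summary can be sketched in Python as follows. Two points are assumptions rather than stated rules: scores that fall between the published bands (for example 0.995) are assigned to the band below, and the aggregate is computed as the product of edge confidences rounded to two decimals, which matches the 0.98 and 0.95 -> 0.93 example but is not specified in this document. The names `confidence_level` and `summarize_path` are hypothetical.

```python
from math import prod

# Lower bounds taken from the EG4 confidence scale, highest first.
LEVELS = [(1.0, "certain"), (0.85, "high"), (0.5, "medium"), (0.2, "low"), (0.0, "unknown")]


def confidence_level(score: float) -> str:
    """Map a numeric confidence to its EG4 level name."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    # Assumption: a score between two published bands belongs to the lower band.
    return next(name for lower, name in LEVELS if score >= lower)


def summarize_path(edges: list[dict]) -> dict:
    """Return the weakest edge and an aggregate confidence for a path."""
    weakest = min(edges, key=lambda edge: edge["confidence"])
    # Assumption: aggregate = product of edge confidences, rounded to 2 decimals.
    aggregate = round(prod(edge["confidence"] for edge in edges), 2)
    return {"weakest_edge": weakest, "aggregate_confidence": aggregate}


if __name__ == "__main__":
    # Placeholder edge IDs, for illustration only.
    path = [
        {"edge_id": "e1", "reason": "bytecode-invoke", "confidence": 0.98},
        {"edge_id": "e2", "reason": "bytecode-invoke", "confidence": 0.95},
    ]
    print(confidence_level(0.95), summarize_path(path))
```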