Edge Explainability Schema
Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild.
This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.
1. Overview
Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:
- Reason code: Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
- Confidence score: Certainty of the edge's existence
- Evidence sources: Detectors and rules that contributed to edge discovery
- Provenance: Analyzer version, detection timestamp, and input artifacts
2. Gap Resolutions
EG1: Reason Enum Governance
Standard reason codes:
| Code | Category | Description | Example |
|---|---|---|---|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |
Governance rules:
- New reason codes require an RFC and review by the Scanner Guild
- Deprecated codes remain valid for 2 major versions
- Custom codes use the `custom:` prefix (e.g., `custom:my-analyzer`)
- Codes are case-insensitive and normalized to lowercase
Code registry:
```json
{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {
      "code": "bytecode-invoke",
      "category": "static",
      "description": "Bytecode invocation instruction",
      "languages": ["java", "dotnet"],
      "confidence_range": [0.9, 1.0],
      "deprecated": false
    }
  ]
}
```
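The governance rules imply a validation step whenever a reason code enters the system. The following Python sketch assumes the registry document has been parsed into a dictionary; the function name `normalize_reason` and its error behavior are hypothetical, not part of the registry schema. Deprecated codes are still accepted, since they remain valid for two major versions.

```python
import json

# Hypothetical: the registry document above, loaded as a Python dict.
REGISTRY = json.loads("""
{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {"code": "bytecode-invoke", "category": "static",
     "description": "Bytecode invocation instruction",
     "languages": ["java", "dotnet"],
     "confidence_range": [0.9, 1.0], "deprecated": false}
  ]
}
""")

# Deprecated entries stay in this set: they remain valid for two major versions.
KNOWN_CODES = {entry["code"] for entry in REGISTRY["reasons"]}
CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Lowercase a reason code, then accept it if registered or custom-prefixed."""
    normalized = code.lower()
    if normalized.startswith(CUSTOM_PREFIX):
        if normalized == CUSTOM_PREFIX:
            raise ValueError("custom reason codes need a name after 'custom:'")
        return normalized
    if normalized not in KNOWN_CODES:
        raise ValueError(f"unregistered reason code: {code!r}")
    return normalized


print(normalize_reason("Bytecode-Invoke"))     # bytecode-invoke
print(normalize_reason("CUSTOM:my-analyzer"))  # custom:my-analyzer
```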
EG2: Canonical Edge Schema with Hash Rules
Edge schema:
```json
{
  "edge_id": "edge:sha256:{hex}",
  "from": "sym:java:...",
  "to": "sym:java:...",
  "kind": "call",
  "reason": "bytecode-invoke",
  "confidence": 0.95,
  "evidence": [
    {
      "source": "detector:java-bytecode-analyzer",
      "rule_id": "invoke-virtual",
      "rule_version": "1.0.0",
      "location": {
        "file": "com/example/Foo.class",
        "offset": 1234,
        "instruction": "invokevirtual #42"
      },
      "timestamp": "2025-12-13T10:00:00Z"
    }
  ],
  "attributes": {
    "virtual": true,
    "polymorphic_targets": 3
  }
}
```
Hash computation:
```
edge_id = "edge:sha256:" + lowercase_hex(sha256(
    utf8(canonical_json({
        "from": edge.from,
        "to": edge.to,
        "kind": edge.kind,
        "reason": edge.reason
    }))
))
```
Canonicalization:
- Hash only `from`, `to`, `kind`, and `reason` (never `confidence` or `evidence`)
- Sort JSON keys alphabetically
- Emit no whitespace; encode as UTF-8
- Encode the digest as lowercase hex with the `sha256:` prefix, giving `edge:sha256:{hex}`
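A minimal Python sketch of the hash rule, assuming that `json.dumps` with sorted keys and compact separators is an acceptable canonical form for the four string fields involved. This document does not say whether the production canonicalizer follows a formal scheme such as RFC 8785, and the names `canonical_json` and `compute_edge_id` are illustrative.

```python
import hashlib
import json
from typing import Any, Mapping


def canonical_json(obj: Mapping[str, Any]) -> bytes:
    """Sorted keys, no whitespace between tokens, UTF-8 bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_edge_id(edge: Mapping[str, Any]) -> str:
    """Hash only from/to/kind/reason; confidence and evidence are ignored."""
    identity = {
        "from": edge["from"],
        "to": edge["to"],
        "kind": edge["kind"],
        "reason": edge["reason"],
    }
    digest = hashlib.sha256(canonical_json(identity)).hexdigest()  # lowercase hex
    return "edge:sha256:" + digest


a = {"from": "sym:java:A", "to": "sym:java:B", "kind": "call",
     "reason": "bytecode-invoke", "confidence": 0.95}
b = dict(a, confidence=0.40, evidence=[{"source": "detector:example"}])
print(compute_edge_id(a) == compute_edge_id(b))  # True
```

Keeping `confidence` and `evidence` out of the hash means re-scoring an edge or attaching more evidence never changes its identity, so references from explanation graphs and VEX stay stable.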
EG3: Evidence Limits/Redaction
Evidence limits:
| Element | Default Limit | Configurable |
|---|---|---|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |
Redaction rules:
| Category | Redaction | Example |
|---|---|---|
| File paths | Normalize | /home/user/... -> {PROJECT}/... |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |
Truncation behavior:
```json
{
  "evidence_truncated": true,
  "evidence_count": 15,
  "evidence_shown": 10,
  "full_evidence_uri": "cas://edges/evidence/sha256:..."
}
```
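The limits and redaction rules can be combined into one pass over an edge's evidence. This Python sketch is an assumption-laden illustration: `apply_evidence_limits`, `redact_path`, and the `project_root` parameter are hypothetical; the sketch always emits `evidence_truncated` but adds the counts and URI only when entries were dropped; and it does not model the location-field or path-depth limits.

```python
from typing import Any

MAX_EVIDENCE = 10            # default "Evidence entries per edge"
MAX_INSTRUCTION_CHARS = 100  # default "Instruction preview length"


def redact_path(path: str, project_root: str) -> str:
    """Rewrite a path under the (hypothetical) project root to {PROJECT}/..."""
    root = project_root.rstrip("/")
    if path.startswith(root + "/"):
        return "{PROJECT}/" + path[len(root) + 1:]
    return path


def apply_evidence_limits(
    evidence: list[dict[str, Any]],
    full_evidence_uri: str,
    project_root: str,
    max_entries: int = MAX_EVIDENCE,
) -> dict[str, Any]:
    shown = []
    for item in evidence[:max_entries]:
        item = dict(item)  # shallow copy so the caller's evidence is not mutated
        if "location" in item:
            loc = dict(item["location"])
            if "file" in loc:
                loc["file"] = redact_path(loc["file"], project_root)
            if "instruction" in loc:
                loc["instruction"] = loc["instruction"][:MAX_INSTRUCTION_CHARS]
            item["location"] = loc
        shown.append(item)

    result: dict[str, Any] = {
        "evidence": shown,
        "evidence_truncated": len(evidence) > max_entries,
    }
    if result["evidence_truncated"]:
        result["evidence_count"] = len(evidence)
        result["evidence_shown"] = len(shown)
        result["full_evidence_uri"] = full_evidence_uri
    return result
```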
EG4: Confidence Rubric
Confidence scale:
| Level | Range | Description | Typical Sources |
|---|---|---|---|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |
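The rubric's ranges leave small gaps between levels (for example, between 0.84 and 0.85). One reading, assumed in this Python sketch, is to treat each range's lower bound as a threshold, so a score such as 0.845 falls into `medium`. `confidence_level` is a hypothetical helper name.

```python
def confidence_level(score: float) -> str:
    """Map a confidence score in [0, 1] to a rubric level (lower bounds as thresholds)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be within [0.0, 1.0]")
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"


print(confidence_level(0.98))  # high
```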
Confidence computation:
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
Base confidence by reason:
| Reason | Base Confidence |
|---|---|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |
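The computation formula names two factors, `evidence_boost(evidence_count)` and `target_resolution_factor`, without defining them, so this Python sketch takes both as already-computed, caller-supplied numbers that default to a neutral 1.0. Clamping the product to [0, 1] is also an assumption, since a boost above 1.0 could otherwise push a score past `certain`. `bytecode-field` and `fini-array` have no listed base confidence, so the sketch rejects them rather than guess. `compute_confidence` is a hypothetical name.

```python
# Base confidence per reason, copied from the table above.
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98,
    "import-symbol": 0.95,
    "plt-stub": 0.92,
    "reloc-target": 0.90,
    "init-array": 0.95,
    "vtable-slot": 0.75,
    "indirect-target": 0.60,
    "reflection-invoke": 0.50,
    "runtime-observed": 0.99,
    "user-annotated": 0.80,
}


def compute_confidence(
    reason: str,
    evidence_boost: float = 1.0,            # assumption: value of evidence_boost(count)
    target_resolution_factor: float = 1.0,  # assumption: neutral when unspecified
) -> float:
    """base_confidence(reason) * evidence_boost * target_resolution_factor, clamped."""
    if reason not in BASE_CONFIDENCE:
        raise ValueError(f"no base confidence listed for reason {reason!r}")
    score = BASE_CONFIDENCE[reason] * evidence_boost * target_resolution_factor
    # Assumption: clamp so that a boost cannot push the score outside [0, 1].
    return max(0.0, min(1.0, score))


print(compute_confidence("runtime-observed", evidence_boost=1.05))  # 1.0
```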
EG5: Detector/Rule Provenance
Provenance schema:
```json
{
  "provenance": {
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "digest": "sha256:..."
    },
    "detector": {
      "name": "java-bytecode-analyzer",
      "version": "2.0.0",
      "rule_set": "default"
    },
    "rule": {
      "id": "invoke-virtual",
      "version": "1.0.0",
      "description": "Detect invokevirtual bytecode instructions"
    },
    "input_artifacts": [
      {"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
    ],
    "detected_at": "2025-12-13T10:00:00Z"
  }
}
```
Provenance requirements:
- All edges must include analyzer provenance
- Detector/rule provenance required for non-runtime edges
- Input artifact digests enable reproducibility
- Detection timestamp uses UTC ISO-8601
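The provenance requirements translate into a handful of checks per edge. In this Python sketch, `validate_provenance` is a hypothetical helper; it assumes "non-runtime edges" means any reason other than `runtime-observed`, and it checks the timestamp format only when `detected_at` is present.

```python
from datetime import datetime, timedelta
from typing import Any


def validate_provenance(reason: str, prov: dict[str, Any]) -> list[str]:
    """Return a list of rule violations (empty when the provenance passes)."""
    errors: list[str] = []
    if not prov.get("analyzer"):
        errors.append("missing analyzer provenance")

    # Assumption: "non-runtime" means any reason other than runtime-observed.
    if reason != "runtime-observed":
        if not prov.get("detector"):
            errors.append("missing detector provenance")
        if not prov.get("rule"):
            errors.append("missing rule provenance")

    detected_at = prov.get("detected_at")
    if detected_at is not None:
        try:
            # fromisoformat only accepts a trailing "Z" from Python 3.11 on.
            parsed = datetime.fromisoformat(detected_at.replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"detected_at is not ISO-8601: {detected_at!r}")
        else:
            # Naive timestamps (no offset) and non-zero offsets both fail.
            if parsed.utcoffset() != timedelta(0):
                errors.append("detected_at must be UTC")
    return errors
```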
EG6: API/CLI Parity
API endpoints:
| Method | Path | Description |
|---|---|---|
| GET | `/api/edges/{edgeId}` | Get edge details |
| GET | `/api/edges?graph_hash=...` | List edges for graph |
| GET | `/api/edges/{edgeId}/evidence` | Get full evidence |
| POST | `/api/edges/search` | Search edges by criteria |
CLI commands:
```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...
# Get edge details
stella edge show --id edge:sha256:...
# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke
# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```
Output parity:
- API and CLI return identical JSON structure
- CLI supports `--json` for machine-readable output
- Both support filtering by reason, confidence, and from/to symbols
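The parity rules suggest that the API and CLI share one filtering and serialization path. The Python sketch below is hypothetical: `filter_edges`, `to_ndjson`, and their parameter names are not documented API or CLI options; exact-match semantics for `reason`, `from`, and `to` plus a minimum-confidence threshold are assumptions; and sorting keys in the NDJSON output is an illustrative choice for stable diffs.

```python
import json
from typing import Any, Iterable, Iterator, Optional

Edge = dict[str, Any]


def filter_edges(
    edges: Iterable[Edge],
    reason: Optional[str] = None,
    min_confidence: Optional[float] = None,
    from_symbol: Optional[str] = None,
    to_symbol: Optional[str] = None,
) -> Iterator[Edge]:
    """Yield edges matching every filter that is set (None means no filter)."""
    for edge in edges:
        if reason is not None and edge["reason"] != reason:
            continue
        if min_confidence is not None and edge["confidence"] < min_confidence:
            continue
        if from_symbol is not None and edge["from"] != from_symbol:
            continue
        if to_symbol is not None and edge["to"] != to_symbol:
            continue
        yield edge


def to_ndjson(edges: Iterable[Edge]) -> str:
    """One compact JSON object per line, keys sorted for stable output."""
    return "".join(
        json.dumps(edge, sort_keys=True, separators=(",", ":")) + "\n" for edge in edges
    )
```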
EG7: Deterministic Fixtures
Fixture location:
```
tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json
datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt
```
Fixture requirements:
- Each reason code has at least one fixture
- Fixtures include the expected `edge_id` hash
- Golden outputs are frozen after review
- CI verifies hash stability
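A CI check for the fixture requirements could recompute every fixture edge's `edge_id` and confirm that each registered reason code is covered. This Python sketch assumes each fixture file decodes to a list of edge objects carrying their expected `edge_id`, and it uses `json.dumps` with sorted keys and compact separators as the canonical form; the actual fixture layout and the names `edge_id_for` and `check_fixtures` are assumptions.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping


def edge_id_for(edge: Mapping[str, Any]) -> str:
    """Recompute edge_id from the four identity fields (EG2 rules)."""
    identity = {key: edge[key] for key in ("from", "to", "kind", "reason")}
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "edge:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_fixtures(
    fixtures: Mapping[str, list[dict[str, Any]]],
    registered_codes: Iterable[str],
) -> list[str]:
    """Report hash mismatches and reason codes that no fixture exercises."""
    problems: list[str] = []
    covered: set[str] = set()
    for name in sorted(fixtures):
        for edge in fixtures[name]:
            covered.add(edge["reason"])
            actual = edge_id_for(edge)
            if actual != edge["edge_id"]:
                problems.append(f"{name}: expected {edge['edge_id']}, got {actual}")
    for code in sorted(set(registered_codes) - covered):
        problems.append(f"no fixture covers reason {code!r}")
    return problems
```

An empty list means both requirements hold; CI would fail the build on any entry.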
EG8: Propagation into Explanation Graphs/VEX
Explanation graph inclusion:
```json
{
  "explanation": {
    "path": [
      {
        "node": "sym:java:main...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:handler...",
          "reason": "bytecode-invoke",
          "confidence": 0.98
        }
      },
      {
        "node": "sym:java:handler...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:log4j...",
          "reason": "bytecode-invoke",
          "confidence": 0.95
        }
      }
    ],
    "aggregate_path_confidence": 0.93
  }
}
```
VEX evidence format:
```json
{
  "stellaops:reachability": {
    "path_edges": [
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
    ],
    "weakest_edge": {
      "edge_id": "edge:sha256:...",
      "reason": "bytecode-invoke",
      "confidence": 0.95
    },
    "aggregate_confidence": 0.93
  }
}
```
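This document does not state how `aggregate_confidence` is derived, but the example values are consistent with multiplying edge confidences along the path (0.98 × 0.95 = 0.931, which rounds to 0.93), with `weakest_edge` being the lowest-confidence edge. The Python sketch below assumes exactly that, plus rounding to two decimals; `summarize_path` is a hypothetical name.

```python
import math
from typing import Any


def summarize_path(path_edges: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a stellaops:reachability block for one call path."""
    if not path_edges:
        raise ValueError("a reachability path needs at least one edge")
    # Assumption: path confidence is the product of edge confidences.
    aggregate = math.prod(edge["confidence"] for edge in path_edges)
    # The weakest edge is the one with the lowest confidence (first on ties).
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": round(aggregate, 2),
    }


summary = summarize_path([
    {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
    {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95},
])
print(summary["aggregate_confidence"])  # 0.93
```

A product never exceeds the weakest edge's confidence, so a single speculative hop caps the whole path, which is the conservative behavior a VEX consumer usually wants.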
EG9: Localization Guidance
Localizable elements:
| Element | Localization | Example |
|---|---|---|
| Reason code display | Message catalog | bytecode-invoke -> "Bytecode method call" |
| Confidence level | Message catalog | high -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |
Message catalog structure:
```json
{
  "locale": "en-US",
  "messages": {
    "edge.reason.bytecode-invoke": "Bytecode method call",
    "edge.reason.plt-stub": "PLT/GOT library call",
    "edge.confidence.high": "High confidence ({0:P0})",
    "edge.evidence.location": "Detected at offset {offset} in {file}"
  }
}
```
Supported locales:
- `en-US` (default)
- Additional locales via contribution
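A lookup over catalogs shaped like the one above might fall back from the requested locale to `en-US`, and then to the raw key. This Python sketch assumes that fallback order, and `localize` is a hypothetical name. It handles only named placeholders such as `{offset}` and `{file}`; the `{0:P0}` entry appears to use a .NET-style percent format specifier, which Python's `str.format` does not understand.

```python
from typing import Any, Mapping

DEFAULT_LOCALE = "en-US"


def localize(key: str, locale: str, catalogs: Mapping[str, Mapping[str, Any]], **params: Any) -> str:
    """Look up key in the requested locale, then en-US, then fall back to the key."""
    for candidate in (locale, DEFAULT_LOCALE):
        messages = catalogs.get(candidate, {}).get("messages", {})
        if key in messages:
            # Only named placeholders such as {offset} and {file} are supported.
            return messages[key].format_map(params)
    return key


catalogs = {"en-US": {"locale": "en-US", "messages": {
    "edge.evidence.location": "Detected at offset {offset} in {file}"}}}
print(localize("edge.evidence.location", "de-DE", catalogs,
               offset=1234, file="com/example/Foo.class"))
# Detected at offset 1234 in com/example/Foo.class
```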
EG10: Backfill Plan
Backfill strategy:
- Phase 1: Add reason codes to new edges (no backfill needed)
- Phase 2: Run detector upgrade on graphs without reason codes
- Phase 3: Mark old graphs as `requires_reanalysis` in metadata
Migration script:
```bash
stella edge backfill --graph blake3:... --dry-run
```
Output:
```text
Graph: blake3:a1b2c3d4...
Edges without reason: 1234
Edges to update: 1234
Dry run - no changes made.
```
Execute:
```bash
stella edge backfill --graph blake3:... --execute
```
Backfill metadata:
```json
{
  "backfill": {
    "status": "complete",
    "original_analyzer_version": "1.0.0",
    "backfill_analyzer_version": "1.2.0",
    "backfilled_at": "2025-12-13T10:00:00Z",
    "edges_updated": 1234
  }
}
```
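The dry-run counts and the backfill metadata record can be produced as sketched below in Python. This is a hypothetical illustration, not the `stella edge backfill` implementation: `plan_backfill` and `backfill_metadata` are invented names, every edge lacking a `reason` is treated as an edge to update (matching the equal counts in the sample output), and `status` is always reported as `complete`.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Optional


def plan_backfill(edges: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Dry-run counts: edges with no reason code are the ones to update."""
    missing = sum(1 for edge in edges if not edge.get("reason"))
    return {"edges_without_reason": missing, "edges_to_update": missing}


def backfill_metadata(
    original_version: str,
    backfill_version: str,
    edges_updated: int,
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build the graph-level backfill record with a UTC ISO-8601 timestamp."""
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "backfill": {
            "status": "complete",
            "original_analyzer_version": original_version,
            "backfill_analyzer_version": backfill_version,
            "backfilled_at": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "edges_updated": edges_updated,
        }
    }
```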
3. Related Documentation
- richgraph-v1 Contract - Graph schema specification
- Function-Level Evidence - Evidence chain guide
- Explainability Schema - Explanation format
- Hybrid Attestation - Edge bundle DSSE
Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history.