# Edge Explainability Schema

_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._

This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.

---

## 1. Overview

Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:

- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
- **Confidence score:** A value from 0.0 to 1.0 expressing how certain the analyzer is that the edge exists
- **Evidence sources:** Detectors and rules that contributed to edge discovery
- **Provenance:** Analyzer version, detection timestamp, and input artifacts

---

## 2. Gap Resolutions

### EG1: Reason Enum Governance

**Standard reason codes:**

| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |

**Governance rules:**

1. New reason codes require an RFC and review by the Scanner Guild
2. Deprecated codes remain valid for two major versions
3. Custom codes use the `custom:` prefix (e.g., `custom:my-analyzer`)
4. Codes are case-insensitive and are normalized to lowercase
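Rules 3 and 4 can be sketched as a small normalization helper. This is a minimal illustration, not the Scanner implementation: the function name `normalize_reason` is hypothetical, and both the whitespace stripping and the requirement of a non-empty name after `custom:` are assumptions. Deprecation handling (rule 2) is left out.

```python
# Standard codes copied from the reason code table above.
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})
CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and check that it is standard or custom-prefixed."""
    # Assumption: surrounding whitespace is not significant.
    normalized = code.strip().lower()
    if normalized in STANDARD_REASONS:
        return normalized
    # Assumption: a custom code needs a non-empty name after the prefix.
    if normalized.startswith(CUSTOM_PREFIX) and len(normalized) > len(CUSTOM_PREFIX):
        return normalized
    raise ValueError(f"unknown reason code: {code!r}")
```

For example, `normalize_reason("PLT-Stub")` yields `plt-stub`, while an unprefixed, unregistered code raises `ValueError`.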

**Code registry:**

```json
{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {
      "code": "bytecode-invoke",
      "category": "static",
      "description": "Bytecode invocation instruction",
      "languages": ["java", "dotnet"],
      "confidence_range": [0.9, 1.0],
      "deprecated": false
    }
  ]
}
```

### EG2: Canonical Edge Schema with Hash Rules

**Edge schema:**

```json
{
  "edge_id": "edge:sha256:{hex}",
  "from": "sym:java:...",
  "to": "sym:java:...",
  "kind": "call",
  "reason": "bytecode-invoke",
  "confidence": 0.95,
  "evidence": [
    {
      "source": "detector:java-bytecode-analyzer",
      "rule_id": "invoke-virtual",
      "rule_version": "1.0.0",
      "location": {
        "file": "com/example/Foo.class",
        "offset": 1234,
        "instruction": "invokevirtual #42"
      },
      "timestamp": "2025-12-13T10:00:00Z"
    }
  ],
  "attributes": {
    "virtual": true,
    "polymorphic_targets": 3
  }
}
```

**Hash computation:**

```
edge_id = "edge:sha256:" + lowercase_hex(sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason
  })
))
```

**Canonicalization:**

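1. Hash only `from`, `to`, `kind`, and `reason`; confidence and evidence are excluded, so they can change without changing `edge_id`
2. Sort JSON keys alphabetically
3. Emit no whitespace between tokens and encode the result as UTF-8
4. Render the digest as lowercase hex with the `sha256:` prefix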
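The canonicalization rules map closely onto Python's standard `json` and `hashlib` modules. The sketch below is an illustration under stated assumptions: the function name `compute_edge_id` is hypothetical, the `reason` is assumed to be already normalized to lowercase, and non-ASCII characters are assumed to be emitted as raw UTF-8 rather than `\u` escapes (the rules above do not say which). The result follows the `edge:sha256:{hex}` form from the edge schema.

```python
import hashlib
import json


def compute_edge_id(edge: dict) -> str:
    """Derive an edge_id from the four identity fields of an edge."""
    # Only these fields participate in the hash; confidence, evidence,
    # and attributes are deliberately left out.
    identity = {key: edge[key] for key in ("from", "to", "kind", "reason")}
    canonical = json.dumps(
        identity,
        sort_keys=True,          # alphabetical key order
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # assumption: raw UTF-8, no \u escapes
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"edge:sha256:{digest}"
```

Because the identity fields are selected before serialization, two edges that differ only in confidence or evidence produce the same `edge_id`.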

### EG3: Evidence Limits/Redaction

**Evidence limits:**

| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |

**Redaction rules:**

| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |

**Truncation behavior:**

```json
{
  "evidence_truncated": true,
  "evidence_count": 15,
  "evidence_shown": 10,
  "full_evidence_uri": "cas://edges/evidence/sha256:..."
}
```
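As a rough sketch of how the default limits and the truncation marker fit together, the helper below caps the evidence list at 10 entries and instruction previews at 100 characters. The function name `truncate_evidence` is hypothetical, the `full_evidence_uri` is supplied by the caller because this document does not define how the CAS address is derived, and emitting the marker fields only when truncation occurred is an assumption.

```python
def truncate_evidence(evidence: list, full_evidence_uri: str,
                      max_entries: int = 10,
                      max_instruction_chars: int = 100) -> dict:
    """Apply the default evidence limits and report whether entries were dropped."""
    shown = []
    for entry in evidence[:max_entries]:
        entry = dict(entry)  # shallow copy so the caller's data is untouched
        location = dict(entry.get("location", {}))
        if "instruction" in location:
            # Keep only the first max_instruction_chars characters of the preview.
            location["instruction"] = location["instruction"][:max_instruction_chars]
            entry["location"] = location
        shown.append(entry)

    result = {"evidence": shown}
    # Assumption: the marker fields appear only when entries were dropped.
    if len(evidence) > max_entries:
        result.update({
            "evidence_truncated": True,
            "evidence_count": len(evidence),
            "evidence_shown": len(shown),
            "full_evidence_uri": full_evidence_uri,
        })
    return result
```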

### EG4: Confidence Rubric

**Confidence scale:**

| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |

**Confidence computation:**

```
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
```

**Base confidence by reason:**

| Reason | Base Confidence |
|--------|-----------------|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |
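The formula and the two tables can be combined into a short sketch. Several things here are assumptions: `evidence_boost` and `target_resolution_factor` are not defined in this document, so they are taken as plain multipliers supplied by the caller; the result is clamped to the 0.0-1.0 scale; and the level boundaries treat each range's lower bound as inclusive. The function names are hypothetical.

```python
# Base confidences copied from the table above.
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98,
    "import-symbol": 0.95,
    "plt-stub": 0.92,
    "reloc-target": 0.90,
    "init-array": 0.95,
    "vtable-slot": 0.75,
    "indirect-target": 0.60,
    "reflection-invoke": 0.50,
    "runtime-observed": 0.99,
    "user-annotated": 0.80,
}


def edge_confidence(reason: str, evidence_boost: float = 1.0,
                    target_resolution_factor: float = 1.0) -> float:
    """Multiply the base confidence by the two factors, clamped to [0, 1]."""
    score = BASE_CONFIDENCE[reason] * evidence_boost * target_resolution_factor
    return max(0.0, min(1.0, score))  # assumption: scores are clamped


def confidence_level(score: float) -> str:
    """Map a numeric score to a level name from the confidence scale."""
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"
```

The base table lists no value for `bytecode-field` or `fini-array`, so this sketch raises `KeyError` for those reasons rather than guessing a default.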

### EG5: Detector/Rule Provenance

**Provenance schema:**

```json
{
  "provenance": {
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "digest": "sha256:..."
    },
    "detector": {
      "name": "java-bytecode-analyzer",
      "version": "2.0.0",
      "rule_set": "default"
    },
    "rule": {
      "id": "invoke-virtual",
      "version": "1.0.0",
      "description": "Detect invokevirtual bytecode instructions"
    },
    "input_artifacts": [
      {"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
    ],
    "detected_at": "2025-12-13T10:00:00Z"
  }
}
```

**Provenance requirements:**

1. All edges must include analyzer provenance
2. Detector and rule provenance is required for non-runtime edges
3. Input artifact digests enable reproducibility
4. Detection timestamps use UTC in ISO 8601 format
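The requirements above lend themselves to a simple validation pass. The sketch below is illustrative only: the function name `provenance_errors` is hypothetical, "non-runtime" is interpreted as any reason other than `runtime-observed`, and the timestamp check accepts only the second-precision `Z` form used in the example. It takes the inner `provenance` object from the schema.

```python
from datetime import datetime


def provenance_errors(provenance: dict, reason: str) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not provenance.get("analyzer"):
        errors.append("missing analyzer")
    # Assumption: only runtime-observed edges may omit detector and rule.
    if reason != "runtime-observed":
        for key in ("detector", "rule"):
            if not provenance.get(key):
                errors.append(f"missing {key}")
    try:
        # Assumption: timestamps match the example form 2025-12-13T10:00:00Z.
        datetime.strptime(str(provenance.get("detected_at", "")),
                          "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        errors.append("detected_at is not a UTC ISO-8601 timestamp")
    return errors
```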

### EG6: API/CLI Parity

**API endpoints:**

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |

**CLI commands:**

```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...

# Get edge details
stella edge show --id edge:sha256:...

# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke

# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```

**Output parity:**

- API and CLI return identical JSON structure
- CLI supports `--json` for machine-readable output
- Both support filtering by reason, confidence, from/to

### EG7: Deterministic Fixtures

**Fixture location:**

```
tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json

datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt
```

**Fixture requirements:**

1. Each reason code has at least one fixture
2. Each fixture includes its expected `edge_id` hash
3. Golden outputs are frozen after review
4. CI verifies hash stability
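A hash-stability check along the lines of requirement 4 could recompute each fixture's `edge_id` and compare it with the stored value. This is a sketch, not the actual CI job: it assumes each fixture holds a single edge object with an `edge_id` field, and it re-derives the ID from `from`, `to`, `kind`, and `reason` per the EG2 rules (with non-ASCII characters assumed to be emitted as raw UTF-8). The function names are hypothetical.

```python
import hashlib
import json


def recompute_edge_id(edge: dict) -> str:
    """Re-derive edge_id from the identity fields, following the EG2 rules."""
    identity = {key: edge[key] for key in ("from", "to", "kind", "reason")}
    canonical = json.dumps(identity, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return "edge:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unstable_fixtures(fixtures: dict) -> list:
    """Return the names of fixtures whose stored edge_id no longer matches."""
    # Assumption: `fixtures` maps a fixture name to one parsed edge object.
    return sorted(
        name for name, edge in fixtures.items()
        if edge.get("edge_id") != recompute_edge_id(edge)
    )
```

A CI step would load the fixture files into the mapping and fail the build when the returned list is non-empty.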

### EG8: Propagation into Explanation Graphs/VEX

**Explanation graph inclusion:**

```json
{
  "explanation": {
    "path": [
      {
        "node": "sym:java:main...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:handler...",
          "reason": "bytecode-invoke",
          "confidence": 0.98
        }
      },
      {
        "node": "sym:java:handler...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:log4j...",
          "reason": "bytecode-invoke",
          "confidence": 0.95
        }
      }
    ],
    "aggregate_path_confidence": 0.93
  }
}
```

**VEX evidence format:**

```json
{
  "stellaops:reachability": {
    "path_edges": [
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
    ],
    "weakest_edge": {
      "edge_id": "edge:sha256:...",
      "reason": "bytecode-invoke",
      "confidence": 0.95
    },
    "aggregate_confidence": 0.93
  }
}
```
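This document does not state how the aggregate value is derived, but the sample numbers are consistent with multiplying the per-edge confidences (0.98 × 0.95 ≈ 0.93) and taking the lowest-confidence edge as `weakest_edge`. The sketch below works under that assumption; the function name `summarize_path` and the rounding to two decimals are hypothetical choices made to match the example.

```python
import math


def summarize_path(path_edges: list) -> dict:
    """Build a reachability summary from the edges along one path."""
    if not path_edges:
        raise ValueError("a path needs at least one edge")
    # Assumption: aggregate confidence is the product of edge confidences.
    aggregate = math.prod(edge["confidence"] for edge in path_edges)
    # The weakest edge is the one with the lowest confidence
    # (the first such edge when several tie).
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": round(aggregate, 2),
    }
```

Under a product rule, long paths lose confidence quickly even when every edge is individually strong, which is one reason to report the weakest edge alongside the aggregate.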

### EG9: Localization Guidance

**Localizable elements:**

| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |

**Message catalog structure:**

```json
{
  "locale": "en-US",
  "messages": {
    "edge.reason.bytecode-invoke": "Bytecode method call",
    "edge.reason.plt-stub": "PLT/GOT library call",
    "edge.confidence.high": "High confidence ({0:P0})",
    "edge.evidence.location": "Detected at offset {offset} in {file}"
  }
}
```
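A catalog lookup with named placeholders might look like the following sketch. The function name `localize`, the fallback to `en-US` for unknown locales, and the fallback to the raw key for unknown messages are all assumptions. The `edge.confidence.high` entry is omitted because its `{0:P0}` placeholder uses a percent format specifier that Python's `str.format` does not support.

```python
DEFAULT_LOCALE = "en-US"

# A subset of the en-US catalog shown above.
CATALOGS = {
    "en-US": {
        "edge.reason.bytecode-invoke": "Bytecode method call",
        "edge.reason.plt-stub": "PLT/GOT library call",
        "edge.evidence.location": "Detected at offset {offset} in {file}",
    },
}


def localize(key: str, locale: str = DEFAULT_LOCALE, **params) -> str:
    """Look up a message and fill in its named placeholders."""
    template = CATALOGS.get(locale, {}).get(key)
    if template is None:
        # Assumption: fall back to the default locale, then to the key itself.
        template = CATALOGS[DEFAULT_LOCALE].get(key, key)
    return template.format(**params)
```

For instance, `localize("edge.evidence.location", offset=1234, file="com/example/Foo.class")` fills both placeholders from the evidence location.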

**Supported locales:**

- `en-US` (default)
- Additional locales via contribution

### EG10: Backfill Plan

**Backfill strategy:**

1. **Phase 1:** Add reason codes to new edges (no backfill needed)
2. **Phase 2:** Run detector upgrade on graphs without reason codes
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata

**Migration script:**

```bash
stella edge backfill --graph blake3:... --dry-run

# Output:
#   Graph: blake3:a1b2c3d4...
#   Edges without reason: 1234
#   Edges to update: 1234
#
#   Dry run - no changes made.

# Execute:
stella edge backfill --graph blake3:... --execute
```

**Backfill metadata:**

```json
{
  "backfill": {
    "status": "complete",
    "original_analyzer_version": "1.0.0",
    "backfill_analyzer_version": "1.2.0",
    "backfilled_at": "2025-12-13T10:00:00Z",
    "edges_updated": 1234
  }
}
```

---

## 3. Related Documentation

- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Explainability Schema](./explainability-schema.md) - Explanation format
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE

---

_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._