
Edge Explainability Schema

Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild.

This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.


1. Overview

Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:

  • Reason code: Why this edge was detected (e.g., bytecode-invoke, plt-stub, indirect-target)
  • Confidence score: Certainty of the edge's existence
  • Evidence sources: Detectors and rules that contributed to edge discovery
  • Provenance: Analyzer version, detection timestamp, and input artifacts

2. Gap Resolutions

EG1: Reason Enum Governance

Standard reason codes:

| Code | Category | Description | Example |
|---|---|---|---|
| bytecode-invoke | Static | Bytecode invocation instruction | Java invokevirtual, .NET call |
| bytecode-field | Static | Field access leading to call | Static initializer |
| import-symbol | Static | Import table reference | ELF .dynsym, PE imports |
| plt-stub | Static | PLT/GOT indirection | printf@plt |
| reloc-target | Static | Relocation target | .rela.dyn entries |
| indirect-target | Heuristic | Indirect call target analysis | CFG-based |
| init-array | Static | Constructor/initializer array | .init_array, DT_INIT |
| fini-array | Static | Destructor/finalizer array | .fini_array, DT_FINI |
| vtable-slot | Heuristic | Virtual method dispatch | C++ vtable |
| reflection-invoke | Heuristic | Reflective method invocation | Method.invoke() |
| runtime-observed | Runtime | Runtime probe observation | JFR, eBPF |
| user-annotated | Manual | User-provided edge | Policy override |

Governance rules:

  1. New reason codes require an RFC and review by the Scanner Guild
  2. Deprecated codes remain valid for 2 major versions
  3. Custom codes use the custom: prefix (e.g., custom:my-analyzer)
  4. Codes are case-insensitive and normalized to lowercase
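Rules 3 and 4 can be enforced when edges are ingested. The Python sketch below lowercases a reason code and accepts it only if it is one of the standard codes in the table above or carries the custom: prefix. The helper name normalize_reason is hypothetical. Stripping surrounding whitespace and rejecting an empty custom suffix are assumptions, not stated rules.

```python
# Standard reason codes from the EG1 table.
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})
CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Return the canonical lowercase reason code, or raise ValueError.

    Hypothetical helper. Codes are case-insensitive (rule 4), so they are
    stripped and lowercased before comparison. Custom codes need the
    custom: prefix (rule 3); requiring a non-empty suffix is an assumption.
    """
    normalized = code.strip().lower()
    if normalized in STANDARD_REASONS:
        return normalized
    if normalized.startswith(CUSTOM_PREFIX) and len(normalized) > len(CUSTOM_PREFIX):
        return normalized
    raise ValueError(f"unregistered reason code: {code!r}")


print(normalize_reason("PLT-Stub"))            # plt-stub
print(normalize_reason("Custom:My-Analyzer"))  # custom:my-analyzer
```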

Code registry:

{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {
      "code": "bytecode-invoke",
      "category": "static",
      "description": "Bytecode invocation instruction",
      "languages": ["java", "dotnet"],
      "confidence_range": [0.9, 1.0],
      "deprecated": false
    }
  ]
}
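A consumer could use the registry to flag suspicious edges. In the sketch below, check_reason (a hypothetical name) looks up an edge's reason, reports deprecated codes as warnings rather than errors (they stay valid under rule 2), and checks the edge confidence against the registered confidence_range.

The sketch makes two assumptions: custom: codes are absent from the registry and are skipped, and confidence_range is meant to bound an edge's confidence.

```python
import json

# The sample registry document from above.
REGISTRY_JSON = """
{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {
      "code": "bytecode-invoke",
      "category": "static",
      "description": "Bytecode invocation instruction",
      "languages": ["java", "dotnet"],
      "confidence_range": [0.9, 1.0],
      "deprecated": false
    }
  ]
}
"""


def check_reason(registry: dict, code: str, confidence: float) -> list[str]:
    """Return issues for a reason/confidence pair; an empty list means acceptable."""
    entries = {entry["code"]: entry for entry in registry["reasons"]}
    normalized = code.lower()
    if normalized.startswith("custom:"):
        return []  # assumption: custom codes are not listed in the registry
    entry = entries.get(normalized)
    if entry is None:
        return [f"unknown reason code {code!r}"]
    issues = []
    if entry.get("deprecated", False):
        # Deprecated codes remain valid for two major versions, so only warn.
        issues.append(f"reason code {normalized!r} is deprecated")
    low, high = entry["confidence_range"]
    if not low <= confidence <= high:
        issues.append(f"confidence {confidence} outside [{low}, {high}]")
    return issues


registry = json.loads(REGISTRY_JSON)
print(check_reason(registry, "bytecode-invoke", 0.95))  # []
```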

EG2: Canonical Edge Schema with Hash Rules

Edge schema:

{
  "edge_id": "edge:sha256:{hex}",
  "from": "sym:java:...",
  "to": "sym:java:...",
  "kind": "call",
  "reason": "bytecode-invoke",
  "confidence": 0.95,
  "evidence": [
    {
      "source": "detector:java-bytecode-analyzer",
      "rule_id": "invoke-virtual",
      "rule_version": "1.0.0",
      "location": {
        "file": "com/example/Foo.class",
        "offset": 1234,
        "instruction": "invokevirtual #42"
      },
      "timestamp": "2025-12-13T10:00:00Z"
    }
  ],
  "attributes": {
    "virtual": true,
    "polymorphic_targets": 3
  }
}

Hash computation:

canonical = canonical_json({
  "from": edge.from,
  "to": edge.to,
  "kind": edge.kind,
  "reason": edge.reason
})
edge_id = "edge:sha256:" + lowercase_hex(sha256(utf8(canonical)))

Canonicalization:

  1. Hash only from, to, kind, and reason (never confidence, evidence, or attributes)
  2. Sort JSON object keys alphabetically
  3. Emit no insignificant whitespace; encode as UTF-8
  4. Render the digest as lowercase hex behind the sha256: prefix, giving edge:sha256:{hex}
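The hash rules can be sketched in Python as follows. This sketch assumes that json.dumps with sorted keys and compact separators is an acceptable canonical_json for this input. All four hashed fields are strings, so number formatting never arises, but this is not a full RFC 8785 canonicalizer.

Writing non-ASCII characters as raw UTF-8 (ensure_ascii=False) rather than \u escapes is our reading of rule 3. The symbol IDs in the sample edge are made-up placeholders, not real StellaOps symbol syntax.

```python
import hashlib
import json


def canonical_json(value: dict) -> bytes:
    """Sorted keys, no insignificant whitespace, UTF-8 bytes.

    Sufficient for the string-only hash input below; not a full RFC 8785 (JCS)
    implementation, because number formatting is not normalized.
    """
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_edge_id(edge: dict) -> str:
    """Derive edge_id from from/to/kind/reason only; confidence and evidence are ignored."""
    hash_input = {key: edge[key] for key in ("from", "to", "kind", "reason")}
    digest = hashlib.sha256(canonical_json(hash_input)).hexdigest()  # lowercase hex
    return f"edge:sha256:{digest}"


edge = {
    "from": "sym:java:example.Caller",   # placeholder symbol ID
    "to": "sym:java:example.Callee",     # placeholder symbol ID
    "kind": "call",
    "reason": "bytecode-invoke",
    "confidence": 0.95,
}
print(compute_edge_id(edge))
```

Because confidence and evidence are excluded from the hash input, rescoring an edge or attaching more evidence leaves its edge_id unchanged. Changing the reason produces a new ID.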

EG3: Evidence Limits/Redaction

Evidence limits:

| Element | Default Limit | Configurable |
|---|---|---|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |

Redaction rules:

| Category | Redaction | Example |
|---|---|---|
| File paths | Normalize | /home/user/... -> {PROJECT}/... |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |
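The path and instruction rules could be applied as in the Python sketch below. The helper names are hypothetical, and the sketch makes three assumptions:

- The analyzer knows the project root to replace with {PROJECT}.
- The fixed 10-segment depth limit is enforced by keeping only the trailing segments.
- Paths outside the project root are returned unchanged, so a real redactor would still need a policy for them.

```python
PROJECT_PLACEHOLDER = "{PROJECT}"
MAX_INSTRUCTION_CHARS = 100  # EG3 default (configurable)
MAX_PATH_SEGMENTS = 10       # EG3 fixed limit


def redact_path(path: str, project_root: str) -> str:
    """Replace the machine-specific project root with {PROJECT}.

    Assumptions: the project root is known, and paths deeper than
    MAX_PATH_SEGMENTS keep only their trailing segments.
    """
    root = project_root.rstrip("/")
    if path == root or path.startswith(root + "/"):
        relative = path[len(root):].lstrip("/")
        segments = [s for s in relative.split("/") if s]
        segments = segments[-MAX_PATH_SEGMENTS:]  # enforce the depth limit
        return "/".join([PROJECT_PLACEHOLDER, *segments])
    return path  # outside the project root: unchanged in this sketch


def truncate_instruction(text: str, limit: int = MAX_INSTRUCTION_CHARS) -> str:
    """Keep only the first `limit` characters of instruction text."""
    return text[:limit]


print(redact_path("/home/alice/src/app/com/example/Foo.class", "/home/alice/src/app"))
# {PROJECT}/com/example/Foo.class
```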

Truncation behavior:

{
  "evidence_truncated": true,
  "evidence_count": 15,
  "evidence_shown": 10,
  "full_evidence_uri": "cas://edges/evidence/sha256:..."
}
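A sketch of the truncation step in Python. It assumes the first limit entries are kept in their existing order and that the truncation fields are merged into the edge object; the document shows them as a standalone object, so their exact placement is an open question. The caller supplies full_evidence_uri, because how the CAS digest is derived is not specified here.

```python
from typing import Any

DEFAULT_EVIDENCE_LIMIT = 10  # EG3 default (configurable)


def truncate_evidence(edge: dict[str, Any], full_evidence_uri: str,
                      limit: int = DEFAULT_EVIDENCE_LIMIT) -> dict[str, Any]:
    """Return a copy of `edge` whose evidence list respects `limit`.

    Hypothetical helper: when entries are dropped, the EG3 truncation fields
    are added and point at the complete evidence stored elsewhere.
    """
    evidence = edge.get("evidence", [])
    result = dict(edge)  # shallow copy; the input edge is not modified
    if len(evidence) <= limit:
        result["evidence"] = list(evidence)
        return result
    result["evidence"] = evidence[:limit]  # assumption: keep the first entries
    result.update({
        "evidence_truncated": True,
        "evidence_count": len(evidence),
        "evidence_shown": limit,
        "full_evidence_uri": full_evidence_uri,
    })
    return result
```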

EG4: Confidence Rubric

Confidence scale:

| Level | Range | Description | Typical Sources |
|---|---|---|---|
| certain | 1.0 | Definite edge | Direct bytecode invoke |
| high | 0.85-0.99 | Very likely | Import table, PLT |
| medium | 0.5-0.84 | Probable | Indirect analysis, vtable |
| low | 0.2-0.49 | Possible | Heuristic carving |
| unknown | 0.0-0.19 | Speculative | User annotation, fallback |
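The rubric leaves small gaps between its printed ranges, such as 0.845. The Python sketch below reads each lower bound as inclusive and each range as extending up to the next level's lower bound, so such values fall into the lower bucket. That interpretation is an assumption, and the function name is hypothetical.

```python
def confidence_level(score: float) -> str:
    """Map a confidence score in [0.0, 1.0] to its EG4 level name.

    Assumption: ranges are half-open [lower, next_lower), so values between
    two printed ranges (for example 0.845) land in the lower level.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"


print(confidence_level(0.92))  # high
```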

Confidence computation:

edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor

Base confidence by reason:

| Reason | Base Confidence |
|---|---|
| bytecode-invoke | 0.98 |
| import-symbol | 0.95 |
| plt-stub | 0.92 |
| reloc-target | 0.90 |
| init-array | 0.95 |
| vtable-slot | 0.75 |
| indirect-target | 0.60 |
| reflection-invoke | 0.50 |
| runtime-observed | 0.99 |
| user-annotated | 0.80 |
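The confidence formula names evidence_boost and target_resolution_factor without defining them. The Python sketch below plugs in the base table above, but several parts are invented for illustration and are not taken from the schema:

- The boost curve adds 2% per additional evidence entry, capped at +10%.
- target_resolution_factor is a caller-supplied multiplier.
- The result is clamped to [0, 1].

bytecode-field and fini-array have no listed base value, so the sketch rejects them.

```python
# Base confidence per reason, copied from the table above.
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98, "import-symbol": 0.95, "plt-stub": 0.92,
    "reloc-target": 0.90, "init-array": 0.95, "vtable-slot": 0.75,
    "indirect-target": 0.60, "reflection-invoke": 0.50,
    "runtime-observed": 0.99, "user-annotated": 0.80,
}


def evidence_boost(evidence_count: int) -> float:
    """Hypothetical boost: +2% per corroborating entry beyond the first, capped at +10%."""
    if evidence_count < 1:
        raise ValueError("an edge needs at least one evidence entry")
    return min(1.0 + 0.02 * (evidence_count - 1), 1.10)


def edge_confidence(reason: str, evidence_count: int,
                    target_resolution_factor: float = 1.0) -> float:
    """base_confidence * evidence_boost * target_resolution_factor, clamped to [0, 1]."""
    if reason not in BASE_CONFIDENCE:
        raise ValueError(f"no base confidence for reason {reason!r}")
    raw = BASE_CONFIDENCE[reason] * evidence_boost(evidence_count) * target_resolution_factor
    return max(0.0, min(1.0, raw))  # clamp: the boost alone could push a score past 1.0


print(edge_confidence("bytecode-invoke", 1))  # 0.98
```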

EG5: Detector/Rule Provenance

Provenance schema:

{
  "provenance": {
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "digest": "sha256:..."
    },
    "detector": {
      "name": "java-bytecode-analyzer",
      "version": "2.0.0",
      "rule_set": "default"
    },
    "rule": {
      "id": "invoke-virtual",
      "version": "1.0.0",
      "description": "Detect invokevirtual bytecode instructions"
    },
    "input_artifacts": [
      {"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
    ],
    "detected_at": "2025-12-13T10:00:00Z"
  }
}

Provenance requirements:

  1. All edges must include analyzer provenance
  2. Detector/rule provenance required for non-runtime edges
  3. Input artifact digests enable reproducibility
  4. Detection timestamp uses UTC ISO-8601
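The four requirements can be checked mechanically. The Python sketch below is hypothetical and makes two interpretive assumptions: "analyzer provenance" is taken to mean a non-empty analyzer name and version, and every listed input artifact is taken to require a digest. Only runtime-observed is treated as a runtime reason.

```python
from datetime import datetime

# Reasons exempt from detector/rule provenance (EG5 rule 2).
RUNTIME_REASONS = frozenset({"runtime-observed"})


def provenance_problems(reason: str, provenance: dict) -> list[str]:
    """Return EG5 violations for one edge; an empty list means it passes.

    Assumptions: analyzer provenance means a non-empty name and version,
    and every listed input artifact must carry a digest.
    """
    problems = []
    analyzer = provenance.get("analyzer") or {}
    if not (analyzer.get("name") and analyzer.get("version")):
        problems.append("analyzer name and version are required")
    if reason not in RUNTIME_REASONS:
        for section in ("detector", "rule"):
            if not provenance.get(section):
                problems.append(f"{section} provenance is required for non-runtime edges")
    for artifact in provenance.get("input_artifacts", []):
        if not artifact.get("digest"):
            problems.append(f"input artifact {artifact.get('path')!r} has no digest")
    try:
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11.
        detected = datetime.fromisoformat(provenance.get("detected_at", "").replace("Z", "+00:00"))
    except ValueError:
        problems.append("detected_at must be an ISO-8601 timestamp")
    else:
        offset = detected.utcoffset()
        if offset is None or offset.total_seconds() != 0:
            problems.append("detected_at must be in UTC")
    return problems
```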

EG6: API/CLI Parity

API endpoints:

| Method | Path | Description |
|---|---|---|
| GET | /api/edges/{edgeId} | Get edge details |
| GET | /api/edges?graph_hash=... | List edges for graph |
| GET | /api/edges/{edgeId}/evidence | Get full evidence |
| POST | /api/edges/search | Search edges by criteria |

CLI commands:

# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...

# Get edge details
stella edge show --id edge:sha256:...

# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke

# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson

Output parity:

  • API and CLI return identical JSON structure
  • CLI supports --json for machine-readable output
  • Both support filtering by reason, confidence, from/to
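Because the API and CLI return the same JSON structure, one client-side filter can handle edges from either source. The Python sketch below filters NDJSON lines, such as a file written by stella edge export, by reason, minimum confidence, and from/to symbol.

The function name and the inclusive minimum-confidence semantics are assumptions, and the sketch does not imply any particular API query parameters or CLI flags. It also assumes stored reason codes are already lowercase, per the EG1 normalization rule.

```python
import json
from typing import Iterable, Iterator, Optional


def filter_edges(lines: Iterable[str], reason: Optional[str] = None,
                 min_confidence: Optional[float] = None,
                 from_symbol: Optional[str] = None,
                 to_symbol: Optional[str] = None) -> Iterator[dict]:
    """Yield edges from NDJSON lines that match every supplied filter."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        edge = json.loads(line)
        if reason is not None and edge.get("reason") != reason.lower():
            continue
        if min_confidence is not None and edge.get("confidence", 0.0) < min_confidence:
            continue  # assumption: the minimum is inclusive
        if from_symbol is not None and edge.get("from") != from_symbol:
            continue
        if to_symbol is not None and edge.get("to") != to_symbol:
            continue
        yield edge
```

An open file object can be passed directly as lines, since iterating over a text file yields one NDJSON record per line.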

EG7: Deterministic Fixtures

Fixture location:

tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json

datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt

Fixture requirements:

  1. Each reason code has at least one fixture
  2. Fixtures include expected edge_id hash
  3. Golden outputs frozen after review
  4. CI verifies hash stability
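A hash-stability check for CI might look like the Python sketch below. It recomputes each fixture's edge_id with the EG2 rules and lists the fixtures whose stored ID no longer matches.

The sketch assumes each fixture file holds a single edge object that includes its expected edge_id. The real fixture layout, and the format of expected-hashes.txt, may differ, and all function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def expected_edge_id(edge: dict) -> str:
    """Recompute edge_id from from/to/kind/reason following the EG2 hash rules."""
    hash_input = {key: edge[key] for key in ("from", "to", "kind", "reason")}
    canonical = json.dumps(hash_input, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return "edge:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_fixtures(directory: Path) -> dict[str, dict]:
    """Read every *.json fixture in `directory` (assumed: one edge object per file)."""
    return {path.name: json.loads(path.read_text(encoding="utf-8"))
            for path in sorted(directory.glob("*.json"))}


def unstable_fixtures(fixtures: dict[str, dict]) -> list[str]:
    """Names of fixtures whose stored edge_id differs from the recomputed one."""
    return sorted(name for name, edge in fixtures.items()
                  if edge.get("edge_id") != expected_edge_id(edge))
```

In CI, a non-empty result from unstable_fixtures(load_fixtures(Path("tests/Edge/fixtures"))) would fail the build.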

EG8: Propagation into Explanation Graphs/VEX

Explanation graph inclusion:

{
  "explanation": {
    "path": [
      {
        "node": "sym:java:main...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:handler...",
          "reason": "bytecode-invoke",
          "confidence": 0.98
        }
      },
      {
        "node": "sym:java:handler...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:log4j...",
          "reason": "bytecode-invoke",
          "confidence": 0.95
        }
      }
    ],
    "aggregate_path_confidence": 0.93
  }
}

VEX evidence format:

{
  "stellaops:reachability": {
    "path_edges": [
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
    ],
    "weakest_edge": {
      "edge_id": "edge:sha256:...",
      "reason": "bytecode-invoke",
      "confidence": 0.95
    },
    "aggregate_confidence": 0.93
  }
}
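The example values are consistent with an aggregate that is the product of the edge confidences: 0.98 × 0.95 = 0.931, which rounds to 0.93. The weakest edge is the one with the lowest confidence.

The Python sketch below builds the VEX block under those two assumptions, with rounding to two decimals. The schema does not state either rule explicitly, and the function name is hypothetical.

```python
import math


def vex_reachability_evidence(path_edges: list[dict]) -> dict:
    """Build the stellaops:reachability VEX block from a path's edges.

    Assumptions: aggregate confidence is the product of edge confidences,
    rounded to two decimals, and the weakest edge has the lowest confidence.
    """
    if not path_edges:
        raise ValueError("a reachability path needs at least one edge")
    # Keep only the fields shown in the VEX evidence format.
    summaries = [{k: e[k] for k in ("edge_id", "reason", "confidence")} for e in path_edges]
    weakest = min(summaries, key=lambda e: e["confidence"])
    aggregate = math.prod(e["confidence"] for e in summaries)
    return {
        "stellaops:reachability": {
            "path_edges": summaries,
            "weakest_edge": dict(weakest),
            "aggregate_confidence": round(aggregate, 2),
        }
    }
```

A product rule means each additional uncertain hop lowers the path score, so long chains of heuristic edges are ranked below short, direct ones.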

EG9: Localization Guidance

Localizable elements:

| Element | Localization | Example |
|---|---|---|
| Reason code display | Message catalog | bytecode-invoke -> "Bytecode method call" |
| Confidence level | Message catalog | high -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |

Message catalog structure:

{
  "locale": "en-US",
  "messages": {
    "edge.reason.bytecode-invoke": "Bytecode method call",
    "edge.reason.plt-stub": "PLT/GOT library call",
    "edge.confidence.high": "High confidence ({0:P0})",
    "edge.evidence.location": "Detected at offset {offset} in {file}"
  }
}
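The catalog mixes two placeholder styles: named placeholders such as {offset}, and the .NET composite-format percent specifier {0:P0}. The Python sketch below resolves a key in the requested locale, falling back to en-US and then to the raw key. It substitutes named and positional placeholders and approximates P<n> with Python's percent formatting.

The function name and the fallback order are assumptions. Other .NET format specifiers are left untouched.

```python
import re

# Message catalogs keyed by locale; en-US entries copied from the example above.
CATALOGS = {
    "en-US": {
        "edge.reason.bytecode-invoke": "Bytecode method call",
        "edge.reason.plt-stub": "PLT/GOT library call",
        "edge.confidence.high": "High confidence ({0:P0})",
        "edge.evidence.location": "Detected at offset {offset} in {file}",
    },
}
DEFAULT_LOCALE = "en-US"

# Matches {name}, {0}, and the percent form {0:P<digits>}.
_PLACEHOLDER = re.compile(r"\{(\w+)(?::P(\d+))?\}")


def localize(key: str, locale: str, *args: object, **kwargs: object) -> str:
    """Resolve `key` in `locale`, falling back to en-US, then to the key itself."""
    template = CATALOGS.get(locale, {}).get(key)
    if template is None:
        template = CATALOGS[DEFAULT_LOCALE].get(key, key)

    def substitute(match):
        name, percent_digits = match.group(1), match.group(2)
        value = args[int(name)] if name.isdigit() else kwargs[name]
        if percent_digits is not None:
            return f"{value:.{percent_digits}%}"  # approximates .NET "P<n>"
        return str(value)

    return _PLACEHOLDER.sub(substitute, template)


print(localize("edge.confidence.high", "en-US", 0.92))  # High confidence (92%)
print(localize("edge.evidence.location", "de-DE", offset=1234, file="com/example/Foo.class"))
# Detected at offset 1234 in com/example/Foo.class
```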

Supported locales:

  • en-US (default)
  • Additional locales via contribution

EG10: Backfill Plan

Backfill strategy:

  1. Phase 1: Add reason codes to new edges (no backfill needed)
  2. Phase 2: Run detector upgrade on graphs without reason codes
  3. Phase 3: Mark old graphs as requires_reanalysis in metadata

Migration script:

stella edge backfill --graph blake3:... --dry-run

# Output:
Graph: blake3:a1b2c3d4...
Edges without reason: 1234
Edges to update: 1234

Dry run - no changes made.

# Execute:
stella edge backfill --graph blake3:... --execute

Backfill metadata:

{
  "backfill": {
    "status": "complete",
    "original_analyzer_version": "1.0.0",
    "backfill_analyzer_version": "1.2.0",
    "backfilled_at": "2025-12-13T10:00:00Z",
    "edges_updated": 1234
  }
}
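The dry-run counts and the backfill metadata record can be sketched in Python as below. All function names are hypothetical, and the sketch makes three assumptions:

- Every edge lacking a reason is an edge to update, which matches the sample output where both counts are 1234.
- requires_reanalysis is a boolean flag on the graph metadata.
- Timestamps are rendered as UTC ISO-8601 with a trailing Z.

The detector re-run itself (Phase 2) is out of scope here.

```python
from datetime import datetime, timezone
from typing import Optional


def plan_backfill(edges: list[dict]) -> dict:
    """Dry-run summary: count edges that still lack a reason code."""
    missing = sum(1 for edge in edges if not edge.get("reason"))
    return {"edges_without_reason": missing, "edges_to_update": missing}


def backfill_metadata(original_version: str, backfill_version: str,
                      edges_updated: int, now: Optional[datetime] = None) -> dict:
    """Build the backfill metadata record after a successful Phase 2 run."""
    now = now or datetime.now(timezone.utc)
    return {
        "backfill": {
            "status": "complete",
            "original_analyzer_version": original_version,
            "backfill_analyzer_version": backfill_version,
            "backfilled_at": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "edges_updated": edges_updated,
        }
    }


def mark_requires_reanalysis(graph_metadata: dict) -> dict:
    """Phase 3: flag a graph that could not be backfilled (assumed boolean field)."""
    return {**graph_metadata, "requires_reanalysis": True}
```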


Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history.