2025-12-13 09:37:15 +02:00
parent e00f6365da
commit 6e45066e37
349 changed files with 17160 additions and 1867 deletions
--- a/docs/reachability/edge-explainability-schema.md
+++ b/docs/reachability/edge-explainability-schema.md
@@ -0,0 +1,416 @@
+# Edge Explainability Schema
+
+_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._
+
+This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.
+
+---
+
+## 1. Overview
+
+Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:
+
+- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
+- **Confidence score:** Certainty that the edge exists, from 0.0 (speculative) to 1.0 (certain)
+- **Evidence sources:** Detectors and rules that contributed to edge discovery
+- **Provenance:** Analyzer version, detection timestamp, and input artifacts
+
+---
+
+## 2. Gap Resolutions
+
+### EG1: Reason Enum Governance
+
+**Standard reason codes:**
+
+| Code | Category | Description | Example |
+|------|----------|-------------|---------|
+| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
+| `bytecode-field` | Static | Field access leading to call | Static initializer |
+| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
+| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
+| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
+| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
+| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
+| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
+| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
+| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
+| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
+| `user-annotated` | Manual | User-provided edge | Policy override |
+
+**Governance rules:**
+
+1. New reason codes require an RFC and review by the Scanner Guild
+2. Deprecated codes remain valid for two major versions
+3. Custom codes use the `custom:` prefix (e.g., `custom:my-analyzer`)
+4. Codes are case-insensitive and are normalized to lowercase
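Rules 3 and 4 can be sketched as a small normalizer. This is an illustrative sketch, not the Scanner implementation: the function name `normalize_reason` and the character set allowed after `custom:` are assumptions, and the standard codes are copied from the table above.

```python
import re

# Standard codes copied from the reason code table.
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})

# Assumption: the part after "custom:" uses lowercase letters, digits, and hyphens.
_CUSTOM_RE = re.compile(r"custom:[a-z0-9][a-z0-9-]*")


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and check that it is standard or custom-prefixed."""
    normalized = code.strip().lower()
    if normalized in STANDARD_REASONS or _CUSTOM_RE.fullmatch(normalized):
        return normalized
    raise ValueError(f"unknown reason code: {code!r}")
```

With this sketch, `normalize_reason("Bytecode-Invoke")` yields `bytecode-invoke`, and a code that is neither standard nor `custom:`-prefixed raises `ValueError`.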
+
+**Code registry:**
+
+```json
+{
+  "schema": "stellaops.edge.reason.registry@v1",
+  "version": "2025-12-13",
+  "reasons": [
+    {
+      "code": "bytecode-invoke",
+      "category": "static",
+      "description": "Bytecode invocation instruction",
+      "languages": ["java", "dotnet"],
+      "confidence_range": [0.9, 1.0],
+      "deprecated": false
+    }
+  ]
+}
+```
+
+### EG2: Canonical Edge Schema with Hash Rules
+
+**Edge schema:**
+
+```json
+{
+  "edge_id": "edge:sha256:{hex}",
+  "from": "sym:java:...",
+  "to": "sym:java:...",
+  "kind": "call",
+  "reason": "bytecode-invoke",
+  "confidence": 0.95,
+  "evidence": [
+    {
+      "source": "detector:java-bytecode-analyzer",
+      "rule_id": "invoke-virtual",
+      "rule_version": "1.0.0",
+      "location": {
+        "file": "com/example/Foo.class",
+        "offset": 1234,
+        "instruction": "invokevirtual #42"
+      },
+      "timestamp": "2025-12-13T10:00:00Z"
+    }
+  ],
+  "attributes": {
+    "virtual": true,
+    "polymorphic_targets": 3
+  }
+}
+```
+
+**Hash computation:**
+
+```
+edge_id = "edge:" + sha256(
+  canonical_json({
+    "from": edge.from,
+    "to": edge.to,
+    "kind": edge.kind,
+    "reason": edge.reason
+  })
+)
+```
+
+**Canonicalization:**
+
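+1. Hash only `from`, `to`, `kind`, and `reason`; `confidence` and `evidence` are excluded, so they can change without changing `edge_id`
+2. Sort JSON keys alphabetically
+3. Serialize with no whitespace, encoded as UTF-8
+4. Emit the digest as lowercase hex with a `sha256:` prefix, so the full ID is `edge:sha256:{hex}`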
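A minimal Python sketch of the hash computation, assuming `canonical_json` can be approximated with the standard `json` module. The function name `compute_edge_id` is hypothetical, and the choice to emit non-ASCII characters as raw UTF-8 rather than `\u` escapes is an assumption the rules above do not settle.

```python
import hashlib
import json


def compute_edge_id(edge: dict) -> str:
    """Derive edge_id from the four identity fields of an edge."""
    identity = {key: edge[key] for key in ("from", "to", "kind", "reason")}
    canonical = json.dumps(
        identity,
        sort_keys=True,         # rule 2: keys in alphabetical order
        separators=(",", ":"),  # rule 3: no whitespace
        ensure_ascii=False,     # assumption: raw UTF-8 instead of \u escapes
    ).encode("utf-8")
    return "edge:sha256:" + hashlib.sha256(canonical).hexdigest()
```

Because only the identity fields are hashed, two records for the same edge that differ in `confidence` or `evidence` get the same `edge_id`, while changing `reason` produces a different one.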
+
+### EG3: Evidence Limits/Redaction
+
+**Evidence limits:**
+
+| Element | Default Limit | Configurable |
+|---------|--------------|--------------|
+| Evidence entries per edge | 10 | Yes |
+| Location detail fields | 5 | Yes |
+| Instruction preview length | 100 chars | Yes |
+| File path depth | 10 segments | No |
+
+**Redaction rules:**
+
+| Category | Redaction | Example |
+|----------|-----------|---------|
+| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
+| Bytecode offsets | Keep | Offsets are not PII |
+| Instruction text | Truncate | First 100 chars |
+| Source line content | Omit | Not included by default |
+
+**Truncation behavior:**
+
+```json
+{
+  "evidence_truncated": true,
+  "evidence_count": 15,
+  "evidence_shown": 10,
+  "full_evidence_uri": "cas://edges/evidence/sha256:..."
+}
+```
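The default limits and the truncation marker can be sketched as follows. The helper name `truncate_evidence` is hypothetical, and placing the truncation fields next to an `evidence` list in one object is an assumption; the schema above only shows the marker fields themselves.

```python
from __future__ import annotations

MAX_EVIDENCE_ENTRIES = 10    # default from the limits table
MAX_INSTRUCTION_CHARS = 100  # default from the limits table


def truncate_evidence(evidence: list[dict], full_evidence_uri: str) -> dict:
    """Return the evidence payload for one edge with the default limits applied."""
    shown = []
    for entry in evidence[:MAX_EVIDENCE_ENTRIES]:
        entry = dict(entry)  # shallow copy so the caller's data is not modified
        location = dict(entry.get("location", {}))
        if "instruction" in location:
            location["instruction"] = location["instruction"][:MAX_INSTRUCTION_CHARS]
            entry["location"] = location
        shown.append(entry)

    result = {"evidence": shown}
    if len(evidence) > MAX_EVIDENCE_ENTRIES:
        # Marker fields are added only when entries were actually dropped.
        result.update(
            evidence_truncated=True,
            evidence_count=len(evidence),
            evidence_shown=len(shown),
            full_evidence_uri=full_evidence_uri,
        )
    return result
```

For an edge with 15 evidence entries, this produces the marker shown above: `evidence_count` is 15 and `evidence_shown` is 10.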
+
+### EG4: Confidence Rubric
+
+**Confidence scale:**
+
+| Level | Range | Description | Typical Sources |
+|-------|-------|-------------|-----------------|
+| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
+| `high` | 0.85-0.99 | Very likely | Import table, PLT |
+| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
+| `low` | 0.2-0.49 | Possible | Heuristic carving |
+| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |
+
+**Confidence computation:**
+
+```
+edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
+```
+
+**Base confidence by reason:**
+
+| Reason | Base Confidence |
+|--------|-----------------|
+| `bytecode-invoke` | 0.98 |
+| `import-symbol` | 0.95 |
+| `plt-stub` | 0.92 |
+| `reloc-target` | 0.90 |
+| `init-array` | 0.95 |
+| `vtable-slot` | 0.75 |
+| `indirect-target` | 0.60 |
+| `reflection-invoke` | 0.50 |
+| `runtime-observed` | 0.99 |
+| `user-annotated` | 0.80 |
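The rubric can be read as a lookup plus a threshold check. The sketch below makes several assumptions that the text does not specify: `evidence_boost` and `target_resolution_factor` are passed in as already-computed factors, the result is clamped to the 0.0-1.0 range, and each level's lower bound is treated as an inclusive threshold so that values between the listed ranges (such as 0.845) fall into the lower level. Function names are hypothetical.

```python
# Base confidence per reason, copied from the table above.
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98, "import-symbol": 0.95, "plt-stub": 0.92,
    "reloc-target": 0.90, "init-array": 0.95, "vtable-slot": 0.75,
    "indirect-target": 0.60, "reflection-invoke": 0.50,
    "runtime-observed": 0.99, "user-annotated": 0.80,
}

# (inclusive lower bound, level), checked from highest to lowest.
LEVELS = [(1.0, "certain"), (0.85, "high"), (0.5, "medium"), (0.2, "low"), (0.0, "unknown")]


def edge_confidence(reason: str, evidence_boost: float = 1.0,
                    target_resolution_factor: float = 1.0) -> float:
    """Multiply the three factors; clamping to [0, 1] is an assumption."""
    score = BASE_CONFIDENCE[reason] * evidence_boost * target_resolution_factor
    return min(1.0, max(0.0, score))


def confidence_level(score: float) -> str:
    """Map a numeric confidence to its level name from the confidence scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    for lower_bound, level in LEVELS:
        if score >= lower_bound:
            return level
    raise AssertionError("unreachable: 0.0 is the lowest bound")
```

With neutral factors of 1.0, a `vtable-slot` edge scores 0.75 and maps to `medium`, and a `bytecode-invoke` edge scores 0.98 and maps to `high`.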
+
+### EG5: Detector/Rule Provenance
+
+**Provenance schema:**
+
+```json
+{
+  "provenance": {
+    "analyzer": {
+      "name": "scanner.java",
+      "version": "1.2.0",
+      "digest": "sha256:..."
+    },
+    "detector": {
+      "name": "java-bytecode-analyzer",
+      "version": "2.0.0",
+      "rule_set": "default"
+    },
+    "rule": {
+      "id": "invoke-virtual",
+      "version": "1.0.0",
+      "description": "Detect invokevirtual bytecode instructions"
+    },
+    "input_artifacts": [
+      {"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
+    ],
+    "detected_at": "2025-12-13T10:00:00Z"
+  }
+}
+```
+
+**Provenance requirements:**
+
+1. All edges must include analyzer provenance
+2. Detector/rule provenance is required for non-runtime edges
+3. Input artifact digests enable reproducibility
+4. Detection timestamps use ISO 8601 in UTC
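The requirements above lend themselves to a simple validator. This sketch assumes the `provenance` object sits on the edge next to `reason`, and that a runtime edge is one whose reason is `runtime-observed`; neither is stated explicitly here. The function name `provenance_errors` is hypothetical.

```python
from datetime import datetime, timedelta


def provenance_errors(edge: dict) -> list:
    """Return the provenance requirements that an edge violates (empty if none)."""
    errors = []
    provenance = edge.get("provenance", {})
    if "analyzer" not in provenance:
        errors.append("missing analyzer provenance")
    # Assumption: runtime edges are those with reason `runtime-observed`.
    if edge.get("reason") != "runtime-observed":
        for key in ("detector", "rule"):
            if key not in provenance:
                errors.append(f"missing {key} provenance")
    for artifact in provenance.get("input_artifacts", []):
        if not artifact.get("digest"):
            errors.append(f"input artifact {artifact.get('path')!r} has no digest")
    detected_at = str(provenance.get("detected_at", ""))
    try:
        # fromisoformat() before Python 3.11 rejects a trailing "Z".
        parsed = datetime.fromisoformat(detected_at.replace("Z", "+00:00"))
        if parsed.utcoffset() != timedelta(0):
            errors.append("detected_at is not UTC")
    except ValueError:
        errors.append("detected_at is not an ISO 8601 timestamp")
    return errors
```

An edge carrying the full provenance example above returns an empty list; a static edge without a `detector` entry reports `missing detector provenance`.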
+
+### EG6: API/CLI Parity
+
+**API endpoints:**
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `GET` | `/api/edges/{edgeId}` | Get edge details |
+| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
+| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
+| `POST` | `/api/edges/search` | Search edges by criteria |
+
+**CLI commands:**
+
+```bash
+# List edges for a graph
+stella edge list --graph blake3:a1b2c3d4...
+
+# Get edge details
+stella edge show --id edge:sha256:...
+
+# Search edges
+stella edge search --from "sym:java:..." --reason bytecode-invoke
+
+# Export edges
+stella edge export --graph blake3:... --output ./edges.ndjson
+```
+
+**Output parity:**
+
+- API and CLI return identical JSON structure
+- CLI supports `--json` for machine-readable output
+- Both support filtering by reason, confidence, from/to
+
+### EG7: Deterministic Fixtures
+
+**Fixture location:**
+
+```
+tests/Edge/
+  fixtures/
+    bytecode-invoke.json
+    plt-stub.json
+    vtable-dispatch.json
+    init-array-constructor.json
+    runtime-observed.json
+  golden/
+    bytecode-invoke.golden.json
+    graph-with-edges.golden.json
+
+datasets/edges/
+  schema/
+    edge.schema.json
+    reason-registry.json
+  samples/
+    java-spring-boot/
+      edges.ndjson
+      expected-hashes.txt
+```
+
+**Fixture requirements:**
+
+1. Each reason code has at least one fixture
+2. Fixtures include expected `edge_id` hash
+3. Golden outputs are frozen after review
+4. CI verifies that `edge_id` hashes remain stable
+
+### EG8: Propagation into Explanation Graphs/VEX
+
+**Explanation graph inclusion:**
+
+```json
+{
+  "explanation": {
+    "path": [
+      {
+        "node": "sym:java:main...",
+        "outgoing_edge": {
+          "edge_id": "edge:sha256:...",
+          "to": "sym:java:handler...",
+          "reason": "bytecode-invoke",
+          "confidence": 0.98
+        }
+      },
+      {
+        "node": "sym:java:handler...",
+        "outgoing_edge": {
+          "edge_id": "edge:sha256:...",
+          "to": "sym:java:log4j...",
+          "reason": "bytecode-invoke",
+          "confidence": 0.95
+        }
+      }
+    ],
+    "aggregate_path_confidence": 0.93
+  }
+}
+```
+
+**VEX evidence format:**
+
+```json
+{
+  "stellaops:reachability": {
+    "path_edges": [
+      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
+      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
+    ],
+    "weakest_edge": {
+      "edge_id": "edge:sha256:...",
+      "reason": "bytecode-invoke",
+      "confidence": 0.95
+    },
+    "aggregate_confidence": 0.93
+  }
+}
+```
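The sample values are consistent with an aggregate that multiplies the per-edge confidences (0.98 × 0.95 = 0.931, shown as 0.93) and a weakest edge chosen by lowest confidence. The aggregation rule is not spelled out here, so the sketch below treats the product and the two-decimal rounding as assumptions; the function name `summarize_path` is hypothetical.

```python
import math


def summarize_path(path_edges: list) -> dict:
    """Build a reachability summary for a non-empty list of path edges."""
    if not path_edges:
        raise ValueError("path must contain at least one edge")
    # Assumption: aggregate = product of edge confidences, rounded to 2 decimals.
    aggregate = round(math.prod(edge["confidence"] for edge in path_edges), 2)
    # On ties, min() returns the first edge with the lowest confidence.
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": aggregate,
    }
```

Applied to the two edges in the example, this yields an `aggregate_confidence` of 0.93 and selects the 0.95 edge as `weakest_edge`.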
+
+### EG9: Localization Guidance
+
+**Localizable elements:**
+
+| Element | Localization | Example |
+|---------|--------------|---------|
+| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
+| Confidence level | Message catalog | `high` -> "High confidence" |
+| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
+| Error messages | Message catalog | Standard error codes |
+
+**Message catalog structure:**
+
+```json
+{
+  "locale": "en-US",
+  "messages": {
+    "edge.reason.bytecode-invoke": "Bytecode method call",
+    "edge.reason.plt-stub": "PLT/GOT library call",
+    "edge.confidence.high": "High confidence ({0:P0})",
+    "edge.evidence.location": "Detected at offset {offset} in {file}"
+  }
+}
+```
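A catalog lookup with named placeholders can be sketched in a few lines. The function name `localize` and the fallback to the raw key are assumptions. The `edge.confidence.high` entry is left out because its `{0:P0}` placeholder looks like a .NET-style format specifier, which Python's `str.format` does not accept.

```python
CATALOG = {
    "locale": "en-US",
    "messages": {
        "edge.reason.bytecode-invoke": "Bytecode method call",
        "edge.reason.plt-stub": "PLT/GOT library call",
        "edge.evidence.location": "Detected at offset {offset} in {file}",
    },
}


def localize(catalog: dict, key: str, **params) -> str:
    """Look up a message by key and fill its named placeholders."""
    # Assumption: a missing key falls back to the key itself.
    template = catalog["messages"].get(key, key)
    return template.format(**params)
```

For example, `localize(CATALOG, "edge.evidence.location", offset=1234, file="com/example/Foo.class")` fills the template with the location from the edge schema example.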
+
+**Supported locales:**
+
+- `en-US` (default)
+- Additional locales via contribution
+
+### EG10: Backfill Plan
+
+**Backfill strategy:**
+
+1. **Phase 1:** Add reason codes to new edges (no backfill needed)
+2. **Phase 2:** Run detector upgrade on graphs without reason codes
+3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata
+
+**Migration script:**
+
+```bash
+# Preview the backfill without modifying the graph
+stella edge backfill --graph blake3:... --dry-run
+
+# Output:
+#   Graph: blake3:a1b2c3d4...
+#   Edges without reason: 1234
+#   Edges to update: 1234
+#
+#   Dry run - no changes made.
+
+# Execute:
+stella edge backfill --graph blake3:... --execute
+```
+
+**Backfill metadata:**
+
+```json
+{
+  "backfill": {
+    "status": "complete",
+    "original_analyzer_version": "1.0.0",
+    "backfill_analyzer_version": "1.2.0",
+    "backfilled_at": "2025-12-13T10:00:00Z",
+    "edges_updated": 1234
+  }
+}
+```
+
+---
+
+## 3. Related Documentation
+
+- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
+- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
+- [Explainability Schema](./explainability-schema.md) - Explanation format
+- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE
+
+---
+
+_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._