docs/reachability/edge-explainability-schema.md
# Edge Explainability Schema

_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._

This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.

---

## 1. Overview

Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:

- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
- **Confidence score:** Certainty of the edge's existence
- **Evidence sources:** Detectors and rules that contributed to edge discovery
- **Provenance:** Analyzer version, detection timestamp, and input artifacts

---

## 2. Gap Resolutions

### EG1: Reason Enum Governance

**Standard reason codes:**

| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |

**Governance rules:**

1. New reason codes require RFC + review by Scanner Guild
2. Deprecated codes remain valid for 2 major versions
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
4. Codes are case-insensitive, normalized to lowercase

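
Governance rules 3 and 4 can be sketched as a small normalization step. This is a minimal illustration, not the Scanner implementation: the function name `normalize_reason` is hypothetical, the set of standard codes is copied from the table above, and stripping surrounding whitespace and rejecting unknown codes are assumptions.

```python
# Standard reason codes, copied from the table in this section.
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})

CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and accept it if it is standard or custom-prefixed.

    Hypothetical helper: whitespace stripping and raising on unknown codes
    are assumptions, not behavior stated in the schema.
    """
    normalized = code.strip().lower()
    if normalized in STANDARD_REASONS:
        return normalized
    # A custom code needs a non-empty name after the "custom:" prefix.
    if normalized.startswith(CUSTOM_PREFIX) and len(normalized) > len(CUSTOM_PREFIX):
        return normalized
    raise ValueError(f"unknown reason code: {code!r}")
```

Under these assumptions, `normalize_reason("PLT-Stub")` yields `plt-stub` and `normalize_reason("custom:My-Analyzer")` yields `custom:my-analyzer`.
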
**Code registry:**

```json
{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {
      "code": "bytecode-invoke",
      "category": "static",
      "description": "Bytecode invocation instruction",
      "languages": ["java", "dotnet"],
      "confidence_range": [0.9, 1.0],
      "deprecated": false
    }
  ]
}
```

### EG2: Canonical Edge Schema with Hash Rules

**Edge schema:**

```json
{
  "edge_id": "edge:sha256:{hex}",
  "from": "sym:java:...",
  "to": "sym:java:...",
  "kind": "call",
  "reason": "bytecode-invoke",
  "confidence": 0.95,
  "evidence": [
    {
      "source": "detector:java-bytecode-analyzer",
      "rule_id": "invoke-virtual",
      "rule_version": "1.0.0",
      "location": {
        "file": "com/example/Foo.class",
        "offset": 1234,
        "instruction": "invokevirtual #42"
      },
      "timestamp": "2025-12-13T10:00:00Z"
    }
  ],
  "attributes": {
    "virtual": true,
    "polymorphic_targets": 3
  }
}
```

**Hash computation:**

```
edge_id = "edge:sha256:" + hex(sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason
  })
))
```

**Canonicalization:**

1. Hash only `from`, `to`, `kind`, and `reason` (not `confidence` or `evidence`)
2. Sort JSON keys alphabetically
3. Emit no whitespace; encode as UTF-8
4. Render the digest as lowercase hex with the `sha256:` prefix, giving `edge:sha256:{hex}`

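
The hash rules above can be sketched in Python as follows. This is an illustration under assumptions rather than the reference implementation: the function name `compute_edge_id` is hypothetical, and emitting non-ASCII characters unescaped (`ensure_ascii=False`) is one reading of the UTF-8 rule that the schema does not spell out.

```python
import hashlib
import json

# Only these fields take part in the edge identity.
IDENTITY_FIELDS = ("from", "to", "kind", "reason")


def compute_edge_id(edge: dict) -> str:
    """Derive an edge_id from the identity fields of an edge (hypothetical helper)."""
    identity = {key: edge[key] for key in IDENTITY_FIELDS}
    canonical = json.dumps(
        identity,
        sort_keys=True,          # rule 2: keys in alphabetical order
        separators=(",", ":"),   # rule 3: no whitespace
        ensure_ascii=False,      # assumption: raw UTF-8 rather than \u escapes
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()  # lowercase hex
    return f"edge:sha256:{digest}"
```

Because `confidence`, `evidence`, and other fields are left out of the hashed object, re-scoring an edge or attaching more evidence does not change its `edge_id`, while changing the `reason` does.
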
### EG3: Evidence Limits/Redaction

**Evidence limits:**

| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |

**Redaction rules:**

| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |

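
Two of the redaction rules, path normalization and instruction truncation, can be sketched as below. The helper names `redact_path` and `truncate_instruction` are hypothetical, the project root is assumed to be known to the caller, and returning paths outside the project root unchanged is an assumption, since the table does not cover that case.

```python
from pathlib import PurePosixPath

PROJECT_PLACEHOLDER = "{PROJECT}"


def redact_path(path: str, project_root: str) -> str:
    """Replace the project root prefix of a POSIX path with a placeholder."""
    try:
        relative = PurePosixPath(path).relative_to(project_root)
    except ValueError:
        # Assumption: paths outside the project root are left as-is.
        return path
    return f"{PROJECT_PLACEHOLDER}/{relative.as_posix()}"


def truncate_instruction(text: str, limit: int = 100) -> str:
    """Keep only the first `limit` characters of an instruction preview."""
    return text[:limit]
```

For example, `redact_path("/home/user/app/src/Foo.java", "/home/user/app")` gives `{PROJECT}/src/Foo.java`.
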
**Truncation behavior:**

```json
{
  "evidence_truncated": true,
  "evidence_count": 15,
  "evidence_shown": 10,
  "full_evidence_uri": "cas://edges/evidence/sha256:..."
}
```

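
A minimal sketch of how an emitter might apply the evidence limit and produce the truncation fields shown above. The function name `truncate_evidence` is hypothetical; keeping the first entries (rather than, say, the highest-ranked ones) and attaching the fields directly to the edge object are assumptions.

```python
from typing import Optional


def truncate_evidence(edge: dict, limit: int = 10,
                      full_evidence_uri: Optional[str] = None) -> dict:
    """Return a copy of `edge` with at most `limit` evidence entries."""
    evidence = edge.get("evidence", [])
    result = dict(edge)
    if len(evidence) <= limit:
        return result  # nothing to truncate, no marker fields added
    result["evidence"] = evidence[:limit]  # assumption: keep the first entries
    result["evidence_truncated"] = True
    result["evidence_count"] = len(evidence)
    result["evidence_shown"] = limit
    if full_evidence_uri is not None:
        # The caller supplies the location of the full evidence set.
        result["full_evidence_uri"] = full_evidence_uri
    return result
```

With 15 evidence entries and the default limit, the result carries `evidence_count` 15 and `evidence_shown` 10, matching the example.
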
### EG4: Confidence Rubric

**Confidence scale:**

| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |

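
Mapping a numeric score to a level from the scale above might look like the following sketch. The function name `confidence_level` is hypothetical. The table lists ranges to two decimals, so treating each range's lower bound as a threshold (which places a score such as 0.995 in `high`) is an assumption.

```python
def confidence_level(score: float) -> str:
    """Map a confidence score in [0.0, 1.0] to a rubric level name."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"
```
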
**Confidence computation:**

```
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
```

**Base confidence by reason:**

| Reason | Base Confidence |
|--------|-----------------|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |

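
The confidence formula can be sketched with the base values from the table above. The schema does not define `evidence_boost` or `target_resolution_factor`, so everything beyond the base table is an assumption made for illustration: the boost here adds 1% per additional evidence entry and caps at 1.05, the resolution factor is a caller-supplied multiplier, and the result is clamped to [0.0, 1.0].

```python
# Base confidence per reason, copied from the table above.
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98,
    "import-symbol": 0.95,
    "plt-stub": 0.92,
    "reloc-target": 0.90,
    "init-array": 0.95,
    "vtable-slot": 0.75,
    "indirect-target": 0.60,
    "reflection-invoke": 0.50,
    "runtime-observed": 0.99,
    "user-annotated": 0.80,
}


def evidence_boost(evidence_count: int) -> float:
    """Hypothetical boost: +1% per extra evidence entry, capped at 1.05."""
    extra = max(evidence_count, 1) - 1
    return min(1.05, 1.0 + 0.01 * extra)


def edge_confidence(reason: str, evidence_count: int,
                    target_resolution_factor: float = 1.0) -> float:
    """Multiply the three factors and clamp to [0.0, 1.0] (clamping is assumed)."""
    score = (BASE_CONFIDENCE[reason]
             * evidence_boost(evidence_count)
             * target_resolution_factor)
    return max(0.0, min(1.0, score))
```

With a single evidence entry and a resolution factor of 1.0, the score equals the base confidence for the reason.
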
### EG5: Detector/Rule Provenance

**Provenance schema:**

```json
{
  "provenance": {
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "digest": "sha256:..."
    },
    "detector": {
      "name": "java-bytecode-analyzer",
      "version": "2.0.0",
      "rule_set": "default"
    },
    "rule": {
      "id": "invoke-virtual",
      "version": "1.0.0",
      "description": "Detect invokevirtual bytecode instructions"
    },
    "input_artifacts": [
      {"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
    ],
    "detected_at": "2025-12-13T10:00:00Z"
  }
}
```

**Provenance requirements:**

1. All edges must include analyzer provenance
2. Detector/rule provenance required for non-runtime edges
3. Input artifact digests enable reproducibility
4. Detection timestamp uses UTC ISO-8601

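
Requirements 1, 2, and 4 lend themselves to a mechanical check, sketched below. The function name `validate_provenance` and the error strings are hypothetical, and treating `runtime-observed` as the only runtime reason is an assumption based on the reason table in EG1.

```python
from datetime import datetime, timedelta


def validate_provenance(reason: str, provenance: dict) -> list:
    """Return a list of problems found in a provenance object (empty if none)."""
    errors = []
    if not provenance.get("analyzer"):
        errors.append("missing analyzer provenance")
    # Assumption: only runtime-observed edges may omit detector/rule data.
    if reason != "runtime-observed":
        for key in ("detector", "rule"):
            if not provenance.get(key):
                errors.append(f"missing {key} provenance")
    detected_at = str(provenance.get("detected_at", ""))
    try:
        # Older Python versions do not parse a trailing "Z", so rewrite it.
        parsed = datetime.fromisoformat(detected_at.replace("Z", "+00:00"))
    except ValueError:
        errors.append("detected_at is not ISO-8601")
    else:
        if parsed.utcoffset() != timedelta(0):
            errors.append("detected_at is not UTC")
    return errors
```

An empty list means the provenance object passed these three checks; requirement 3 (artifact digests) would need access to the artifacts themselves and is not covered here.
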
### EG6: API/CLI Parity

**API endpoints:**

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |

**CLI commands:**

```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...

# Get edge details
stella edge show --id edge:sha256:...

# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke

# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```

**Output parity:**

- API and CLI return identical JSON structure
- CLI supports `--json` for machine-readable output
- Both support filtering by reason, confidence, from/to

### EG7: Deterministic Fixtures

**Fixture location:**

```
tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json

datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt
```

**Fixture requirements:**

1. Each reason code has at least one fixture
2. Fixtures include expected `edge_id` hash
3. Golden outputs frozen after review
4. CI verifies hash stability

### EG8: Propagation into Explanation Graphs/VEX

**Explanation graph inclusion:**

```json
{
  "explanation": {
    "path": [
      {
        "node": "sym:java:main...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:handler...",
          "reason": "bytecode-invoke",
          "confidence": 0.98
        }
      },
      {
        "node": "sym:java:handler...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:log4j...",
          "reason": "bytecode-invoke",
          "confidence": 0.95
        }
      }
    ],
    "aggregate_path_confidence": 0.93
  }
}
```

**VEX evidence format:**

```json
{
  "stellaops:reachability": {
    "path_edges": [
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
    ],
    "weakest_edge": {
      "edge_id": "edge:sha256:...",
      "reason": "bytecode-invoke",
      "confidence": 0.95
    },
    "aggregate_confidence": 0.93
  }
}
```

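
The schema does not state how the aggregate is derived. The example values are consistent with multiplying the per-edge confidences and rounding to two decimals (0.98 × 0.95 = 0.931), so the sketch below assumes that rule. The function name `summarize_path` is hypothetical, and the path is assumed to contain at least one edge.

```python
import math


def summarize_path(path_edges: list) -> dict:
    """Build a reachability summary from the edges along one path."""
    # Assumption: aggregate confidence is the product of edge confidences.
    aggregate = math.prod(edge["confidence"] for edge in path_edges)
    # The weakest edge is the one with the lowest confidence on the path.
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": round(aggregate, 2),
    }
```

For the two edges in the example above, this yields an `aggregate_confidence` of 0.93 and selects the 0.95 edge as `weakest_edge`.
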
### EG9: Localization Guidance

**Localizable elements:**

| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |

**Message catalog structure:**

```json
{
  "locale": "en-US",
  "messages": {
    "edge.reason.bytecode-invoke": "Bytecode method call",
    "edge.reason.plt-stub": "PLT/GOT library call",
    "edge.confidence.high": "High confidence ({0:P0})",
    "edge.evidence.location": "Detected at offset {offset} in {file}"
  }
}
```

**Supported locales:**

- `en-US` (default)
- Additional locales via contribution

### EG10: Backfill Plan

**Backfill strategy:**

1. **Phase 1:** Add reason codes to new edges (no backfill needed)
2. **Phase 2:** Run detector upgrade on graphs without reason codes
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata

**Migration script:**

```bash
stella edge backfill --graph blake3:... --dry-run

# Output:
#   Graph: blake3:a1b2c3d4...
#   Edges without reason: 1234
#   Edges to update: 1234
#
#   Dry run - no changes made.

# Execute:
stella edge backfill --graph blake3:... --execute
```

**Backfill metadata:**

```json
{
  "backfill": {
    "status": "complete",
    "original_analyzer_version": "1.0.0",
    "backfill_analyzer_version": "1.2.0",
    "backfilled_at": "2025-12-13T10:00:00Z",
    "edges_updated": 1234
  }
}
```

---

## 3. Related Documentation

- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Explainability Schema](./explainability-schema.md) - Explanation format
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE

---

_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._