up
StellaOps Bot
2025-12-13 09:37:15 +02:00
parent e00f6365da
commit 6e45066e37
349 changed files with 17160 additions and 1867 deletions

@@ -0,0 +1,416 @@
# Edge Explainability Schema
_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._
This document defines the edge explainability schema, which addresses gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call-edge evidence, reason codes, and confidence rubrics, and describes how they propagate into explanation graphs and VEX.
---
## 1. Overview
Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:
- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
- **Confidence score:** Certainty of the edge's existence
- **Evidence sources:** Detectors and rules that contributed to edge discovery
- **Provenance:** Analyzer version, detection timestamp, and input artifacts
---
## 2. Gap Resolutions
### EG1: Reason Enum Governance
**Standard reason codes:**
| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |
**Governance rules:**
1. New reason codes require RFC + review by Scanner Guild
2. Deprecated codes remain valid for 2 major versions
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
4. Codes are case-insensitive, normalized to lowercase
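
The normalization and prefix rules above can be sketched in a few lines of Python. This is a minimal illustration, not the Scanner implementation: the function name `normalize_reason` is hypothetical, and both trimming surrounding whitespace and rejecting unregistered codes are assumptions the rules do not state.

```python
# Standard reason codes copied from the table above.
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})
CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and accept it only if standard or custom-prefixed."""
    # Assumption: surrounding whitespace is insignificant.
    normalized = code.strip().lower()
    if normalized in STANDARD_REASONS:
        return normalized
    # A custom code needs a non-empty name after the prefix.
    if normalized.startswith(CUSTOM_PREFIX) and len(normalized) > len(CUSTOM_PREFIX):
        return normalized
    # Assumption: unregistered codes are rejected rather than passed through.
    raise ValueError(f"unknown reason code: {code!r}")
```
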
**Code registry:**
```json
{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {
      "code": "bytecode-invoke",
      "category": "static",
      "description": "Bytecode invocation instruction",
      "languages": ["java", "dotnet"],
      "confidence_range": [0.9, 1.0],
      "deprecated": false
    }
  ]
}
```
### EG2: Canonical Edge Schema with Hash Rules
**Edge schema:**
```json
{
  "edge_id": "edge:sha256:{hex}",
  "from": "sym:java:...",
  "to": "sym:java:...",
  "kind": "call",
  "reason": "bytecode-invoke",
  "confidence": 0.95,
  "evidence": [
    {
      "source": "detector:java-bytecode-analyzer",
      "rule_id": "invoke-virtual",
      "rule_version": "1.0.0",
      "location": {
        "file": "com/example/Foo.class",
        "offset": 1234,
        "instruction": "invokevirtual #42"
      },
      "timestamp": "2025-12-13T10:00:00Z"
    }
  ],
  "attributes": {
    "virtual": true,
    "polymorphic_targets": 3
  }
}
```
**Hash computation:**
```
edge_id = "edge:sha256:" + lowercase_hex(sha256(
    canonical_json({
        "from": edge.from,
        "to": edge.to,
        "kind": edge.kind,
        "reason": edge.reason
    })
))
```
**Canonicalization:**
1. Use only `from`, `to`, `kind`, `reason` for hash (not confidence or evidence)
2. Sort JSON keys alphabetically
3. No whitespace, UTF-8 encoding
4. Hash is lowercase hex with `sha256:` prefix
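
As a rough sketch, the canonicalization steps above map onto the Python standard library as follows. The function name `compute_edge_id` is hypothetical, and two details are assumptions that the rules do not spell out: non-ASCII characters are emitted as raw UTF-8 rather than `\u` escapes, and `reason` has already been normalized to lowercase before hashing.

```python
import hashlib
import json

# Only these fields contribute to the edge identity (rule 1).
IDENTITY_FIELDS = ("from", "to", "kind", "reason")


def compute_edge_id(edge: dict) -> str:
    """Derive the edge_id from the identity fields of an edge."""
    identity = {field: edge[field] for field in IDENTITY_FIELDS}
    canonical = json.dumps(
        identity,
        sort_keys=True,          # rule 2: keys in alphabetical order
        separators=(",", ":"),   # rule 3: no insignificant whitespace
        ensure_ascii=False,      # assumption: raw UTF-8, no \u escapes
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # hexdigest() is already lowercase (rule 4).
    return f"edge:sha256:{digest}"
```

Because `confidence` and `evidence` are excluded, re-scoring an edge or attaching more evidence leaves its `edge_id` unchanged.
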
### EG3: Evidence Limits/Redaction
**Evidence limits:**
| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |
**Redaction rules:**
| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |
**Truncation behavior:**
```json
{
  "evidence_truncated": true,
  "evidence_count": 15,
  "evidence_shown": 10,
  "full_evidence_uri": "cas://edges/evidence/sha256:..."
}
```
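
The default limits and the truncation marker could be applied along these lines. This is a sketch under stated assumptions: `limit_evidence` is a hypothetical helper, the truncation fields are assumed to sit directly on the edge object, and `full_evidence_uri` is left out because the CAS address cannot be derived here.

```python
def limit_evidence(edge: dict, max_entries: int = 10,
                   max_instruction_chars: int = 100) -> dict:
    """Return a copy of the edge with evidence capped at the default limits."""
    evidence = edge.get("evidence", [])
    shown = []
    for entry in evidence[:max_entries]:
        entry = dict(entry)  # shallow copy so the input is not mutated
        location = entry.get("location")
        if location and "instruction" in location:
            # Truncate the instruction preview to the configured length.
            preview = location["instruction"][:max_instruction_chars]
            entry["location"] = dict(location, instruction=preview)
        shown.append(entry)

    result = dict(edge, evidence=shown)
    if len(evidence) > max_entries:
        # Assumption: the truncation marker lives on the edge itself.
        result["evidence_truncated"] = True
        result["evidence_count"] = len(evidence)
        result["evidence_shown"] = len(shown)
    return result
```
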
### EG4: Confidence Rubric
**Confidence scale:**
| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |
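
A score can be mapped to its level with a simple threshold ladder. The sketch below uses a hypothetical `confidence_level` function and assumes that scores falling between the listed ranges (for example 0.845 or 0.995) belong to the lower band, since the table does not say how they are bucketed.

```python
def confidence_level(score: float) -> str:
    """Map a confidence score in [0.0, 1.0] to the level names in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"
```
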
**Confidence computation:**
```
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
```
**Base confidence by reason:**
| Reason | Base Confidence |
|--------|-----------------|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |
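
The formula can be illustrated with the base values from the table. This document does not define `evidence_boost` or `target_resolution_factor`, so the boost curve below (+1% per additional evidence entry, capped at +5%) is purely a placeholder, and clamping the product to [0.0, 1.0] is also an assumption. Reason codes without a base value in the table are deliberately left out.

```python
# Base confidence values copied from the table above.
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98,
    "import-symbol": 0.95,
    "plt-stub": 0.92,
    "reloc-target": 0.90,
    "init-array": 0.95,
    "vtable-slot": 0.75,
    "indirect-target": 0.60,
    "reflection-invoke": 0.50,
    "runtime-observed": 0.99,
    "user-annotated": 0.80,
}


def evidence_boost(evidence_count: int) -> float:
    """PLACEHOLDER curve: +1% per extra evidence entry, capped at +5%."""
    extra = max(evidence_count - 1, 0)
    return 1.0 + min(extra * 0.01, 0.05)


def edge_confidence(reason: str, evidence_count: int,
                    target_resolution_factor: float = 1.0) -> float:
    """Multiply the three factors and clamp the result to [0.0, 1.0]."""
    raw = (BASE_CONFIDENCE[reason]
           * evidence_boost(evidence_count)
           * target_resolution_factor)
    return max(0.0, min(raw, 1.0))
```
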
### EG5: Detector/Rule Provenance
**Provenance schema:**
```json
{
  "provenance": {
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "digest": "sha256:..."
    },
    "detector": {
      "name": "java-bytecode-analyzer",
      "version": "2.0.0",
      "rule_set": "default"
    },
    "rule": {
      "id": "invoke-virtual",
      "version": "1.0.0",
      "description": "Detect invokevirtual bytecode instructions"
    },
    "input_artifacts": [
      {"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
    ],
    "detected_at": "2025-12-13T10:00:00Z"
  }
}
```
**Provenance requirements:**
1. All edges must include analyzer provenance
2. Detector/rule provenance required for non-runtime edges
3. Input artifact digests enable reproducibility
4. Detection timestamp uses UTC ISO-8601
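
These requirements lend themselves to a small validation pass. The following is a sketch only: `provenance_errors` is a hypothetical helper, `provenance` is assumed to be a key on the edge object, "non-runtime" is read as any reason other than `runtime-observed`, and timestamps are assumed to use second precision with a `Z` suffix as in the example above.

```python
from datetime import datetime


def provenance_errors(edge: dict) -> list:
    """Return human-readable violations of the provenance requirements."""
    errors = []
    provenance = edge.get("provenance") or {}

    # Requirement 1: analyzer provenance on every edge.
    if not provenance.get("analyzer"):
        errors.append("missing analyzer provenance")

    # Requirement 2: detector and rule provenance unless runtime-observed.
    if edge.get("reason") != "runtime-observed":
        for key in ("detector", "rule"):
            if not provenance.get(key):
                errors.append(f"missing {key} provenance")

    # Requirement 4: UTC ISO-8601 timestamp (assumed form: ...T..:..:..Z).
    detected_at = str(provenance.get("detected_at") or "")
    try:
        datetime.strptime(detected_at, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        errors.append("detected_at is not a UTC ISO-8601 timestamp")
    return errors
```
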
### EG6: API/CLI Parity
**API endpoints:**
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |
**CLI commands:**
```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...
# Get edge details
stella edge show --id edge:sha256:...
# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke
# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```
**Output parity:**
- API and CLI return identical JSON structure
- CLI supports `--json` for machine-readable output
- Both support filtering by reason, confidence, from/to
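
For consumers working on exported edges, the shared filters can be mirrored client-side. The sketch below is not the API or CLI implementation: the function and parameter names are hypothetical, and exact string matching on `from`/`to` plus a minimum-confidence threshold are assumptions about how the filters behave.

```python
def filter_edges(edges, reason=None, min_confidence=None,
                 from_symbol=None, to_symbol=None):
    """Keep edges that satisfy every filter that was supplied."""
    def matches(edge):
        return (
            (reason is None or edge["reason"] == reason)
            and (min_confidence is None or edge["confidence"] >= min_confidence)
            and (from_symbol is None or edge["from"] == from_symbol)
            and (to_symbol is None or edge["to"] == to_symbol)
        )
    return [edge for edge in edges if matches(edge)]
```
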
### EG7: Deterministic Fixtures
**Fixture location:**
```
tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json
datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt
```
**Fixture requirements:**
1. Each reason code has at least one fixture
2. Fixtures include expected `edge_id` hash
3. Golden outputs frozen after review
4. CI verifies hash stability
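
Requirements 1, 2, and 4 could be checked in CI roughly as follows. The fixture shape is an assumption (each fixture is treated as an edge object carrying `reason` and the expected `edge_id`), both function names are hypothetical, and the hashing routine is passed in as a callable so the sketch stays independent of any particular implementation.

```python
from typing import Callable, Iterable, List


def missing_fixture_reasons(registry_codes: Iterable[str],
                            fixtures: List[dict]) -> List[str]:
    """Reason codes in the registry that no fixture covers."""
    covered = {fixture["reason"] for fixture in fixtures}
    return sorted(set(registry_codes) - covered)


def hash_mismatches(fixtures: List[dict],
                    compute_edge_id: Callable[[dict], str]) -> List[str]:
    """Expected edge_ids that the current hashing routine no longer reproduces."""
    return [
        fixture["edge_id"]
        for fixture in fixtures
        if compute_edge_id(fixture) != fixture["edge_id"]
    ]
```
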
### EG8: Propagation into Explanation Graphs/VEX
**Explanation graph inclusion:**
```json
{
  "explanation": {
    "path": [
      {
        "node": "sym:java:main...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:handler...",
          "reason": "bytecode-invoke",
          "confidence": 0.98
        }
      },
      {
        "node": "sym:java:handler...",
        "outgoing_edge": {
          "edge_id": "edge:sha256:...",
          "to": "sym:java:log4j...",
          "reason": "bytecode-invoke",
          "confidence": 0.95
        }
      }
    ],
    "aggregate_path_confidence": 0.93
  }
}
```
**VEX evidence format:**
```json
{
  "stellaops:reachability": {
    "path_edges": [
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
      {"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
    ],
    "weakest_edge": {
      "edge_id": "edge:sha256:...",
      "reason": "bytecode-invoke",
      "confidence": 0.95
    },
    "aggregate_confidence": 0.93
  }
}
```
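
The aggregation rule is not stated explicitly in this document, but the example values are consistent with multiplying per-edge confidences (0.98 × 0.95 ≈ 0.93) and reporting the lowest-confidence edge as the weakest. A sketch under that assumption, with a hypothetical `summarize_path` helper and rounding to two decimals chosen only to match the example:

```python
import math


def summarize_path(path_edges: list) -> dict:
    """Build a reachability summary from the edges along one path."""
    if not path_edges:
        raise ValueError("path must contain at least one edge")
    # Assumption: aggregate confidence is the product of edge confidences.
    aggregate = math.prod(edge["confidence"] for edge in path_edges)
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": round(aggregate, 2),
    }
```
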
### EG9: Localization Guidance
**Localizable elements:**
| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |
**Message catalog structure:**
```json
{
  "locale": "en-US",
  "messages": {
    "edge.reason.bytecode-invoke": "Bytecode method call",
    "edge.reason.plt-stub": "PLT/GOT library call",
    "edge.confidence.high": "High confidence ({0:P0})",
    "edge.evidence.location": "Detected at offset {offset} in {file}"
  }
}
```
**Supported locales:**
- `en-US` (default)
- Additional locales via contribution
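
A catalog lookup with fallback to the default locale might look like the sketch below. The `localize` helper and the in-memory `CATALOGS` mapping are hypothetical. Only named placeholders such as `{offset}` are handled; the `{0:P0}` placeholder in `edge.confidence.high` appears to be a .NET-style percent format and is omitted because Python's `str.format` does not understand it. Returning the key itself when no message exists is also an assumption.

```python
DEFAULT_LOCALE = "en-US"

# In-memory stand-in for catalogs loaded from JSON files.
CATALOGS = {
    "en-US": {
        "edge.reason.bytecode-invoke": "Bytecode method call",
        "edge.reason.plt-stub": "PLT/GOT library call",
        "edge.evidence.location": "Detected at offset {offset} in {file}",
    },
}


def localize(key: str, locale: str = DEFAULT_LOCALE, **params) -> str:
    """Resolve a message key, falling back to en-US and then to the key."""
    for candidate in (locale, DEFAULT_LOCALE):
        template = CATALOGS.get(candidate, {}).get(key)
        if template is not None:
            return template.format(**params)
    # Assumption: an unknown key is shown as-is instead of raising.
    return key
```
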
### EG10: Backfill Plan
**Backfill strategy:**
1. **Phase 1:** Add reason codes to new edges (no backfill needed)
2. **Phase 2:** Run detector upgrade on graphs without reason codes
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata
**Migration script:**
```bash
# Preview the changes without writing anything:
stella edge backfill --graph blake3:... --dry-run
# Output:
#   Graph: blake3:a1b2c3d4...
#   Edges without reason: 1234
#   Edges to update: 1234
#   Dry run - no changes made.
# Execute:
stella edge backfill --graph blake3:... --execute
```
**Backfill metadata:**
```json
{
  "backfill": {
    "status": "complete",
    "original_analyzer_version": "1.0.0",
    "backfill_analyzer_version": "1.2.0",
    "backfilled_at": "2025-12-13T10:00:00Z",
    "edges_updated": 1234
  }
}
```
---
## 3. Related Documentation
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Explainability Schema](./explainability-schema.md) - Explanation format
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE
---
_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._