
# Explainability Schema

_Last updated: 2025-12-13. Owner: Policy Guild + Docs Guild._

This document defines the explainability schema addressing gaps EX1-EX10 from the November 2025 product findings. It specifies the canonical format for vulnerability verdict explanations, DSSE signing policy, CAS storage rules, and export/replay formats.

---

## 1. Overview

Explainability provides auditable, machine-readable rationale for every vulnerability verdict. Each explanation includes:

- **Decision chain:** Ordered list of rules/policies that contributed to the verdict
- **Evidence links:** References to graphs, runtime facts, VEX statements, and SBOM components
- **Confidence scores:** Per-rule and aggregate confidence values
- **Redaction metadata:** PII handling and data classification

---

## 2. Gap Resolutions

### EX1: Schema/Canonicalization + Hashes

**Explanation schema:**

```json
{
  "schema": "stellaops.explanation@v1",
  "explanation_id": "explain:sha256:{hex}",
  "finding_id": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
  "verdict": {
    "status": "affected",
    "severity": {"normalized": "Critical", "score": 10.0},
    "confidence": 0.92
  },
  "decision_chain": [
    {
      "rule_id": "rule:reachability_gate",
      "rule_version": "1.0.0",
      "inputs": {
        "reachability.state": "CR",
        "reachability.confidence": 0.92
      },
      "output": {"allowed": true, "contribution": 0.4},
      "evidence_refs": ["cas://reachability/graphs/blake3:..."]
    },
    {
      "rule_id": "rule:severity_baseline",
      "rule_version": "1.0.0",
      "inputs": {
        "cvss_base": 10.0,
        "epss_percentile": 0.95
      },
      "output": {"severity": "Critical", "contribution": 0.6},
      "evidence_refs": ["cas://advisories/CVE-2021-44228.json"]
    }
  ],
  "aggregate_confidence": 0.88,
  "created_at": "2025-12-13T10:00:00.000Z",
  "policy_version": "sha256:...",
  "graph_revision_id": "rev:blake3:..."
}
```

**Canonicalization rules** (the example above is pretty-printed for readability and is not itself in canonical form):

1. JSON keys sorted alphabetically at all levels
2. Arrays in `decision_chain` ordered by rule execution sequence
3. `evidence_refs` arrays sorted alphabetically
4. No insignificant whitespace between tokens; UTF-8 encoding
5. Hash computed over canonical JSON: `sha256(canonical_json)`
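The rules above can be sketched in Python to show how two differently ordered documents collapse to one hash. This is a minimal illustration, not the Policy Engine implementation. Two points are assumptions: the spec does not say whether `explanation_id` (which embeds the hash) is part of the hashed bytes, so the sketch excludes it; and `json.dumps(sort_keys=True)` orders keys by Unicode code point, which may differ from RFC 8785 (JCS) in edge cases such as non-BMP characters or number formatting. The function name `compute_explanation_id` is hypothetical.

```python
import hashlib
import json
from typing import Any


def canonicalize(explanation: dict[str, Any]) -> bytes:
    """Serialize an explanation following the EX1 rules (illustrative sketch)."""
    doc = json.loads(json.dumps(explanation))  # deep copy; the caller's dict is untouched
    # ASSUMPTION: explanation_id embeds the hash, so it is excluded from the hashed bytes.
    doc.pop("explanation_id", None)
    # Rule 2: decision_chain stays in rule execution order (not sorted).
    # Rule 3: each evidence_refs array is sorted.
    for step in doc.get("decision_chain", []):
        if "evidence_refs" in step:
            step["evidence_refs"] = sorted(step["evidence_refs"])
    # Rules 1 and 4: keys sorted at every level, no insignificant whitespace, UTF-8 output.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)
    return text.encode("utf-8")


def compute_explanation_id(explanation: dict[str, Any]) -> str:
    """Rule 5: SHA-256 over the canonical bytes, in the explanation_id format."""
    return "explain:sha256:" + hashlib.sha256(canonicalize(explanation)).hexdigest()
```

Reordering keys or `evidence_refs` entries does not change the result, while reordering `decision_chain` entries does, because execution order is part of the meaning.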

### EX2: DSSE Predicate/Signing Policy

**DSSE predicate type:**

```
stella.ops/explanation@v1
```

**Signing policy:**

| Element | Required | Signer |
|---------|----------|--------|
| Explanation body | Yes | Policy Engine key |
| Graph DSSE reference | Yes (if reachability cited) | Scanner key |
| VEX DSSE reference | Yes (if VEX cited) | Policy Engine key |

**DSSE envelope structure:**

```json
{
  "payloadType": "application/vnd.stellaops.explanation+json",
  "payload": "<base64(canonical_explanation_json)>",
  "signatures": [
    {
      "keyid": "policy-engine-signing-2025",
      "sig": "<base64(signature over PAE(payloadType, payload))>"
    }
  ]
}
```

**Signing requirements:**

- All explanations must be signed before CAS storage
- Signing key must be registered in Authority key store
- Key rotation triggers re-signing of active explanations (configurable)
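In DSSE, the signature covers the pre-authentication encoding (PAE) of `payloadType` and the payload bytes, not the base64 text. The sketch below shows that encoding and how an envelope shaped like the one above could be built and checked. It is illustrative only: the signer is a pluggable callable, and the HMAC demo signer is a stand-in for the Policy Engine's real key (Authority key registration is not modelled). The function names are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Callable

PAYLOAD_TYPE = "application/vnd.stellaops.explanation+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding; this, not the raw payload, is what gets signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def sign_envelope(canonical: bytes, keyid: str, sign: Callable[[bytes], bytes]) -> dict:
    """Wrap canonical explanation bytes in a DSSE envelope."""
    sig = sign(pae(PAYLOAD_TYPE, canonical))
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(canonical).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(env: dict, keyid: str, verify: Callable[[bytes, bytes], bool]) -> bool:
    """True if any signature from `keyid` verifies over PAE(payloadType, payload)."""
    msg = pae(env["payloadType"], base64.b64decode(env["payload"]))
    return any(
        s["keyid"] == keyid and verify(msg, base64.b64decode(s["sig"]))
        for s in env["signatures"]
    )


# DEMO ONLY: an HMAC stands in for the Policy Engine's asymmetric signing key.
_DEMO_KEY = b"demo-only-secret"


def demo_sign(msg: bytes) -> bytes:
    return hmac.new(_DEMO_KEY, msg, hashlib.sha256).digest()


def demo_verify(msg: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(demo_sign(msg), sig)
```

Because the signature binds `payloadType` as well as the body, an envelope whose payload or type is altered fails verification even if the base64 still decodes.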

### EX3: CAS Storage Rules for Evidence

**Storage layout:**

```
cas://explanations/
  {sha256}/                      # Explanation body
  {sha256}.dsse                  # DSSE envelope
  by-finding/{finding_id}/       # Index by finding
  by-policy/{policy_digest}/     # Index by policy version
  by-graph/{graph_revision_id}/  # Index by graph revision
```

**Storage rules:**

1. Explanations are immutable after signing
2. New verdicts create new explanation documents (no updates)
3. Previous explanations are retained per retention policy
4. Cross-references validated at write time (graphs, VEX must exist)

**Deduplication:**

- Identical canonical JSON produces identical hash
- CAS returns existing reference if content matches
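A toy in-memory model makes the storage and deduplication rules concrete: writes are insert-only (there is no update path), cross-references must already exist, and identical bytes return the existing reference. The class `InMemoryCas` and its `put` method are hypothetical and are not the StellaOps CAS API; the `cas://` path layout and the secondary indexes are not modelled.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InMemoryCas:
    """Toy content-addressed store illustrating the EX3 rules (hypothetical API)."""

    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, canonical: bytes, required_refs: tuple[str, ...] = ()) -> tuple[str, bool]:
        """Store bytes; return (digest, created). No update or delete method exists."""
        # Rule 4: referenced graphs/VEX must already be present at write time.
        missing = [ref for ref in required_refs if ref not in self.objects]
        if missing:
            raise LookupError(f"unresolved cross-references: {missing}")
        digest = "sha256:" + hashlib.sha256(canonical).hexdigest()
        if digest in self.objects:
            # Deduplication: identical content maps to the existing reference.
            return digest, False
        self.objects[digest] = canonical
        return digest, True
```

A new verdict produces different canonical bytes and therefore a new digest, so earlier explanations are never overwritten.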

### EX4: Link to Decision/Policy and graph_revision_id

**Required links:**

```json
{
  "links": {
    "policy_version": "sha256:7e1d...",
    "policy_uri": "cas://policy/versions/sha256:7e1d...",
    "graph_revision_id": "rev:blake3:a1b2...",
    "graph_uri": "cas://reachability/revisions/blake3:a1b2...",
    "sbom_digest": "sha256:def4...",
    "sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
    "vex_digest": "sha256:e5f6...",
    "vex_uri": "cas://excititor/vex/openvex.json"
  }
}
```

**Validation:**

- All linked artifacts must exist at explanation creation time
- Links are verified during replay/audit
- Broken links cause replay verification failure
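A replay-time link check can be sketched as follows: every `*_uri` present in `links` must resolve, and where the paired digest is `sha256:` the fetched bytes must hash to it. This is an assumption-laden illustration: `fetch` is any callable returning bytes or `None` (a plain `dict.get` works for testing), skipping absent pairs (for example, no VEX cited) is an assumption, and `blake3` digests are not verified because Python's standard library has no BLAKE3 hasher.

```python
import hashlib
from typing import Callable, Optional

# (digest field, uri field) pairs from the EX4 "links" object.
LINK_PAIRS = [
    ("policy_version", "policy_uri"),
    ("graph_revision_id", "graph_uri"),
    ("sbom_digest", "sbom_uri"),
    ("vex_digest", "vex_uri"),
]


def verify_links(links: dict[str, str], fetch: Callable[[str], Optional[bytes]]) -> list[str]:
    """Return human-readable problems; an empty list means every present link resolved."""
    problems: list[str] = []
    for digest_key, uri_key in LINK_PAIRS:
        uri = links.get(uri_key)
        if uri is None:
            continue  # ASSUMPTION: a pair may be absent (e.g. no VEX cited)
        content = fetch(uri)
        if content is None:
            problems.append(f"{uri_key}: unresolved {uri}")
            continue
        expected = links.get(digest_key, "")
        if expected.startswith("sha256:"):
            actual = "sha256:" + hashlib.sha256(content).hexdigest()
            if actual != expected:
                problems.append(f"{digest_key}: digest mismatch")
        # blake3 digests (graph revisions) need a third-party hasher; not checked here.
    return problems
```

Any non-empty result would correspond to a replay verification failure under the rule above.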

### EX5: Export/Replay Bundle Format

**Export bundle manifest:**

```json
{
  "schema": "stellaops.explanation.bundle@v1",
  "bundle_id": "bundle:explain:2025-12-13",
  "created_at": "2025-12-13T10:00:00.000Z",
  "explanations": [
    {
      "explanation_id": "explain:sha256:...",
      "finding_id": "...",
      "explanation_uri": "explanations/sha256:....json",
      "dsse_uri": "explanations/sha256:....dsse"
    }
  ],
  "dependencies": {
    "graphs": [
      {"revision_id": "rev:blake3:...", "uri": "graphs/blake3:....json"}
    ],
    "policies": [
      {"digest": "sha256:...", "uri": "policies/sha256:....json"}
    ],
    "vex_statements": [
      {"digest": "sha256:...", "uri": "vex/sha256:....json"}
    ]
  },
  "verification": {
    "bundle_hash": "sha256:...",
    "signature": "base64:...",
    "signed_by": "policy-engine-signing-2025"
  }
}
```

**Replay verification:**

```bash
stella explain verify --bundle ./explanation-bundle.tgz

# Output:
Bundle: bundle:explain:2025-12-13
Explanations: 42
Dependencies: 5 graphs, 2 policies, 12 VEX

Verifying explanations...
  Canonical hashes: 42/42 MATCH
  DSSE signatures: 42/42 VALID
  Dependency links: 42/42 RESOLVED

Replay verification PASSED.
```
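Two of the three checks reported above can be sketched over an already-extracted bundle, given the manifest and a mapping from manifest-relative paths to file bytes. The sketch confirms that each DSSE payload decodes to exactly the shipped explanation bytes (the envelope payload is base64 of the canonical JSON) and lists dependency URIs missing from the bundle. Signature verification is deliberately omitted because it needs the Policy Engine public key; the function name and the return shape are hypothetical.

```python
import base64
import json


def verify_bundle(manifest: dict, files: dict[str, bytes]) -> dict:
    """Partial replay check over an extracted bundle (sketch; signatures not checked)."""
    payload_matches = 0
    missing_files: list[str] = []
    for entry in manifest["explanations"]:
        body = files.get(entry["explanation_uri"])
        dsse = files.get(entry["dsse_uri"])
        if body is None or dsse is None:
            missing_files.append(entry["explanation_id"])
            continue
        envelope = json.loads(dsse)
        # The envelope payload is base64(canonical_explanation_json), so it must
        # decode to exactly the explanation bytes shipped alongside it.
        if base64.b64decode(envelope["payload"]) == body:
            payload_matches += 1
    dependency_uris = {
        dep["uri"] for group in manifest.get("dependencies", {}).values() for dep in group
    }
    return {
        "explanations": len(manifest["explanations"]),
        "payload_matches": payload_matches,
        "missing_files": missing_files,
        "missing_dependencies": sorted(dependency_uris.difference(files)),
    }
```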

### EX6: PII/Redaction Rules

**Redaction categories:**

| Category | Redaction | Example |
|----------|-----------|---------|
| User identifiers | Hash | `user:alice` -> `user:sha256:a1b2...` |
| IP addresses | Mask | `192.168.1.100` -> `192.168.x.x` |
| File paths | Normalize | `/home/alice/code/...` -> `{HOME}/code/...` |
| Email addresses | Hash | `alice@example.com` -> `email:sha256:...` |
| API keys/tokens | Omit | `Authorization: Bearer xxx` -> `[REDACTED]` |

**Redaction metadata:**

```json
{
  "redaction": {
    "applied": true,
    "level": "standard",
    "fields_redacted": ["actor.email", "evidence.file_path"],
    "redaction_policy": "stellaops.redaction.standard@v1"
  }
}
```

**Export modes:**

- `--redacted` (default): Apply standard redaction
- `--full`: Include all data (requires `explain:export:full` scope)
- `--audit`: Include redaction audit trail

### EX7: Size Budgets

**Limits:**

| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Explanation body | 256 KB | Yes |
| Decision chain entries | 100 | Yes |
| Evidence refs per rule | 20 | Yes |
| Total evidence refs | 200 | Yes |
| Path entries | 50 | No |

**Truncation behavior:**

When limits are exceeded:

1. Log a warning with truncation details
2. Add `truncation` metadata to the explanation
3. Store the full evidence in a separate CAS object
4. Include a `full_evidence_uri` reference to that object

```json
{
  "truncation": {
    "applied": true,
    "elements_truncated": ["decision_chain", "evidence_refs"],
    "full_evidence_uri": "cas://explanations/full/sha256:..."
  }
}
```
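The truncation steps can be sketched for the three configurable count limits. The function returns the truncated explanation plus the full serialized document, which the caller would store in CAS under `full_evidence_uri`. Assumptions: refs are kept in their existing (sorted) order with the per-rule cap applied before the global cap, and the 256 KB body limit and path-entry limit are not modelled. `enforce_budgets` and the logger name are hypothetical.

```python
import hashlib
import json
import logging
from typing import Optional

log = logging.getLogger("explanation.budget")

# EX7 default limits (body size and path entries are not modelled here).
DEFAULT_LIMITS = {"decision_chain": 100, "refs_per_rule": 20, "total_refs": 200}


def enforce_budgets(expl: dict, limits: dict = DEFAULT_LIMITS) -> tuple[dict, Optional[bytes]]:
    """Return (possibly truncated explanation, full bytes to store in CAS, or None)."""
    full = json.dumps(expl, sort_keys=True, separators=(",", ":")).encode("utf-8")
    out = json.loads(full)  # deep copy that can be truncated safely
    truncated: list[str] = []

    chain = out.get("decision_chain", [])
    if len(chain) > limits["decision_chain"]:
        chain = chain[: limits["decision_chain"]]
        out["decision_chain"] = chain
        truncated.append("decision_chain")

    remaining = limits["total_refs"]
    refs_cut = False
    for step in chain:
        refs = step.get("evidence_refs", [])
        # ASSUMPTION: keep the first refs in order; per-rule cap, then global cap.
        keep = refs[: min(limits["refs_per_rule"], remaining)]
        remaining -= len(keep)
        if len(keep) < len(refs):
            step["evidence_refs"] = keep
            refs_cut = True
    if refs_cut:
        truncated.append("evidence_refs")

    if not truncated:
        return out, None
    uri = "cas://explanations/full/sha256:" + hashlib.sha256(full).hexdigest()
    log.warning("explanation truncated: %s (full evidence at %s)", truncated, uri)
    out["truncation"] = {"applied": True, "elements_truncated": truncated, "full_evidence_uri": uri}
    return out, full
```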

### EX8: Versioning

**Schema versioning:**

- Schema version in `schema` field: `stellaops.explanation@v1`
- Breaking changes increment major version
- Minor changes (additive fields) use v1.x
- Backward compatibility maintained for 2 major versions

**Migration support:**

```bash
stella explain migrate --from v1 --to v2 --input ./explanations/

# Output:
Migrating 1000 explanations from v1 to v2...
  Migrated: 998
  Skipped (already v2): 2

Migration complete.
```

**Version compatibility matrix:**

| API Version | Schema v1 | Schema v2 |
|-------------|-----------|-----------|
| 1.0.x | Full | N/A |
| 1.1.x | Full | Full |
| 2.0.x | Read-only | Full |
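The matrix can be encoded as a lookup keyed by API major/minor version and schema major version, with the schema major parsed from the `schema` field (accepting additive minors such as `v1.2`). This is a sketch: the function names are hypothetical, and the matrix's "N/A" cells are represented by returning `"unsupported"`.

```python
import re

_SCHEMA = re.compile(r"^stellaops\.explanation@v(\d+)(?:\.(\d+))?$")

# EX8 compatibility matrix: (API major, API minor) -> {schema major: access level}.
# Cells marked N/A in the table are simply absent here.
_MATRIX = {
    (1, 0): {1: "full"},
    (1, 1): {1: "full", 2: "full"},
    (2, 0): {1: "read-only", 2: "full"},
}


def schema_major(schema: str) -> int:
    """Extract the major version from e.g. 'stellaops.explanation@v1' or '...@v1.2'."""
    m = _SCHEMA.match(schema)
    if m is None:
        raise ValueError(f"not an explanation schema id: {schema!r}")
    return int(m.group(1))


def schema_access(api_version: str, schema: str) -> str:
    """Return 'full', 'read-only', or 'unsupported' for an API version such as '1.1.4'."""
    major, minor = (int(part) for part in api_version.split(".")[:2])
    return _MATRIX.get((major, minor), {}).get(schema_major(schema), "unsupported")
```

Only the major schema version drives compatibility, which matches the rule that minor (additive) changes stay within `v1.x`.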

### EX9: Golden Fixtures/Tests

**Test fixture location:**

```
tests/Explanation/
  fixtures/
    simple-affected.json
    simple-not-affected.json
    with-reachability-evidence.json
    multi-rule-chain.json
    truncated-evidence.json
    redacted-pii.json
  golden/
    simple-affected.golden.json
    simple-affected.golden.dsse

datasets/explanations/
  schema/
    explanation.schema.json
  samples/
    log4j-affected/
      explanation.json
      expected-hash.txt
```

**Test categories:**

1. **Canonicalization tests:** Verify hash stability across JSON reordering
2. **DSSE signing tests:** Verify signature creation and verification
3. **Redaction tests:** Verify PII handling
4. **Truncation tests:** Verify size budget enforcement
5. **Replay tests:** Verify bundle export/import cycle
6. **Migration tests:** Verify version upgrade paths

**CI integration:**

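```yaml
# .gitea/workflows/explanation-tests.yml
name: explanation-tests
on: [push, pull_request]  # triggers shown here are illustrative
jobs:
  explanation-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # the runner needs the repository before it can test it
      - name: Run explanation tests
        run: dotnet test src/Policy/__Tests/StellaOps.Policy.Explanation.Tests
      - name: Verify golden fixtures
        run: scripts/verify-golden-fixtures.sh tests/Explanation/golden/
```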

### EX10: Determinism Guarantees

**Determinism requirements:**

1. Same inputs produce identical `explanation_id` hash
2. Decision chain ordering is stable (execution order)
3. Evidence refs sorted alphabetically
4. Timestamps use UTC ISO-8601 with millisecond precision
5. Floating-point values rounded to 6 decimal places
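Requirements 4 and 5 are the easiest to get subtly wrong across implementations. The sketch below normalizes timestamps to UTC ISO-8601 with exactly three fractional digits (sub-millisecond digits are truncated) and recursively rounds floats to 6 decimal places before canonicalization. The helper names are hypothetical. Note that Python's `round` rounds ties to even on the exact binary value; any other implementation producing the same explanations must use the same rounding mode, or hashes can diverge.

```python
from datetime import datetime, timezone
from typing import Any


def normalize_timestamp(ts: datetime) -> str:
    """UTC ISO-8601 with exactly millisecond precision (sub-millisecond digits truncated)."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def normalize_floats(value: Any) -> Any:
    """Recursively round floats to 6 decimal places; ints, bools, and strings pass through."""
    if isinstance(value, float):
        return round(value, 6)
    if isinstance(value, dict):
        return {key: normalize_floats(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_floats(item) for item in value]
    return value
```

Rejecting naive datetimes avoids silently interpreting a local time as UTC, which would make the hash depend on the host's timezone.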

**Verification:**

```bash
# Run twice with same inputs, verify identical hashes
stella explain generate --finding "..." --output a.json
stella explain generate --finding "..." --output b.json
diff a.json b.json  # Should be empty

# Or use built-in verify
stella explain verify-determinism --finding "..." --iterations 3
```

---

## 3. API Reference

### 3.1 Generate Explanation

```http
POST /api/policy/findings/{findingId}/explain
Authorization: Bearer <token>
Content-Type: application/json

{
  "mode": "full",
  "include_evidence": true,
  "redaction_level": "standard"
}
```

### 3.2 Get Explanation

```http
GET /api/explanations/{explanationId}
Authorization: Bearer <token>
Accept: application/json
```

### 3.3 Export Explanation Bundle

```http
POST /api/explanations/export
Authorization: Bearer <token>
Content-Type: application/json

{
  "finding_ids": ["...", "..."],
  "include_dependencies": true,
  "redaction_level": "standard"
}
```

### 3.4 Verify Explanation

```http
POST /api/explanations/{explanationId}/verify
Authorization: Bearer <token>
```
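For scripting against these endpoints, the requests can be built with Python's standard library. This sketch only constructs `urllib.request.Request` objects; passing one to `urllib.request.urlopen` would send it. Assumptions: `BASE_URL` is a placeholder host, the request bodies reuse the example values above, and percent-encoding finding and explanation IDs as a single path segment (they contain `:`, `/` and `@`) is assumed to be what the server expects.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://stellaops.example.internal"  # HYPOTHETICAL host; substitute your deployment


def _request(method: str, path: str, token: str, body: Optional[dict] = None) -> Request:
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def _seg(value: str) -> str:
    # IDs contain ':', '/' and '@', so encode them as one opaque path segment.
    return quote(value, safe="")


def generate_explanation(finding_id: str, token: str) -> Request:
    body = {"mode": "full", "include_evidence": True, "redaction_level": "standard"}
    return _request("POST", f"/api/policy/findings/{_seg(finding_id)}/explain", token, body)


def get_explanation(explanation_id: str, token: str) -> Request:
    return _request("GET", f"/api/explanations/{_seg(explanation_id)}", token)


def export_bundle(finding_ids: list[str], token: str) -> Request:
    body = {"finding_ids": finding_ids, "include_dependencies": True, "redaction_level": "standard"}
    return _request("POST", "/api/explanations/export", token, body)


def verify_explanation(explanation_id: str, token: str) -> Request:
    return _request("POST", f"/api/explanations/{_seg(explanation_id)}/verify", token)
```

Without encoding, the `/` in `pkg:maven/log4j@2.14.1` would split the finding ID across two path segments and the route would not match.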

---

## 4. CLI Reference

```bash
# Generate explanation for a finding
stella explain generate --finding "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228"

# Export explanation bundle
stella explain export --findings ./finding-ids.txt --output ./bundle.tgz

# Verify explanation
stella explain verify --explanation ./explanation.json --dsse ./explanation.dsse

# Verify bundle
stella explain verify --bundle ./bundle.tgz

# Check determinism
stella explain verify-determinism --finding "..." --iterations 5
```

---

## 5. Related Documentation

- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Graph Revision Schema](./graph-revision-schema.md) - Graph versioning
- [Policy API](../api/policy.md) - Policy Engine REST API
- [DSSE Predicates](../modules/attestor/architecture.md) - Signing specifications

---

_Last updated: 2025-12-13. See Sprint 0401 EXPLAIN-GAPS-401-064 for change history._