docs consolidation and others
461
docs/modules/reach-graph/schemas/binary-reachability-schema.md
Normal file
@@ -0,0 +1,461 @@
|
||||
# Binary Reachability Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild._
|
||||
|
||||
This document defines the binary reachability schema addressing gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:
|
||||
|
||||
- **Stripped binaries:** Symbol recovery using `code_id` + `code_block_hash`
|
||||
- **Build variants:** Handling multiple builds from the same source
|
||||
- **Large graphs:** Chunking and size limits for DSSE/Rekor
|
||||
- **Offline verification:** Air-gapped attestation workflows
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### BR1: Canonical DSSE/Predicate Schemas
|
||||
|
||||
**Binary graph predicate:**
|
||||
|
||||
```
stella.ops/binaryGraph@v1
```
|
||||
|
||||
**Predicate schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"_type": "https://stellaops.dev/predicates/binaryGraph/v1",
|
||||
"subject": [
|
||||
{
|
||||
"name": "graph",
|
||||
"digest": {"blake3": "a1b2c3d4e5f6..."}
|
||||
}
|
||||
],
|
||||
"predicate": {
|
||||
"analyzer": {
|
||||
"name": "scanner.native",
|
||||
"version": "1.2.0",
|
||||
"toolchain": "ghidra-11.2"
|
||||
},
|
||||
"binary": {
|
||||
"format": "ELF",
|
||||
"arch": "x86_64",
|
||||
"file_hash": "sha256:...",
|
||||
"build_id": "gnu-build-id:5f0c7c3c..."
|
||||
},
|
||||
"graph_stats": {
|
||||
"node_count": 1247,
|
||||
"edge_count": 3891,
|
||||
"root_count": 5
|
||||
},
|
||||
"evidence": {
|
||||
"symbols_source": "DWARF",
|
||||
"stripped_symbols": 58,
|
||||
"heuristic_symbols": 12
|
||||
},
|
||||
"created_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Edge bundle predicate:**
|
||||
|
||||
```
stella.ops/binaryEdgeBundle@v1
```
|
||||
|
||||
```json
|
||||
{
|
||||
"_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
|
||||
"subject": [
|
||||
{
|
||||
"name": "edges",
|
||||
"digest": {"sha256": "..."}
|
||||
}
|
||||
],
|
||||
"predicate": {
|
||||
"graph_hash": "blake3:a1b2c3d4...",
|
||||
"bundle_id": "bundle:001",
|
||||
"bundle_reason": "init_array",
|
||||
"edge_count": 128,
|
||||
"edges": [
|
||||
{
|
||||
"from": "sym:binary:...",
|
||||
"to": "sym:binary:...",
|
||||
"reason": "init-array",
|
||||
"confidence": 0.95
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### BR2: Edge Hash Recipe
|
||||
|
||||
**Binary edge hash computation:**
|
||||
|
||||
```
edge_id = "edge:" + sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason,
    "binary_hash": binary.file_hash  // Binary context included
  })
)
```
|
||||
|
||||
**Hash includes binary context:**
|
||||
|
||||
Unlike managed code edges, binary edges include `binary_hash` in the hash computation to distinguish edges from different binaries with identical symbol names.
|
||||
|
||||
**Canonicalization:**
|
||||
|
||||
1. Keys: `binary_hash`, `from`, `kind`, `reason`, `to` (alphabetical)
|
||||
2. No whitespace, UTF-8 encoding
|
||||
3. Lowercase hex for all hashes
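
The recipe and canonicalization rules above can be sketched in Python as follows. This is a minimal, non-normative illustration: the function name `binary_edge_id` is hypothetical, and the `edge:sha256:{hex}` output form is assumed from the edge schema in the edge explainability document rather than stated here.

```python
import hashlib
import json


def binary_edge_id(edge: dict, binary_file_hash: str) -> str:
    """Compute a binary edge ID following the BR2 recipe (illustrative sketch)."""
    payload = {
        "binary_hash": binary_file_hash.lower(),  # rule 3: lowercase hex
        "from": edge["from"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        "to": edge["to"],
    }
    # sort_keys yields alphabetical key order; the separators remove all whitespace.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Assumption: the ID carries an explicit algorithm prefix.
    return "edge:sha256:" + digest
```

Because `binary_hash` is part of the hashed payload, two binaries that export identically named symbols produce different edge IDs for otherwise identical edges.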
|
||||
|
||||
### BR3: Required Binary Evidence with CAS Refs
|
||||
|
||||
**Required evidence per node:**
|
||||
|
||||
| Evidence Type | Required | CAS Storage |
|---------------|----------|-------------|
| File hash | Yes | N/A (inline) |
| Build ID | Conditional | N/A (inline) |
| Symbol source | Yes | N/A (inline) |
| Code block hash | For stripped | `cas://binary/blocks/{sha256}` |
| Disassembly | Optional | `cas://binary/disasm/{sha256}` |
| CFG | Optional | `cas://binary/cfg/{sha256}` |
|
||||
|
||||
**Evidence schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"binary_evidence": {
|
||||
"file_hash": "sha256:...",
|
||||
"build_id": "gnu-build-id:5f0c7c3c...",
|
||||
"symbol_source": "DWARF",
|
||||
"symbol_confidence": 0.95,
|
||||
"code_block_hash": "sha256:deadbeef...",
|
||||
"code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
|
||||
"disassembly_uri": "cas://binary/disasm/sha256:...",
|
||||
"cfg_uri": "cas://binary/cfg/sha256:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**CAS layout:**
|
||||
|
||||
```
cas://binary/
  blocks/{sha256}/    # Code block bytes
  disasm/{sha256}/    # Disassembly JSON
  cfg/{sha256}/       # Control flow graph
  symbols/{sha256}/   # Symbol table extract
```
|
||||
|
||||
### BR4: Build-ID/Variant Rules
|
||||
|
||||
**Build-ID sources:**
|
||||
|
||||
| Format | Build-ID Source | Example |
|--------|-----------------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:5f0c7c3c...` |
| PE | Debug GUID | `pe-guid:12345678-1234-...` |
| Mach-O | `LC_UUID` | `macho-uuid:12345678...` |
|
||||
|
||||
**Fallback when build-ID absent:**
|
||||
|
||||
```json
|
||||
{
|
||||
"build_id": null,
|
||||
"build_id_fallback": {
|
||||
"method": "file_hash",
|
||||
"value": "sha256:...",
|
||||
"confidence": 0.7
|
||||
}
|
||||
}
|
||||
```
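
A minimal sketch of how a producer might emit this fallback record, assuming a hypothetical helper named `build_identity`. The `0.7` confidence is taken from the sample above, not from a stated rule.

```python
from typing import Optional


def build_identity(build_id: Optional[str], file_hash: str) -> dict:
    """Return the build-ID fields for a binary (illustrative sketch)."""
    if build_id:
        # A native build-ID was found (e.g. "gnu-build-id:..."); no fallback needed.
        return {"build_id": build_id}
    # No build-ID in the binary: fall back to the file hash with reduced confidence.
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            "confidence": 0.7,  # value copied from the sample, assumed fixed here
        },
    }
```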
|
||||
|
||||
**Variant handling:**
|
||||
|
||||
Multiple binaries built from the same source (debug/release, different architectures) are grouped as variants:
|
||||
|
||||
```json
|
||||
{
|
||||
"variant_group": "sha256:source_hash...",
|
||||
"variants": [
|
||||
{"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
|
||||
{"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
|
||||
{"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### BR5: Policy Hash Governance
|
||||
|
||||
**Policy version binding:**
|
||||
|
||||
Binary reachability graphs are bound to a policy version:
|
||||
|
||||
```json
|
||||
{
|
||||
"policy_binding": {
|
||||
"policy_digest": "sha256:...",
|
||||
"policy_version": "P-7:v4",
|
||||
"bound_at": "2025-12-13T10:00:00Z",
|
||||
"binding_mode": "strict"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Binding modes:**
|
||||
|
||||
| Mode | Behavior |
|------|----------|
| `strict` | Graph invalid if policy changes |
| `forward` | Graph valid with newer policy versions |
| `any` | Graph valid with any policy version |
|
||||
|
||||
**Governance rules:**
|
||||
|
||||
1. Production graphs use `strict` binding
|
||||
2. Test graphs may use `forward`
|
||||
3. Policy hash computed from canonical DSL
|
||||
4. Binding stored in graph metadata
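
The binding modes can be read as a small validity check. The sketch below is one possible interpretation, not the shipped logic: the function `is_graph_valid` is hypothetical, and it assumes policy versions can be reduced to a comparable integer revision (for example the `4` in `P-7:v4`), which this document does not specify.

```python
def is_graph_valid(
    binding_mode: str,
    bound_digest: str,
    bound_revision: int,
    current_digest: str,
    current_revision: int,
) -> bool:
    """Decide whether a graph is still valid under the current policy (sketch)."""
    if binding_mode == "strict":
        # Any policy change invalidates the graph, so the digests must match.
        return bound_digest == current_digest
    if binding_mode == "forward":
        # Valid for the bound revision and anything newer.
        return current_revision >= bound_revision
    if binding_mode == "any":
        return True
    raise ValueError(f"unknown binding mode: {binding_mode!r}")
```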
|
||||
|
||||
### BR6: Sigstore Bundle/Log Routing
|
||||
|
||||
**Sigstore integration:**
|
||||
|
||||
```json
|
||||
{
|
||||
"sigstore": {
|
||||
"bundle_type": "hashedrekord",
|
||||
"log_index": 12345678,
|
||||
"log_id": "rekor.sigstore.dev",
|
||||
"inclusion_proof": {
|
||||
"log_index": 12345678,
|
||||
"root_hash": "sha256:...",
|
||||
"tree_size": 98765432,
|
||||
"hashes": ["sha256:...", "sha256:..."]
|
||||
},
|
||||
"signed_entry_timestamp": "base64:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Log routing:**
|
||||
|
||||
| Evidence Type | Log | Notes |
|---------------|-----|-------|
| Graph DSSE | Rekor (public) | Always |
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
| Code block | No log | CAS only |
| CFG/Disasm | No log | CAS only |
|
||||
|
||||
**Offline mode:**
|
||||
|
||||
When Rekor is unavailable:
|
||||
|
||||
```json
|
||||
{
|
||||
"sigstore": {
|
||||
"mode": "offline",
|
||||
"checkpoint": {
|
||||
"origin": "rekor.sigstore.dev",
|
||||
"checkpoint_data": "base64:...",
|
||||
"captured_at": "2025-12-13T10:00:00Z"
|
||||
},
|
||||
"deferred_submission": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### BR7: Idempotent Submission Keys
|
||||
|
||||
**Submission key format:**
|
||||
|
||||
```
submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}
```
|
||||
|
||||
**Idempotency rules:**
|
||||
|
||||
1. Same key returns existing entry (no duplicate)
|
||||
2. Key includes hour-granularity timestamp for rate limiting
|
||||
3. Different graphs from same binary produce different keys
|
||||
4. A retry within the same clock hour reuses the key; a retry that crosses an hour boundary produces a new key
|
||||
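
A minimal sketch of key construction under the format above. The function name `submission_key` is hypothetical, and bucketing the hour in UTC is an assumption inferred from the `Z`-suffixed timestamps used elsewhere in this document.

```python
from datetime import datetime, timezone


def submission_key(tenant: str, binary_hash: str, graph_hash: str, when: datetime) -> str:
    """Build an idempotent submission key with an hour-granularity bucket (sketch).

    `when` should be timezone-aware; it is converted to UTC (assumed) before bucketing.
    """
    hour_bucket = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour_bucket}"
```

Two submissions at 10:05 and 10:55 share a key, while one at 11:00 does not, so a caller that needs a retry to stay idempotent across the boundary would have to reuse the original key explicitly.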
|
||||
**Implementation:**
|
||||
|
||||
```json
|
||||
{
|
||||
"submission": {
|
||||
"key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
|
||||
"status": "accepted",
|
||||
"existing_entry": false,
|
||||
"log_index": 12345678
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### BR8: Size/Chunking Limits
|
||||
|
||||
**Size limits:**
|
||||
|
||||
| Element | Limit | Action on Exceed |
|---------|-------|------------------|
| Graph JSON | 10 MB | Chunk nodes/edges |
| Edge bundle | 512 edges | Split bundles |
| DSSE payload | 1 MB | Compress/chunk |
| Rekor entry | 100 KB | Reference CAS |
|
||||
|
||||
**Chunking strategy:**
|
||||
|
||||
For large graphs (>10MB):
|
||||
|
||||
```json
|
||||
{
|
||||
"chunked_graph": {
|
||||
"chunk_count": 5,
|
||||
"chunks": [
|
||||
{"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
|
||||
{"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
|
||||
],
|
||||
"assembly_order": ["chunk:001", "chunk:002", ...],
|
||||
"assembled_hash": "blake3:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Compression:**
|
||||
|
||||
- Graph JSON: gzip before DSSE
|
||||
- CAS storage: Raw JSON (indexed)
|
||||
- Rekor payload: DSSE references CAS
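
The 512-edge bundle limit from the size table can be enforced with a simple split. This sketch assumes a hypothetical helper `split_into_bundles` and a sequential `bundle:NNN` numbering that mirrors the `bundle:001` sample. It does not cover byte-size limits or compression.

```python
def split_into_bundles(edges: list, graph_hash: str, max_edges: int = 512) -> list:
    """Split an edge list into bundles of at most `max_edges` edges (sketch)."""
    bundles = []
    for index, start in enumerate(range(0, len(edges), max_edges), start=1):
        chunk = edges[start:start + max_edges]
        bundles.append(
            {
                "graph_hash": graph_hash,
                "bundle_id": f"bundle:{index:03d}",  # assumed numbering scheme
                "edge_count": len(chunk),
                "edges": chunk,
            }
        )
    return bundles
```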
|
||||
|
||||
### BR9: API/CLI/UI Surfacing
|
||||
|
||||
**API endpoints:**
|
||||
|
||||
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/binary/graphs` | Submit binary graph |
| `GET` | `/api/binary/graphs/{hash}` | Get graph details |
| `GET` | `/api/binary/graphs/{hash}/edges` | List edges |
| `GET` | `/api/binary/symbols/{symbolId}` | Get symbol details |
| `POST` | `/api/binary/verify` | Verify graph attestation |
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
# Submit binary graph
stella binary submit --graph ./richgraph.json --binary ./app

# Get graph info
stella binary info --hash blake3:a1b2c3d4...

# List symbols
stella binary symbols --hash blake3:... --stripped-only

# Verify attestation
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse
```
|
||||
|
||||
**UI components:**
|
||||
|
||||
- Binary graph visualization with zoom/pan
|
||||
- Symbol table with search/filter
|
||||
- Edge explorer with confidence highlighting
|
||||
- Attestation status badges
|
||||
- Build variant selector
|
||||
|
||||
### BR10: Binary Fixtures
|
||||
|
||||
**Fixture location:**
|
||||
|
||||
```
tests/Binary/
  fixtures/
    elf-x86_64-with-debug/
      binary.elf
      graph.json
      expected-hashes.txt
    elf-stripped/
      binary.elf
      graph.json
      expected-hashes.txt
    pe-x64-with-pdb/
      binary.exe
      graph.json
      expected-hashes.txt
  golden/
    elf-x86_64.golden.json
    pe-x64.golden.json

datasets/binary/
  schema/
    binary-graph.schema.json
    binary-edge.schema.json
  samples/
    openssl-1.1.1/
      libssl.so
      graph.json
      edges.ndjson
```
|
||||
|
||||
**Fixture requirements:**
|
||||
|
||||
1. Each binary format has at least one fixture
|
||||
2. Stripped and debug variants for each format
|
||||
3. Expected hashes verified by CI
|
||||
4. Golden outputs include DSSE envelopes
|
||||
5. Fixtures reproducible from source (where legal)
|
||||
|
||||
**Test categories:**
|
||||
|
||||
1. **Hash stability:** Same binary produces same graph hash
|
||||
2. **Build-ID extraction:** Correct build-ID parsing per format
|
||||
3. **Symbol recovery:** DWARF/PDB parsing accuracy
|
||||
4. **Stripped handling:** Code block hash computation
|
||||
5. **Chunking:** Large graph assembly/disassembly
|
||||
6. **DSSE signing:** Envelope creation and verification
|
||||
7. **Rekor integration:** Submission and verification
|
||||
|
||||
---
|
||||
|
||||
## 3. Implementation Status
|
||||
|
||||
| Component | Location | Status |
|-----------|----------|--------|
| ELF parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| PE parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |
| CAS storage | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability` | Partial |
| Rekor integration | `src/Attestor/StellaOps.Attestor` | Implemented |
| CLI commands | `src/Cli/StellaOps.Cli` | Planned |
| UI components | `src/Web/StellaOps.Web` | Implemented |
|
||||
|
||||
---
|
||||
|
||||
## 4. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [Edge Explainability](./edge-explainability-schema.md) - Edge reason codes
|
||||
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
|
||||
- [Native Analyzer Tests](../../src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests/Reachability/) - Test fixtures
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 BINARY-GAPS-401-066 for change history._
|
||||
416
docs/modules/reach-graph/schemas/edge-explainability-schema.md
Normal file
@@ -0,0 +1,416 @@
|
||||
# Edge Explainability Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._
|
||||
|
||||
This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:
|
||||
|
||||
- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
|
||||
- **Confidence score:** Certainty of the edge's existence
|
||||
- **Evidence sources:** Detectors and rules that contributed to edge discovery
|
||||
- **Provenance:** Analyzer version, detection timestamp, and input artifacts
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### EG1: Reason Enum Governance
|
||||
|
||||
**Standard reason codes:**
|
||||
|
||||
| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |
|
||||
|
||||
**Governance rules:**
|
||||
|
||||
1. New reason codes require RFC + review by Scanner Guild
|
||||
2. Deprecated codes remain valid for 2 major versions
|
||||
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
|
||||
4. Codes are case-insensitive, normalized to lowercase
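
Governance rules 3 and 4 can be sketched as a normalization step applied before an edge is stored or hashed. The helper name `normalize_reason` is hypothetical, and rejecting unknown codes with an exception is an assumption. A real implementation would presumably consult the registry, including deprecated entries.

```python
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})

CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and check it is standard or custom-prefixed (sketch)."""
    normalized = code.lower()
    if normalized in STANDARD_REASONS:
        return normalized
    # Custom codes need the prefix plus a non-empty name after it.
    if normalized.startswith(CUSTOM_PREFIX) and len(normalized) > len(CUSTOM_PREFIX):
        return normalized
    raise ValueError(f"unknown reason code: {code!r}")
```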
|
||||
|
||||
**Code registry:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.edge.reason.registry@v1",
|
||||
"version": "2025-12-13",
|
||||
"reasons": [
|
||||
{
|
||||
"code": "bytecode-invoke",
|
||||
"category": "static",
|
||||
"description": "Bytecode invocation instruction",
|
||||
"languages": ["java", "dotnet"],
|
||||
"confidence_range": [0.9, 1.0],
|
||||
"deprecated": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### EG2: Canonical Edge Schema with Hash Rules
|
||||
|
||||
**Edge schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"edge_id": "edge:sha256:{hex}",
|
||||
"from": "sym:java:...",
|
||||
"to": "sym:java:...",
|
||||
"kind": "call",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95,
|
||||
"evidence": [
|
||||
{
|
||||
"source": "detector:java-bytecode-analyzer",
|
||||
"rule_id": "invoke-virtual",
|
||||
"rule_version": "1.0.0",
|
||||
"location": {
|
||||
"file": "com/example/Foo.class",
|
||||
"offset": 1234,
|
||||
"instruction": "invokevirtual #42"
|
||||
},
|
||||
"timestamp": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
],
|
||||
"attributes": {
|
||||
"virtual": true,
|
||||
"polymorphic_targets": 3
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Hash computation:**
|
||||
|
||||
```
edge_id = "edge:" + sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason
  })
)
```
|
||||
|
||||
**Canonicalization:**
|
||||
|
||||
1. Use only `from`, `to`, `kind`, `reason` for hash (not confidence or evidence)
|
||||
2. Sort JSON keys alphabetically
|
||||
3. No whitespace, UTF-8 encoding
|
||||
4. Hash is lowercase hex with `sha256:` prefix
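
A short Python sketch of these rules, using a hypothetical function `edge_id`. It projects the edge down to the four hashed fields, so confidence and evidence can change without changing the ID.

```python
import hashlib
import json

HASHED_FIELDS = ("from", "to", "kind", "reason")


def edge_id(edge: dict) -> str:
    """Derive the edge ID from the four identity fields only (sketch)."""
    payload = {field: edge[field] for field in HASHED_FIELDS}
    # Alphabetical keys, no whitespace, UTF-8 bytes.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "edge:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Keeping confidence out of the hash means a re-scored edge keeps its identity, which is what lets fixtures pin an expected `edge_id`.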
|
||||
|
||||
### EG3: Evidence Limits/Redaction
|
||||
|
||||
**Evidence limits:**
|
||||
|
||||
| Element | Default Limit | Configurable |
|---------|---------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |
|
||||
|
||||
**Redaction rules:**
|
||||
|
||||
| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |
|
||||
|
||||
**Truncation behavior:**
|
||||
|
||||
```json
|
||||
{
|
||||
"evidence_truncated": true,
|
||||
"evidence_count": 15,
|
||||
"evidence_shown": 10,
|
||||
"full_evidence_uri": "cas://edges/evidence/sha256:..."
|
||||
}
|
||||
```
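
A minimal sketch of how the truncation marker might be produced, assuming a hypothetical helper `truncate_evidence` and that the caller has already stored the full evidence list in CAS and knows its URI.

```python
def truncate_evidence(evidence: list, full_evidence_uri: str, limit: int = 10) -> dict:
    """Cap evidence entries per edge and record what was dropped (sketch)."""
    shown = evidence[:limit]
    result = {"evidence": shown}
    if len(evidence) > limit:
        # Only emit the truncation fields when something was actually cut.
        result.update(
            {
                "evidence_truncated": True,
                "evidence_count": len(evidence),
                "evidence_shown": len(shown),
                "full_evidence_uri": full_evidence_uri,
            }
        )
    return result
```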
|
||||
|
||||
### EG4: Confidence Rubric
|
||||
|
||||
**Confidence scale:**
|
||||
|
||||
| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |
|
||||
|
||||
**Confidence computation:**
|
||||
|
||||
```
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
```
|
||||
|
||||
**Base confidence by reason:**
|
||||
|
||||
| Reason | Base Confidence |
|--------|-----------------|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |
|
||||
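
The computation and the confidence scale can be combined in a short sketch. Several parts are assumptions: this document does not define `evidence_boost` or `target_resolution_factor`, so both are passed in as plain multipliers; the result is clamped to [0, 1]; and values between 0.99 and 1.0 are treated as `high`.

```python
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98, "import-symbol": 0.95, "plt-stub": 0.92,
    "reloc-target": 0.90, "init-array": 0.95, "vtable-slot": 0.75,
    "indirect-target": 0.60, "reflection-invoke": 0.50,
    "runtime-observed": 0.99, "user-annotated": 0.80,
}


def edge_confidence(reason: str, evidence_boost: float = 1.0,
                    target_resolution_factor: float = 1.0) -> float:
    """Multiply the base confidence by the two factors and clamp to [0, 1] (sketch)."""
    value = BASE_CONFIDENCE[reason] * evidence_boost * target_resolution_factor
    return max(0.0, min(1.0, value))


def confidence_level(confidence: float) -> str:
    """Map a numeric confidence onto the named levels of the confidence scale."""
    if confidence >= 1.0:
        return "certain"
    if confidence >= 0.85:
        return "high"
    if confidence >= 0.5:
        return "medium"
    if confidence >= 0.2:
        return "low"
    return "unknown"
```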
|
||||
### EG5: Detector/Rule Provenance
|
||||
|
||||
**Provenance schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"provenance": {
|
||||
"analyzer": {
|
||||
"name": "scanner.java",
|
||||
"version": "1.2.0",
|
||||
"digest": "sha256:..."
|
||||
},
|
||||
"detector": {
|
||||
"name": "java-bytecode-analyzer",
|
||||
"version": "2.0.0",
|
||||
"rule_set": "default"
|
||||
},
|
||||
"rule": {
|
||||
"id": "invoke-virtual",
|
||||
"version": "1.0.0",
|
||||
"description": "Detect invokevirtual bytecode instructions"
|
||||
},
|
||||
"input_artifacts": [
|
||||
{"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
|
||||
],
|
||||
"detected_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Provenance requirements:**
|
||||
|
||||
1. All edges must include analyzer provenance
|
||||
2. Detector/rule provenance required for non-runtime edges
|
||||
3. Input artifact digests enable reproducibility
|
||||
4. Detection timestamp uses UTC ISO-8601
|
||||
|
||||
### EG6: API/CLI Parity
|
||||
|
||||
**API endpoints:**
|
||||
|
||||
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...

# Get edge details
stella edge show --id edge:sha256:...

# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke

# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```
|
||||
|
||||
**Output parity:**
|
||||
|
||||
- API and CLI return identical JSON structure
|
||||
- CLI supports `--json` for machine-readable output
|
||||
- Both support filtering by reason, confidence, from/to
|
||||
|
||||
### EG7: Deterministic Fixtures
|
||||
|
||||
**Fixture location:**
|
||||
|
||||
```
tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json

datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt
```
|
||||
|
||||
**Fixture requirements:**
|
||||
|
||||
1. Each reason code has at least one fixture
|
||||
2. Fixtures include expected `edge_id` hash
|
||||
3. Golden outputs frozen after review
|
||||
4. CI verifies hash stability
|
||||
|
||||
### EG8: Propagation into Explanation Graphs/VEX
|
||||
|
||||
**Explanation graph inclusion:**
|
||||
|
||||
```json
|
||||
{
|
||||
"explanation": {
|
||||
"path": [
|
||||
{
|
||||
"node": "sym:java:main...",
|
||||
"outgoing_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"to": "sym:java:handler...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.98
|
||||
}
|
||||
},
|
||||
{
|
||||
"node": "sym:java:handler...",
|
||||
"outgoing_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"to": "sym:java:log4j...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95
|
||||
}
|
||||
}
|
||||
],
|
||||
"aggregate_path_confidence": 0.93
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**VEX evidence format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"stellaops:reachability": {
|
||||
"path_edges": [
|
||||
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
|
||||
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
|
||||
],
|
||||
"weakest_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"aggregate_confidence": 0.93
|
||||
}
|
||||
}
|
||||
```
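
The sample values above are consistent with the aggregate being the product of per-edge confidences (0.98 × 0.95 ≈ 0.93) and the weakest edge being the minimum. The document does not state the formula, so the sketch below is an assumption, and `summarize_path` is a hypothetical helper name.

```python
import math


def summarize_path(path_edges: list) -> dict:
    """Build the reachability evidence summary for a non-empty path (sketch)."""
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    # Assumed aggregation: product of edge confidences, rounded to two decimals.
    aggregate = math.prod(edge["confidence"] for edge in path_edges)
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": round(aggregate, 2),
    }
```

Under a product rule, long paths lose confidence even when every edge is strong, which is one reason to report the weakest edge alongside the aggregate.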
|
||||
|
||||
### EG9: Localization Guidance
|
||||
|
||||
**Localizable elements:**
|
||||
|
||||
| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |
|
||||
|
||||
**Message catalog structure:**
|
||||
|
||||
```json
|
||||
{
|
||||
"locale": "en-US",
|
||||
"messages": {
|
||||
"edge.reason.bytecode-invoke": "Bytecode method call",
|
||||
"edge.reason.plt-stub": "PLT/GOT library call",
|
||||
"edge.confidence.high": "High confidence ({0:P0})",
|
||||
"edge.evidence.location": "Detected at offset {offset} in {file}"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Supported locales:**
|
||||
|
||||
- `en-US` (default)
|
||||
- Additional locales via contribution
|
||||
|
||||
### EG10: Backfill Plan
|
||||
|
||||
**Backfill strategy:**
|
||||
|
||||
1. **Phase 1:** Add reason codes to new edges (no backfill needed)
|
||||
2. **Phase 2:** Run detector upgrade on graphs without reason codes
|
||||
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata
|
||||
|
||||
**Migration script:**
|
||||
|
||||
```bash
# Preview the backfill without changing anything
stella edge backfill --graph blake3:... --dry-run

# Output:
#   Graph: blake3:a1b2c3d4...
#   Edges without reason: 1234
#   Edges to update: 1234
#
#   Dry run - no changes made.

# Execute:
stella edge backfill --graph blake3:... --execute
```
|
||||
|
||||
**Backfill metadata:**
|
||||
|
||||
```json
|
||||
{
|
||||
"backfill": {
|
||||
"status": "complete",
|
||||
"original_analyzer_version": "1.0.0",
|
||||
"backfill_analyzer_version": "1.2.0",
|
||||
"backfilled_at": "2025-12-13T10:00:00Z",
|
||||
"edges_updated": 1234
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [Explainability Schema](./explainability-schema.md) - Explanation format
|
||||
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._
|
||||
101
docs/modules/reach-graph/schemas/evidence-schema.md
Normal file
@@ -0,0 +1,101 @@
|
||||
# Reachability Evidence Schema (Draft v1, Nov 2025)
|
||||
|
||||
Purpose: define the canonical fields for reachability graph nodes/edges, runtime facts, and unknowns so Scanner, Signals, Policy, Replay, CLI/UI, and SbomService stay aligned. This replaces scattered notes in advisories.
|
||||
|
||||
## 1. Core identifiers
|
||||
|
||||
- `symbol_id`: canonical ID for a function/symbol; includes `{format, build_id?, file_hash?, section?, addr, length}` plus optional `code_block_hash`. Always deterministic and lowercase.
|
||||
- `code_id`: `{format, build_id?, file_hash?, start, length, code_block_hash?}`; used when symbol names are absent.
|
||||
- `symbol_digest`: sha256 of normalized signature (demangled name + params + return type; strip addresses). For stripped code, combine synthetic name + block hash.
|
||||
- `purl`: package URL of the owning component (from SBOM resolver); `pkg:unknown` when unresolved.
|
||||
|
||||
## 2. Graph payload (`richgraph-v1` additions)
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"nodes": [
|
||||
{
|
||||
"id": "sym:sha256:...",
|
||||
"symbol_id": "func:ELF:sha256:...",
|
||||
"code_id": "code:ELF:sha256:...",
|
||||
"code_block_hash": "sha256:deadbeef...",
|
||||
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
|
||||
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF", "confidence": 0.98 },
|
||||
"build_id": "a1b2c3...",
|
||||
"lang": "c",
|
||||
"evidence": ["dwarf", "dynsym"],
|
||||
"analyzer": { "name": "scanner.native", "version": "1.2.0", "toolchain": "ghidra-11" }
|
||||
}
|
||||
],
|
||||
"edges": [
|
||||
{
|
||||
"from": "sym:sha256:caller",
|
||||
"to": "sym:sha256:callee",
|
||||
"kind": "direct|plt|indirect|runtime",
|
||||
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64", // callee owner
|
||||
"symbol_digest": "sha256:...", // callee digest
|
||||
"candidates": ["pkg:deb/openssl@3.0.2", "pkg:deb/openssl@3.0.1"],
|
||||
"confidence": 0.92,
|
||||
"evidence": ["import", "reloc@GOT"]
|
||||
}
|
||||
],
|
||||
"roots": [
|
||||
{ "id": "init_array@0x401000", "phase": "load", "source": "DT_INIT_ARRAY" },
|
||||
{ "id": "main", "phase": "runtime" }
|
||||
],
|
||||
"graph_hash": "blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
## 2.5 Attestation levels (hybrid default)
|
||||
|
||||
- **Graph DSSE (required):** one DSSE envelope over the canonical graph JSON (sorted arrays/keys) with `graph_hash` = BLAKE3 of body; Rekor publish always (or mirror when offline).
|
||||
- **Edge-bundle DSSE (optional):** batches of ≤512 edges, emitted only for high-signal cases (`runtime`, `init_array`/TLS roots, contested/third-party edges). Each bundle carries `graph_hash`, `bundle_reason`, per-edge `reason`, `symbol_digest`, `purl`, `confidence`, and optional `revoked=true` for quarantine. Rekor publish is configurable; CAS storage is mandatory.
|
||||
- CAS layout additions:
|
||||
- Graph body: `cas://reachability/graphs/{blake3}`
|
||||
- Graph DSSE: `cas://reachability/graphs/{blake3}.dsse`
|
||||
- Edge bundle: `cas://reachability/edges/{graph_hash}/{bundle_id}` + `.dsse`
|
||||
- Determinism: bundle ordering by `(bundle_reason, edge_id)`; arrays sorted before hashing.
|
||||
|
||||
## 3. Runtime facts (Signals ingestion)
|
||||
|
||||
Fields per NDJSON event:
|
||||
|
||||
- `symbolId` (required), `codeId`, `symbolDigest?`, `purl?`
|
||||
- `hitCount`, `observedAt`, `loaderBase`, `processId`, `processName`, `containerId`, `socketAddress?`
|
||||
- `callgraphId` or `scanId`, plus `evidenceUri` (CAS) if trace stored externally
|
||||
- Determinism: sort keys when persisting; timestamps UTC ISO-8601.
|
||||
|
||||
## 4. Unknowns registry payload
|
||||
|
||||
See `docs/modules/signals/guides/unknowns-registry.md`; reachability producers emit Unknowns when:
|
||||
- symbol→purl unresolved,
|
||||
- call edge target unresolved,
|
||||
- build-id missing for ELF and file hash used instead.
|
||||
|
||||
Unknowns must include `unknown_type`, `scope`, `provenance`, `confidence.p`, and `labels`.
|
||||
|
||||
## 5. CAS layout
|
||||
|
||||
- Graphs: `cas://reachability/graphs/{blake3}` (canonical JSON, sorted keys/arrays)
|
||||
- Runtime traces: `cas://reachability/runtime/{sha256}`
|
||||
- Unknowns evidence (optional large blobs): `cas://unknowns/{sha256}`
|
||||
- Edge bundles: `cas://reachability/edges/{graph_hash}/{bundle_id}` (JSON + `.dsse`)
|
||||
|
||||
Metadata for each CAS object: `{ schema: "richgraph-v1", analyzer: {name,version}, createdAtUtc, toolchain_digest }`. When analyzer metadata is supplied at ingest (Signals OpenAPI), persist it alongside parsed analyzer fields from the artifact.
|
||||
|
||||
## 6. Validation rules
|
||||
|
||||
- All edges must carry either `purl` or `candidates[]`; never leave both empty.
|
||||
- If `build_id` present, `symbol_id` and `code_id` must store it; if absent, record `build_id_source: "FileHash"`.
|
||||
- Evidence arrays sorted; confidence in [0,1].
|
||||
- `code_block_hash` (when present) must be lowercase hex with an algorithm prefix (e.g., `sha256:`) and may only accompany stripped/heuristic nodes.
|
||||
- Roots must include load-time constructors when present.
|
||||
- When `edge_bundles` are present, each edge in a bundle must also exist in the graph edge set; `revoked=true` bundles override graph edges for policy/scoring.
|
||||
- Graph DSSE is mandatory per scan; edge-bundle DSSEs are optional but must reference `graph_hash` and `bundle_id`.
|
||||
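
A few of these rules can be checked mechanically. The sketch below covers only the `purl`/`candidates[]`, confidence-range, and `code_block_hash` format rules; the function names and the flat dict shape of an edge are assumptions for illustration.

```python
import re

# Lowercase algorithm prefix followed by lowercase hex, e.g. "sha256:ab12...".
_HASH_RE = re.compile(r"[a-z0-9]+:[0-9a-f]+")


def validate_code_block_hash(value):
    """Return True if the value looks like '<alg>:<lowercase hex>'."""
    return _HASH_RE.fullmatch(value) is not None


def validate_edge(edge):
    """Return a list of rule violations for one edge (empty if valid)."""
    errors = []
    if not edge.get("purl") and not edge.get("candidates"):
        errors.append("edge must carry purl or candidates[]")
    confidence = edge.get("confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be in [0,1]")
    return errors
```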
|
||||
## 7. Acceptance checklist
|
||||
|
||||
- Schema reflected in Scanner/Signals DTOs and OpenAPI responses.
|
||||
- CAS writers enforce canonicalization before hashing.
|
||||
- Fixtures include: build-id present/absent, init-array roots, purl-resolved imports-only edge, stripped binary with block-hash symbol digest, and an Unknowns case.
|
||||
454
docs/modules/reach-graph/schemas/explainability-schema.md
Normal file
@@ -0,0 +1,454 @@
|
||||
# Explainability Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Policy Guild + Docs Guild._
|
||||
|
||||
This document defines the explainability schema addressing gaps EX1-EX10 from the November 2025 product findings. It specifies the canonical format for vulnerability verdict explanations, DSSE signing policy, CAS storage rules, and export/replay formats.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Explainability provides auditable, machine-readable rationale for every vulnerability verdict. Each explanation includes:
|
||||
|
||||
- **Decision chain:** Ordered list of rules/policies that contributed to the verdict
|
||||
- **Evidence links:** References to graphs, runtime facts, VEX statements, and SBOM components
|
||||
- **Confidence scores:** Per-rule and aggregate confidence values
|
||||
- **Redaction metadata:** PII handling and data classification
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### EX1: Schema/Canonicalization + Hashes
|
||||
|
||||
**Explanation schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.explanation@v1",
|
||||
"explanation_id": "explain:sha256:{hex}",
|
||||
"finding_id": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
|
||||
"verdict": {
|
||||
"status": "affected",
|
||||
"severity": {"normalized": "Critical", "score": 10.0},
|
||||
"confidence": 0.92
|
||||
},
|
||||
"decision_chain": [
|
||||
{
|
||||
"rule_id": "rule:reachability_gate",
|
||||
"rule_version": "1.0.0",
|
||||
"inputs": {
|
||||
"reachability.state": "CR",
|
||||
"reachability.confidence": 0.92
|
||||
},
|
||||
"output": {"allowed": true, "contribution": 0.4},
|
||||
"evidence_refs": ["cas://reachability/graphs/blake3:..."]
|
||||
},
|
||||
{
|
||||
"rule_id": "rule:severity_baseline",
|
||||
"rule_version": "1.0.0",
|
||||
"inputs": {
|
||||
"cvss_base": 10.0,
|
||||
"epss_percentile": 0.95
|
||||
},
|
||||
"output": {"severity": "Critical", "contribution": 0.6},
|
||||
"evidence_refs": ["cas://advisories/CVE-2021-44228.json"]
|
||||
}
|
||||
],
|
||||
"aggregate_confidence": 0.88,
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"policy_version": "sha256:...",
|
||||
"graph_revision_id": "rev:blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
**Canonicalization rules:**
|
||||
|
||||
1. JSON keys sorted alphabetically at all levels
|
||||
2. Arrays in `decision_chain` ordered by rule execution sequence
|
||||
3. `evidence_refs` arrays sorted alphabetically
|
||||
4. No insignificant whitespace (compact JSON); UTF-8 encoding
|
||||
5. Hash computed over canonical JSON: `sha256(canonical_json)`
|
||||
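
The canonicalization rules can be sketched as below. One assumption is made loudly: `explanation_id` is excluded from the hashed body, since a document cannot contain its own hash; the source does not state how this is handled.

```python
import hashlib
import json


def canonicalize(explanation):
    """Return canonical UTF-8 JSON bytes for an explanation document."""
    # Assumption: explanation_id is derived from the body, so it is left out.
    doc = {k: v for k, v in explanation.items() if k != "explanation_id"}
    if "decision_chain" in doc:
        # Keep rule execution order; sort only each evidence_refs array.
        doc["decision_chain"] = [
            {**step, "evidence_refs": sorted(step["evidence_refs"])}
            if "evidence_refs" in step else dict(step)
            for step in doc["decision_chain"]
        ]
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def explanation_id(explanation):
    """Compute 'explain:sha256:{hex}' over the canonical JSON."""
    digest = hashlib.sha256(canonicalize(explanation)).hexdigest()
    return "explain:sha256:" + digest
```

Key order and `evidence_refs` order do not affect the result, while reordering `decision_chain` does, which matches rules 1 to 3.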
|
||||
### EX2: DSSE Predicate/Signing Policy
|
||||
|
||||
**DSSE predicate type:**
|
||||
|
||||
```
|
||||
stella.ops/explanation@v1
|
||||
```
|
||||
|
||||
**Signing policy:**
|
||||
|
||||
| Element | Required | Signer |
|
||||
|---------|----------|--------|
|
||||
| Explanation body | Yes | Policy Engine key |
|
||||
| Graph DSSE reference | Yes (if reachability cited) | Scanner key |
|
||||
| VEX DSSE reference | Yes (if VEX cited) | Policy Engine key |
|
||||
|
||||
**DSSE envelope structure:**
|
||||
|
||||
```json
|
||||
{
|
||||
"payloadType": "application/vnd.stellaops.explanation+json",
|
||||
"payload": "<base64(canonical_explanation_json)>",
|
||||
"signatures": [
|
||||
{
|
||||
"keyid": "policy-engine-signing-2025",
|
||||
"sig": "base64:..."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Signing requirements:**
|
||||
|
||||
- All explanations must be signed before CAS storage
|
||||
- Signing key must be registered in Authority key store
|
||||
- Key rotation triggers re-signing of active explanations (configurable)
|
||||
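
Envelope construction might look like the following sketch. It uses the DSSE pre-authentication encoding (PAE) as the signed message and takes the signer as a caller-supplied function, so no particular key store or crypto library is implied. It emits plain base64 in `sig`; whether the `base64:` prefix shown in the example above is literal is not specified here.

```python
import base64

PAYLOAD_TYPE = "application/vnd.stellaops.explanation+json"


def pae(payload_type, payload):
    """DSSE pre-authentication encoding of (payload type, payload bytes)."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload
    )


def build_envelope(canonical_json, keyid, sign):
    """Wrap canonical explanation bytes in a DSSE envelope.

    `sign` is any callable mapping message bytes to signature bytes
    (hypothetical hook; the real signer lives in the Policy Engine).
    """
    signature = sign(pae(PAYLOAD_TYPE, canonical_json))
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(canonical_json).decode("ascii"),
        "signatures": [{
            "keyid": keyid,
            "sig": base64.b64encode(signature).decode("ascii"),
        }],
    }
```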
|
||||
### EX3: CAS Storage Rules for Evidence
|
||||
|
||||
**Storage layout:**
|
||||
|
||||
```
|
||||
cas://explanations/
|
||||
{sha256}/ # Explanation body
|
||||
{sha256}.dsse # DSSE envelope
|
||||
by-finding/{finding_id}/ # Index by finding
|
||||
by-policy/{policy_digest}/ # Index by policy version
|
||||
by-graph/{graph_revision_id}/ # Index by graph revision
|
||||
```
|
||||
|
||||
**Storage rules:**
|
||||
|
||||
1. Explanations are immutable after signing
|
||||
2. New verdicts create new explanation documents (no updates)
|
||||
3. Previous explanations are retained per retention policy
|
||||
4. Cross-references validated at write time (graphs, VEX must exist)
|
||||
|
||||
**Deduplication:**
|
||||
|
||||
- Identical canonical JSON produces identical hash
|
||||
- CAS returns existing reference if content matches
|
||||
|
||||
### EX4: Link to Decision/Policy and graph_revision_id
|
||||
|
||||
**Required links:**
|
||||
|
||||
```json
|
||||
{
|
||||
"links": {
|
||||
"policy_version": "sha256:7e1d...",
|
||||
"policy_uri": "cas://policy/versions/sha256:7e1d...",
|
||||
"graph_revision_id": "rev:blake3:a1b2...",
|
||||
"graph_uri": "cas://reachability/revisions/blake3:a1b2...",
|
||||
"sbom_digest": "sha256:def4...",
|
||||
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"vex_digest": "sha256:e5f6...",
|
||||
"vex_uri": "cas://excititor/vex/openvex.json"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Validation:**
|
||||
|
||||
- All linked artifacts must exist at explanation creation time
|
||||
- Links are verified during replay/audit
|
||||
- Broken links cause replay verification failure
|
||||
|
||||
### EX5: Export/Replay Bundle Format
|
||||
|
||||
**Export bundle manifest:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.explanation.bundle@v1",
|
||||
"bundle_id": "bundle:explain:2025-12-13",
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"explanations": [
|
||||
{
|
||||
"explanation_id": "explain:sha256:...",
|
||||
"finding_id": "...",
|
||||
"explanation_uri": "explanations/sha256:....json",
|
||||
"dsse_uri": "explanations/sha256:....dsse"
|
||||
}
|
||||
],
|
||||
"dependencies": {
|
||||
"graphs": [
|
||||
{"revision_id": "rev:blake3:...", "uri": "graphs/blake3:....json"}
|
||||
],
|
||||
"policies": [
|
||||
{"digest": "sha256:...", "uri": "policies/sha256:....json"}
|
||||
],
|
||||
"vex_statements": [
|
||||
{"digest": "sha256:...", "uri": "vex/sha256:....json"}
|
||||
]
|
||||
},
|
||||
"verification": {
|
||||
"bundle_hash": "sha256:...",
|
||||
"signature": "base64:...",
|
||||
"signed_by": "policy-engine-signing-2025"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Replay verification:**
|
||||
|
||||
```bash
|
||||
stella explain verify --bundle ./explanation-bundle.tgz
|
||||
|
||||
# Output:
|
||||
Bundle: bundle:explain:2025-12-13
|
||||
Explanations: 42
|
||||
Dependencies: 5 graphs, 2 policies, 12 VEX
|
||||
|
||||
Verifying explanations...
|
||||
Canonical hashes: 42/42 MATCH
|
||||
DSSE signatures: 42/42 VALID
|
||||
Dependency links: 42/42 RESOLVED
|
||||
|
||||
Replay verification PASSED.
|
||||
```
|
||||
|
||||
### EX6: PII/Redaction Rules
|
||||
|
||||
**Redaction categories:**
|
||||
|
||||
| Category | Redaction | Example |
|
||||
|----------|-----------|---------|
|
||||
| User identifiers | Hash | `user:alice` -> `user:sha256:a1b2...` |
|
||||
| IP addresses | Mask | `192.168.1.100` -> `192.168.x.x` |
|
||||
| File paths | Normalize | `/home/alice/code/...` -> `{HOME}/code/...` |
|
||||
| Email addresses | Hash | `alice@example.com` -> `email:sha256:...` |
|
||||
| API keys/tokens | Omit | `Authorization: Bearer xxx` -> `[REDACTED]` |
|
||||
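
The redaction categories in the table could be implemented roughly as follows. Details such as hashing only the part after the `user:` prefix, lowercasing emails before hashing, and using an unsalted SHA-256 are assumptions of this sketch; unsalted hashes of low-entropy identifiers can be guessed, so a real policy may need a keyed hash.

```python
import hashlib
import ipaddress


def redact_user(user_id):
    """'user:alice' -> 'user:sha256:<hex>' (hashes the part after the prefix)."""
    prefix, _, name = user_id.partition(":")
    return f"{prefix}:sha256:{hashlib.sha256(name.encode('utf-8')).hexdigest()}"


def redact_email(email):
    """'alice@example.com' -> 'email:sha256:<hex>' (case-insensitive)."""
    digest = hashlib.sha256(email.lower().encode("utf-8")).hexdigest()
    return "email:sha256:" + digest


def mask_ipv4(address):
    """'192.168.1.100' -> '192.168.x.x' (keeps the first two octets)."""
    octets = str(ipaddress.IPv4Address(address)).split(".")
    return ".".join(octets[:2] + ["x", "x"])


def normalize_path(path, home):
    """Replace a leading home directory with the '{HOME}' placeholder."""
    home = home.rstrip("/")
    if path == home or path.startswith(home + "/"):
        return "{HOME}" + path[len(home):]
    return path
```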
|
||||
**Redaction metadata:**
|
||||
|
||||
```json
|
||||
{
|
||||
"redaction": {
|
||||
"applied": true,
|
||||
"level": "standard",
|
||||
"fields_redacted": ["actor.email", "evidence.file_path"],
|
||||
"redaction_policy": "stellaops.redaction.standard@v1"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Export modes:**
|
||||
|
||||
- `--redacted` (default): Apply standard redaction
|
||||
- `--full`: Include all data (requires `explain:export:full` scope)
|
||||
- `--audit`: Include redaction audit trail
|
||||
|
||||
### EX7: Size Budgets
|
||||
|
||||
**Limits:**
|
||||
|
||||
| Element | Default Limit | Configurable |
|
||||
|---------|--------------|--------------|
|
||||
| Explanation body | 256 KB | Yes |
|
||||
| Decision chain entries | 100 | Yes |
|
||||
| Evidence refs per rule | 20 | Yes |
|
||||
| Total evidence refs | 200 | Yes |
|
||||
| Path entries | 50 | No |
|
||||
|
||||
**Truncation behavior:**
|
||||
|
||||
When limits are exceeded:
|
||||
1. Log warning with truncation details
|
||||
2. Add `truncation` metadata to explanation
|
||||
3. Store full evidence in separate CAS object
|
||||
4. Include `full_evidence_uri` reference
|
||||
|
||||
```json
|
||||
{
|
||||
"truncation": {
|
||||
"applied": true,
|
||||
"elements_truncated": ["decision_chain", "evidence_refs"],
|
||||
"full_evidence_uri": "cas://explanations/full/sha256:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
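
The truncation steps might be applied as in this sketch. It enforces only the decision-chain and per-rule evidence limits, and it assumes that truncation keeps the first N entries; the source does not say which entries are kept. Writing the full evidence to CAS is left to the caller, who passes in the resulting URI.

```python
def apply_budgets(explanation, full_evidence_uri,
                  max_chain=100, max_refs_per_rule=20):
    """Return a copy of the explanation with size budgets enforced."""
    out = dict(explanation)
    truncated = []
    chain = [dict(step) for step in explanation.get("decision_chain", [])]
    if len(chain) > max_chain:
        chain = chain[:max_chain]  # assumption: keep the earliest rules
        truncated.append("decision_chain")
    refs_cut = False
    for step in chain:
        refs = step.get("evidence_refs", [])
        if len(refs) > max_refs_per_rule:
            step["evidence_refs"] = refs[:max_refs_per_rule]
            refs_cut = True
    if refs_cut:
        truncated.append("evidence_refs")
    out["decision_chain"] = chain
    if truncated:
        out["truncation"] = {
            "applied": True,
            "elements_truncated": truncated,
            "full_evidence_uri": full_evidence_uri,
        }
    return out
```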
|
||||
### EX8: Versioning
|
||||
|
||||
**Schema versioning:**
|
||||
|
||||
- Schema version in `schema` field: `stellaops.explanation@v1`
|
||||
- Breaking changes increment major version
|
||||
- Minor changes (additive fields) use v1.x
|
||||
- Backward compatibility maintained for 2 major versions
|
||||
|
||||
**Migration support:**
|
||||
|
||||
```bash
|
||||
stella explain migrate --from v1 --to v2 --input ./explanations/
|
||||
|
||||
# Output:
|
||||
Migrating 1000 explanations from v1 to v2...
|
||||
Migrated: 998
|
||||
Skipped (already v2): 2
|
||||
|
||||
Migration complete.
|
||||
```
|
||||
|
||||
**Version compatibility matrix:**
|
||||
|
||||
| API Version | Schema v1 | Schema v2 |
|
||||
|-------------|-----------|-----------|
|
||||
| 1.0.x | Full | N/A |
|
||||
| 1.1.x | Full | Full |
|
||||
| 2.0.x | Read-only | Full |
|
||||
|
||||
### EX9: Golden Fixtures/Tests
|
||||
|
||||
**Test fixture location:**
|
||||
|
||||
```
|
||||
tests/Explanation/
|
||||
fixtures/
|
||||
simple-affected.json
|
||||
simple-not-affected.json
|
||||
with-reachability-evidence.json
|
||||
multi-rule-chain.json
|
||||
truncated-evidence.json
|
||||
redacted-pii.json
|
||||
golden/
|
||||
simple-affected.golden.json
|
||||
simple-affected.golden.dsse
|
||||
|
||||
datasets/explanations/
|
||||
schema/
|
||||
explanation.schema.json
|
||||
samples/
|
||||
log4j-affected/
|
||||
explanation.json
|
||||
expected-hash.txt
|
||||
```
|
||||
|
||||
**Test categories:**
|
||||
|
||||
1. **Canonicalization tests:** Verify hash stability across JSON reordering
|
||||
2. **DSSE signing tests:** Verify signature creation and verification
|
||||
3. **Redaction tests:** Verify PII handling
|
||||
4. **Truncation tests:** Verify size budget enforcement
|
||||
5. **Replay tests:** Verify bundle export/import cycle
|
||||
6. **Migration tests:** Verify version upgrade paths
|
||||
|
||||
**CI integration:**
|
||||
|
||||
```yaml
|
||||
# .gitea/workflows/explanation-tests.yml
|
||||
explanation-tests:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Run explanation tests
|
||||
run: dotnet test src/Policy/__Tests/StellaOps.Policy.Explanation.Tests
|
||||
- name: Verify golden fixtures
|
||||
run: scripts/verify-golden-fixtures.sh tests/Explanation/golden/
|
||||
```
|
||||
|
||||
### EX10: Determinism Guarantees
|
||||
|
||||
**Determinism requirements:**
|
||||
|
||||
1. Same inputs produce identical `explanation_id` hash
|
||||
2. Decision chain ordering is stable (execution order)
|
||||
3. Evidence refs sorted alphabetically
|
||||
4. Timestamps use UTC ISO-8601 with millisecond precision
|
||||
5. Floating-point values rounded to 6 decimal places
|
||||
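
Requirements 4 and 5 can be illustrated with two small helpers. The names are hypothetical; note that `isoformat(timespec="milliseconds")` truncates rather than rounds sub-millisecond digits, which is one possible reading of "millisecond precision".

```python
from datetime import datetime, timezone


def format_timestamp(ts):
    """Format a timezone-aware datetime as UTC ISO-8601 with milliseconds."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    text = ts.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


def round_floats(value, places=6):
    """Recursively round every float in a JSON-like structure."""
    if isinstance(value, float):
        return round(value, places)
    if isinstance(value, dict):
        return {k: round_floats(v, places) for k, v in value.items()}
    if isinstance(value, list):
        return [round_floats(v, places) for v in value]
    return value
```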
|
||||
**Verification:**
|
||||
|
||||
```bash
|
||||
# Run twice with same inputs, verify identical hashes
|
||||
stella explain generate --finding "..." --output a.json
|
||||
stella explain generate --finding "..." --output b.json
|
||||
diff a.json b.json # Should be empty
|
||||
|
||||
# Or use built-in verify
|
||||
stella explain verify-determinism --finding "..." --iterations 3
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. API Reference
|
||||
|
||||
### 3.1 Generate Explanation
|
||||
|
||||
```http
|
||||
POST /api/policy/findings/{findingId}/explain
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"mode": "full",
|
||||
"include_evidence": true,
|
||||
"redaction_level": "standard"
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Get Explanation
|
||||
|
||||
```http
|
||||
GET /api/explanations/{explanationId}
|
||||
Authorization: Bearer <token>
|
||||
Accept: application/json
|
||||
```
|
||||
|
||||
### 3.3 Export Explanation Bundle
|
||||
|
||||
```http
|
||||
POST /api/explanations/export
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"finding_ids": ["...", "..."],
|
||||
"include_dependencies": true,
|
||||
"redaction_level": "standard"
|
||||
}
|
||||
```
|
||||
|
||||
### 3.4 Verify Explanation
|
||||
|
||||
```http
|
||||
POST /api/explanations/{explanationId}/verify
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. CLI Reference
|
||||
|
||||
```bash
|
||||
# Generate explanation for a finding
|
||||
stella explain generate --finding "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228"
|
||||
|
||||
# Export explanation bundle
|
||||
stella explain export --findings ./finding-ids.txt --output ./bundle.tgz
|
||||
|
||||
# Verify explanation
|
||||
stella explain verify --explanation ./explanation.json --dsse ./explanation.dsse
|
||||
|
||||
# Verify bundle
|
||||
stella explain verify --bundle ./bundle.tgz
|
||||
|
||||
# Check determinism
|
||||
stella explain verify-determinism --finding "..." --iterations 5
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Related Documentation
|
||||
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [Graph Revision Schema](./graph-revision-schema.md) - Graph versioning
|
||||
- [Policy API](../api/policy.md) - Policy Engine REST API
|
||||
- [DSSE Predicates](../modules/attestor/architecture.md) - Signing specifications
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 EXPLAIN-GAPS-401-064 for change history._
|
||||
377
docs/modules/reach-graph/schemas/graph-revision-schema.md
Normal file
@@ -0,0 +1,377 @@
|
||||
# Graph Revision Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Platform Guild._
|
||||
|
||||
This document defines the graph revision schema addressing gaps GR1-GR10 from the November 2025 product findings. It specifies manifest structure, hash algorithms, storage layout, lineage tracking, and governance rules for deterministic, auditable reachability graphs.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Graph revisions provide content-addressable, append-only versioning for `richgraph-v1` documents. Every graph mutation produces a new immutable revision with:
|
||||
|
||||
- **Deterministic hash:** BLAKE3-256 of canonical JSON
|
||||
- **Lineage metadata:** Parent revision + diff summary
|
||||
- **Cross-artifact digests:** Links to SBOM, VEX, policy, and tool versions
|
||||
- **Audit trail:** Timestamp, actor, tenant, and operation type
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### GR1: Manifest Schema + Canonical Hash Rules
|
||||
|
||||
**Manifest schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.graph.revision@v1",
|
||||
"revision_id": "rev:blake3:a1b2c3d4e5f6...",
|
||||
"graph_hash": "blake3:a1b2c3d4e5f6...",
|
||||
"parent_revision_id": "rev:blake3:9f8e7d6c5b4a...",
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"created_by": "service:scanner",
|
||||
"tenant_id": "tenant:acme",
|
||||
"shard_id": "shard:01",
|
||||
"operation": "create",
|
||||
"lineage": {
|
||||
"depth": 3,
|
||||
"root_revision_id": "rev:blake3:1a2b3c4d5e6f..."
|
||||
},
|
||||
"cross_artifacts": {
|
||||
"sbom_digest": "sha256:...",
|
||||
"vex_digest": "sha256:...",
|
||||
"policy_digest": "sha256:...",
|
||||
"analyzer_digest": "sha256:..."
|
||||
},
|
||||
"diff_summary": {
|
||||
"nodes_added": 12,
|
||||
"nodes_removed": 3,
|
||||
"edges_added": 24,
|
||||
"edges_removed": 8,
|
||||
"roots_changed": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Canonical hash rules:**
|
||||
|
||||
1. JSON keys sorted alphabetically at all nesting levels
|
||||
2. No whitespace/indentation (compact JSON)
|
||||
3. UTF-8 encoding, no BOM
|
||||
4. Arrays sorted by deterministic key (nodes by `id`, edges by `from,to,kind`)
|
||||
5. Null/empty values omitted
|
||||
6. Numeric values without trailing zeros
|
||||
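
The canonical hash rules might be applied as in the sketch below. It stops at the canonical bytes because BLAKE3 is not in the Python standard library; the digest would be computed over the returned bytes with whatever BLAKE3 implementation the platform uses. Treating integral floats as integers is this sketch's interpretation of rule 6.

```python
import json


def _prune(value):
    """Drop null/empty values and normalize integral floats (rules 5 and 6)."""
    if isinstance(value, dict):
        pruned = {k: _prune(v) for k, v in value.items()}
        return {k: v for k, v in pruned.items() if v not in (None, "", [], {})}
    if isinstance(value, list):
        return [_prune(v) for v in value]
    if isinstance(value, float) and value.is_integer():
        return int(value)  # 1.0 -> 1, so no trailing zeros are emitted
    return value


def canonical_graph_bytes(graph):
    """Return compact, key-sorted UTF-8 JSON for a graph document."""
    doc = _prune(graph)
    if "nodes" in doc:
        doc["nodes"] = sorted(doc["nodes"], key=lambda n: n["id"])
    if "edges" in doc:
        doc["edges"] = sorted(
            doc["edges"],
            key=lambda e: (e["from"], e["to"], e.get("kind", "")),
        )
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")  # UTF-8 without BOM
```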
|
||||
### GR2: Mandated BLAKE3-256 Encoding
|
||||
|
||||
All graph-level hashes use BLAKE3-256 with the following format:
|
||||
|
||||
```
|
||||
blake3:{64_hex_chars}
|
||||
```
|
||||
|
||||
Example:
|
||||
```
|
||||
blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
|
||||
```
|
||||
|
||||
**Rationale:**
|
||||
- BLAKE3 is typically several times faster than SHA-256 on modern CPUs, although the gap narrows where SHA hardware extensions are available
|
||||
- Parallelizable for large graphs (>100K nodes)
|
||||
- Cryptographically secure (256-bit digest; BLAKE3 targets a 128-bit security level)
|
||||
- Algorithm prefix enables future migration
|
||||
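
A parser for the prefixed digest format could look like this. The table of supported algorithms is an assumption for illustration; only `blake3` is mandated for graph-level hashes, with `sha256` included because it appears elsewhere in these schemas.

```python
import re

_DIGEST_RE = re.compile(r"(?P<alg>[a-z0-9]+):(?P<hex>[0-9a-f]+)")

# Expected hex length per algorithm (assumed set, see lead-in).
_HEX_LENGTH = {"blake3": 64, "sha256": 64}


def parse_digest(value):
    """Split '<alg>:<hex>' into (alg, hex), validating case and length."""
    match = _DIGEST_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed digest: {value!r}")
    alg, hex_part = match["alg"], match["hex"]
    if alg not in _HEX_LENGTH:
        raise ValueError(f"unsupported algorithm: {alg}")
    if len(hex_part) != _HEX_LENGTH[alg]:
        raise ValueError(f"{alg} digest must have {_HEX_LENGTH[alg]} hex chars")
    return alg, hex_part
```

Keeping the algorithm prefix in the parsed result is what allows a later migration to add a new entry to the table without changing callers.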
|
||||
### GR3: Append-Only Storage
|
||||
|
||||
Graph revisions are immutable. Operations:
|
||||
|
||||
| Operation | Creates New Revision | Modifies Existing |
|
||||
|-----------|---------------------|-------------------|
|
||||
| `create` | Yes | No |
|
||||
| `update` | Yes | No |
|
||||
| `merge` | Yes | No |
|
||||
| `tombstone` | Yes | No |
|
||||
| `read` | No | No |
|
||||
|
||||
**Storage layout:**
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
revisions/
|
||||
{blake3}/ # Revision manifest
|
||||
{blake3}.graph # Graph body
|
||||
{blake3}.dsse # DSSE envelope
|
||||
indices/
|
||||
by-tenant/{tenant_id}/ # Tenant index
|
||||
by-sbom/{sbom_digest}/ # SBOM correlation
|
||||
by-root/{root_revision_id}/ # Lineage tree
|
||||
```
|
||||
|
||||
### GR4: Lineage/Diff Metadata
|
||||
|
||||
Every revision tracks its lineage:
|
||||
|
||||
```json
|
||||
{
|
||||
"lineage": {
|
||||
"depth": 5,
|
||||
"root_revision_id": "rev:blake3:...",
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"merge_parents": []
|
||||
},
|
||||
"diff_summary": {
|
||||
"nodes_added": 12,
|
||||
"nodes_removed": 3,
|
||||
"nodes_modified": 0,
|
||||
"edges_added": 24,
|
||||
"edges_removed": 8,
|
||||
"edges_modified": 0,
|
||||
"roots_added": 0,
|
||||
"roots_removed": 0
|
||||
},
|
||||
"diff_detail_uri": "cas://reachability/diffs/{parent_hash}_{child_hash}.ndjson"
|
||||
}
|
||||
```
|
||||
|
||||
**Diff detail format (NDJSON):**
|
||||
|
||||
```ndjson
|
||||
{"op":"add","path":"nodes","value":{"id":"sym:java:...","display":"..."}}
|
||||
{"op":"remove","path":"edges","from":"sym:java:a","to":"sym:java:b"}
|
||||
```
|
||||
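
The `diff_summary` counters could be derived from two graph bodies as sketched here. The sketch covers nodes and edges only (roots and `edges_modified` are omitted) and assumes nodes are keyed by `id` and edges by `(from, to, kind)`.

```python
def diff_summary(parent, child):
    """Count node/edge changes between a parent and a child graph."""
    def edge_key(edge):
        return (edge["from"], edge["to"], edge.get("kind", ""))

    p_nodes = {n["id"]: n for n in parent.get("nodes", [])}
    c_nodes = {n["id"]: n for n in child.get("nodes", [])}
    p_edges = {edge_key(e) for e in parent.get("edges", [])}
    c_edges = {edge_key(e) for e in child.get("edges", [])}
    shared = p_nodes.keys() & c_nodes.keys()
    return {
        "nodes_added": len(c_nodes.keys() - p_nodes.keys()),
        "nodes_removed": len(p_nodes.keys() - c_nodes.keys()),
        # A node counts as modified if its ID is kept but its content differs.
        "nodes_modified": sum(1 for i in shared if p_nodes[i] != c_nodes[i]),
        "edges_added": len(c_edges - p_edges),
        "edges_removed": len(p_edges - c_edges),
    }
```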
|
||||
### GR5: Cross-Artifact Digests (SBOM/VEX/Policy/Tool)
|
||||
|
||||
Every revision links to related artifacts:
|
||||
|
||||
```json
|
||||
{
|
||||
"cross_artifacts": {
|
||||
"sbom_digest": "sha256:...",
|
||||
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"sbom_format": "cyclonedx-1.6",
|
||||
"vex_digest": "sha256:...",
|
||||
"vex_uri": "cas://excititor/vex/openvex.json",
|
||||
"policy_digest": "sha256:...",
|
||||
"policy_version": "P-7:v4",
|
||||
"analyzer_digest": "sha256:...",
|
||||
"analyzer_name": "scanner.java",
|
||||
"analyzer_version": "1.2.0"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GR6: UI/CLI Surfacing of Full/Short IDs
|
||||
|
||||
**Full ID format:**
|
||||
```
|
||||
rev:blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
|
||||
```
|
||||
|
||||
**Short ID format (for display):**
|
||||
```
|
||||
rev:a1b2c3d4
|
||||
```
|
||||
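
Deriving a short ID and resolving it back might work as follows. The 8-character length matches the example above; the resolution helper and its ambiguity handling are assumptions, since the source does not describe how short IDs are looked up.

```python
def short_revision_id(full_id, length=8):
    """'rev:blake3:a1b2c3d4...' -> 'rev:a1b2c3d4'."""
    prefix, _alg, hex_part = full_id.split(":")
    if prefix != "rev":
        raise ValueError(f"not a revision ID: {full_id!r}")
    return f"rev:{hex_part[:length]}"


def resolve_short_id(short_id, known_full_ids):
    """Find the single full ID matching a short ID, or raise LookupError."""
    needle = short_id.split(":", 1)[1]
    matches = [
        full for full in known_full_ids
        if full.split(":")[2].startswith(needle)
    ]
    if len(matches) != 1:
        raise LookupError(f"{short_id} matches {len(matches)} revisions")
    return matches[0]
```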
|
||||
**CLI commands:**
|
||||
|
||||
```bash
|
||||
# List revisions
|
||||
stella graph revisions --scan-id scan-123
|
||||
|
||||
# Show full ID
|
||||
stella graph revisions --scan-id scan-123 --full
|
||||
|
||||
# Example output (default short IDs):
|
||||
REVISION CREATED NODES EDGES PARENT
|
||||
rev:a1b2c3d4 2025-12-13T10:00:00 1247 3891 rev:9f8e7d6c
|
||||
rev:9f8e7d6c 2025-12-12T15:30:00 1235 3867 rev:1a2b3c4d
|
||||
```
|
||||
|
||||
**UI display:**
|
||||
|
||||
- Revision chips show short ID with copy-to-clipboard for full ID
|
||||
- Hover tooltip shows full ID and creation timestamp
|
||||
- Lineage tree visualization available in "Revision History" drawer
|
||||
|
||||
### GR7: Shard/Tenant Context
|
||||
|
||||
Every revision includes partition context:
|
||||
|
||||
```json
|
||||
{
|
||||
"tenant_id": "tenant:acme",
|
||||
"shard_id": "shard:01",
|
||||
"namespace": "prod",
|
||||
"workspace_id": "ws:default"
|
||||
}
|
||||
```
|
||||
|
||||
**Tenant isolation:**
|
||||
|
||||
- Revisions are tenant-scoped; cross-tenant access requires explicit grants
|
||||
- Shard ID enables horizontal scaling and data locality
|
||||
- Namespace supports multi-environment deployments
|
||||
|
||||
### GR8: Pin/Audit Governance
|
||||
|
||||
**Pinned revisions:**
|
||||
|
||||
Revisions can be pinned to prevent automatic retention cleanup:
|
||||
|
||||
```json
|
||||
{
|
||||
"pinned": true,
|
||||
"pinned_at": "2025-12-13T10:00:00Z",
|
||||
"pinned_by": "user:alice",
|
||||
"pin_reason": "Audit retention for CVE-2021-44228 investigation",
|
||||
"pin_expires_at": "2026-12-13T10:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
**Audit events:**
|
||||
|
||||
All revision operations emit audit events:
|
||||
|
||||
```json
|
||||
{
|
||||
"event_type": "graph.revision.created",
|
||||
"revision_id": "rev:blake3:...",
|
||||
"actor": "service:scanner",
|
||||
"tenant_id": "tenant:acme",
|
||||
"timestamp": "2025-12-13T10:00:00Z",
|
||||
"metadata": {
|
||||
"operation": "create",
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"graph_hash": "blake3:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GR9: Retention/Tombstones
|
||||
|
||||
**Retention policy:**
|
||||
|
||||
| Category | Default Retention | Configurable |
|
||||
|----------|-------------------|--------------|
|
||||
| Latest revision | Forever | No |
|
||||
| Intermediate revisions | 90 days | Yes |
|
||||
| Tombstoned revisions | 30 days | Yes |
|
||||
| Pinned revisions | Until unpin + 7 days | No |
|
||||
|
||||
**Tombstone format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.graph.revision@v1",
|
||||
"revision_id": "rev:blake3:...",
|
||||
"tombstone": true,
|
||||
"tombstoned_at": "2025-12-13T10:00:00Z",
|
||||
"tombstoned_by": "service:retention-worker",
|
||||
"tombstone_reason": "retention_policy",
|
||||
"successor_revision_id": "rev:blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
### GR10: Inclusion in Offline Kits
|
||||
|
||||
Offline kits include graph revisions for air-gapped deployments:
|
||||
|
||||
**Offline bundle manifest:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.offline.bundle@v1",
|
||||
"bundle_id": "bundle:2025-12-13",
|
||||
"graph_revisions": [
|
||||
{
|
||||
"revision_id": "rev:blake3:...",
|
||||
"graph_hash": "blake3:...",
|
||||
"included_artifacts": ["graph", "dsse", "diff"]
|
||||
}
|
||||
],
|
||||
"rekor_checkpoints": [
|
||||
{
|
||||
"log_id": "rekor.sigstore.dev",
|
||||
"checkpoint": "...",
|
||||
"verified_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
],
|
||||
"signature": {
|
||||
"algorithm": "ecdsa-p256",
|
||||
"value": "base64:...",
|
||||
"public_key_id": "key:offline-signing-2025"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Import verification:**
|
||||
|
||||
```bash
|
||||
stella offline import --bundle ./offline-bundle.tgz --verify
|
||||
|
||||
# Output:
|
||||
Bundle: bundle:2025-12-13
|
||||
Graph Revisions: 5
|
||||
Rekor Checkpoints: 2
|
||||
|
||||
Verifying signatures...
|
||||
Bundle signature: VALID
|
||||
DSSE envelopes: 5/5 VALID
|
||||
Rekor checkpoints: 2/2 VERIFIED
|
||||
|
||||
Import complete.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. API Reference
|
||||
|
||||
### 3.1 Create Revision
|
||||
|
||||
```http
|
||||
POST /api/graph/revisions
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"graph": { ... richgraph-v1 ... },
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"cross_artifacts": { ... }
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Get Revision
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions/{revision_id}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
### 3.3 List Revisions
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions?tenant_id=acme&sbom_digest=sha256:...&limit=20
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
### 3.4 Diff Revisions
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions/diff?from={rev_a}&to={rev_b}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [CAS Infrastructure](../contracts/cas-infrastructure.md) - Content-addressable storage
|
||||
- [Offline Kit](../OFFLINE_KIT.md) - Air-gap deployment
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 GRAPHREV-GAPS-401-063 for change history._
|
||||
337
docs/modules/reach-graph/schemas/ground-truth-schema.md
Normal file
@@ -0,0 +1,337 @@
|
||||
# Ground Truth Schema for Reachability Datasets
|
||||
|
||||
> **Status:** Design v1 (Sprint 0401)
|
||||
> **Owners:** Scanner Guild, Signals Guild, Quality Guild
|
||||
|
||||
This document defines the ground truth schema for test datasets used to validate reachability analysis. Ground truth samples provide known-correct answers for benchmarking lattice state calculations, path discovery, and policy gate decisions.
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
Ground truth datasets enable:
|
||||
|
||||
1. **Regression testing:** Detect regressions in reachability analysis accuracy
|
||||
2. **Benchmark scoring:** Measure precision, recall, F1 for path discovery
|
||||
3. **Lattice validation:** Verify join/meet operations produce expected states
|
||||
4. **Policy gate testing:** Ensure gates block/allow correct VEX transitions
|
||||
|
||||
---
|
||||
|
||||
## 2. Dataset Structure
|
||||
|
||||
### 2.1 Directory Layout
|
||||
|
||||
```
|
||||
datasets/reachability/
|
||||
├── samples/
|
||||
│ ├── java/
|
||||
│ │ ├── vulnerable-log4j/
|
||||
│ │ │ ├── manifest.json # Sample metadata
|
||||
│ │ │ ├── richgraph-v1.json # Input callgraph
|
||||
│ │ │ ├── ground-truth.json # Expected outcomes
|
||||
│ │ │ └── artifacts/ # Source binaries/SBOMs
|
||||
│ │ └── safe-spring-boot/
|
||||
│ │ └── ...
|
||||
│ ├── native/
|
||||
│ │ ├── stripped-elf/
|
||||
│ │ └── openssl-vuln/
|
||||
│ └── polyglot/
|
||||
│ └── node-native-addon/
|
||||
├── corpus/
|
||||
│ ├── positive/ # Known reachable samples
|
||||
│ ├── negative/ # Known unreachable samples
|
||||
│ └── contested/ # Known conflict samples
|
||||
└── schema/
|
||||
├── manifest.schema.json
|
||||
└── ground-truth.schema.json
|
||||
```
|
||||
|
||||
### 2.2 Sample Manifest (`manifest.json`)
|
||||
|
||||
```json
|
||||
{
|
||||
"sampleId": "sample:java:vulnerable-log4j:001",
|
||||
"version": "1.0.0",
|
||||
"createdAt": "2025-12-13T10:00:00Z",
|
||||
"language": "java",
|
||||
"category": "positive",
|
||||
"description": "Log4Shell CVE-2021-44228 reachable via JNDI lookup in logging path",
|
||||
"source": {
|
||||
"repository": "https://github.com/example/vuln-app",
|
||||
"commit": "abc123...",
|
||||
"buildToolchain": "maven:3.9.0,jdk:17"
|
||||
},
|
||||
"vulnerabilities": [
|
||||
{
|
||||
"vulnId": "CVE-2021-44228",
|
||||
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
|
||||
"affectedSymbol": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup"
|
||||
}
|
||||
],
|
||||
"artifacts": [
|
||||
{
|
||||
"path": "artifacts/app.jar",
|
||||
"hash": "sha256:...",
|
||||
"type": "application/java-archive"
|
||||
},
|
||||
{
|
||||
"path": "artifacts/sbom.cdx.json",
|
||||
"hash": "sha256:...",
|
||||
"type": "application/vnd.cyclonedx+json"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### 2.3 Ground Truth Document (`ground-truth.json`)
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "ground-truth-v1",
|
||||
"sampleId": "sample:java:vulnerable-log4j:001",
|
||||
"generatedAt": "2025-12-13T10:00:00Z",
|
||||
"generator": {
|
||||
"name": "manual-annotation",
|
||||
"version": "1.0.0",
|
||||
"annotator": "security-team"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"symbolId": "sym:java:...",
|
||||
"display": "org.apache.logging.log4j.core.lookup.JndiLookup.lookup",
|
||||
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
|
||||
"expected": {
|
||||
"latticeState": "CR",
|
||||
"bucket": "direct",
|
||||
"reachable": true,
|
||||
"confidence": 0.95,
|
||||
"pathLength": 3,
|
||||
"path": [
|
||||
"sym:java:...main",
|
||||
"sym:java:...logInfo",
|
||||
"sym:java:...JndiLookup.lookup"
|
||||
]
|
||||
},
|
||||
"reasoning": "Direct call path from main() through logging framework to vulnerable lookup method"
|
||||
},
|
||||
{
|
||||
"symbolId": "sym:java:...",
|
||||
"display": "org.apache.logging.log4j.core.net.JndiManager.lookup",
|
||||
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
|
||||
"expected": {
|
||||
"latticeState": "CU",
|
||||
"bucket": "unreachable",
|
||||
"reachable": false,
|
||||
"confidence": 0.90,
|
||||
"pathLength": null,
|
||||
"path": null
|
||||
},
|
||||
"reasoning": "JndiManager.lookup is present but not called from any reachable entry point"
|
||||
}
|
||||
],
|
||||
"entryPoints": [
|
||||
{
|
||||
"symbolId": "sym:java:...",
|
||||
"display": "com.example.app.Main.main",
|
||||
"phase": "runtime",
|
||||
"source": "manifest"
|
||||
}
|
||||
],
|
||||
"expectedUncertainty": {
|
||||
"states": [],
|
||||
"aggregateTier": "T4",
|
||||
"riskScore": 0.0
|
||||
},
|
||||
"expectedGateDecisions": [
|
||||
{
|
||||
"vulnId": "CVE-2021-44228",
|
||||
"targetSymbol": "sym:java:...JndiLookup.lookup",
|
||||
"requestedStatus": "not_affected",
|
||||
"expectedDecision": "block",
|
||||
"expectedBlockedBy": "LatticeState",
|
||||
"expectedReason": "CR state incompatible with not_affected"
|
||||
},
|
||||
{
|
||||
"vulnId": "CVE-2021-44228",
|
||||
"targetSymbol": "sym:java:...JndiLookup.lookup",
|
||||
"requestedStatus": "affected",
|
||||
"expectedDecision": "allow"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Schema Definitions
|
||||
|
||||
### 3.1 Ground Truth Target
|
||||
|
||||
| Field | Type | Required | Description |
|
||||
|-------|------|----------|-------------|
|
||||
| `symbolId` | string | Yes | Canonical SymbolID (`sym:{lang}:{hash}`) |
|
||||
| `display` | string | No | Human-readable symbol name |
|
||||
| `purl` | string | No | Package URL of containing package |
|
||||
| `expected.latticeState` | enum | Yes | Expected v1 lattice state: `U`, `SR`, `SU`, `RO`, `RU`, `CR`, `CU`, `X` |
|
||||
| `expected.bucket` | enum | Yes | Expected v0 bucket (backward compat) |
|
||||
| `expected.reachable` | boolean | Yes | True if symbol is reachable from any entry point |
|
||||
| `expected.confidence` | number | Yes | Expected confidence score [0.0-1.0] |
|
||||
| `expected.pathLength` | number | No | Expected path length (null if unreachable) |
|
||||
| `expected.path` | string[] | No | Expected path, ordered from entry point to target (deterministic; null if unreachable) |
|
||||
| `reasoning` | string | Yes | Human explanation of expected outcome |
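
The required fields in the table above can be checked with a few lines of code before a sample is submitted. The following is a minimal sketch, not the project's validator: the function name `validate_target` is hypothetical, and the rule that required strings must be non-empty is an assumption that the table does not state.

```python
LATTICE_STATES = {"U", "SR", "SU", "RO", "RU", "CR", "CU", "X"}


def validate_target(target):
    """Return a list of problems found in one ground-truth target (empty if none)."""
    errors = []
    for field in ("symbolId", "reasoning"):
        # Assumption: required strings must also be non-empty
        if not isinstance(target.get(field), str) or not target[field]:
            errors.append(f"{field}: required non-empty string")
    expected = target.get("expected")
    if not isinstance(expected, dict):
        return errors + ["expected: required object"]
    if expected.get("latticeState") not in LATTICE_STATES:
        errors.append("expected.latticeState: not a known lattice state")
    if not isinstance(expected.get("bucket"), str):
        errors.append("expected.bucket: required string")
    if not isinstance(expected.get("reachable"), bool):
        errors.append("expected.reachable: required boolean")
    confidence = expected.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        errors.append("expected.confidence: required number")
    elif not 0.0 <= confidence <= 1.0:
        errors.append("expected.confidence: must be within 0.0-1.0")
    return errors
```

An empty list means the target satisfies the required-field rules; this check complements, and does not replace, JSON Schema validation.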
|
||||
|
||||
### 3.2 Expected Gate Decision
|
||||
|
||||
| Field | Type | Required | Description |
|
||||
|-------|------|----------|-------------|
|
||||
| `vulnId` | string | Yes | Vulnerability identifier |
|
||||
| `targetSymbol` | string | Yes | Target SymbolID |
|
||||
| `requestedStatus` | enum | Yes | VEX status: `affected`, `not_affected`, `under_investigation`, `fixed` |
|
||||
| `expectedDecision` | enum | Yes | Gate outcome: `allow`, `block`, `warn` |
|
||||
| `expectedBlockedBy` | string | No | Gate name if blocked |
|
||||
| `expectedReason` | string | No | Expected reason message |
|
||||
|
||||
---
|
||||
|
||||
## 4. Sample Categories
|
||||
|
||||
### 4.1 Positive Samples (Reachable)
|
||||
|
||||
Known-reachable cases where vulnerable code is called:
|
||||
|
||||
- **direct-call:** Vulnerable function called directly from entry point
|
||||
- **transitive:** Multi-hop path from entry point to vulnerable function
|
||||
- **runtime-observed:** Confirmed reachable via runtime probe
|
||||
- **init-array:** Reachable via load-time constructor
|
||||
|
||||
### 4.2 Negative Samples (Unreachable)
|
||||
|
||||
Known-unreachable cases where vulnerable code exists but isn't called:
|
||||
|
||||
- **dead-code:** Function present but never invoked
|
||||
- **conditional-unreachable:** Function behind impossible condition
|
||||
- **test-only:** Function only reachable from test entry points
|
||||
- **deprecated-api:** Old API present but replaced by new implementation
|
||||
|
||||
### 4.3 Contested Samples
|
||||
|
||||
Cases where static and runtime evidence conflict:
|
||||
|
||||
- **static-reach-runtime-miss:** Static analysis finds path, runtime never observes
|
||||
- **static-miss-runtime-hit:** Static analysis misses path, runtime observes execution
|
||||
- **version-mismatch:** Analysis version differs from runtime version
|
||||
|
||||
---
|
||||
|
||||
## 5. Benchmark Metrics
|
||||
|
||||
### 5.1 Path Discovery Metrics
|
||||
|
||||
```
|
||||
Precision = TruePositive / (TruePositive + FalsePositive)
|
||||
Recall = TruePositive / (TruePositive + FalseNegative)
|
||||
F1 = 2 * (Precision * Recall) / (Precision + Recall)
|
||||
```
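
As a rough illustration of the formulas above, the sketch below scores a benchmark run by comparing each target's `expected.reachable` flag with the analyzer's result. The function name and the choice to return 0.0 when a denominator is zero are assumptions made for this example.

```python
def path_discovery_metrics(expected, actual):
    """Compute (precision, recall, f1) from two parallel lists of booleans."""
    pairs = list(zip(expected, actual))
    tp = sum(1 for e, a in pairs if e and a)       # reachable, reported reachable
    fp = sum(1 for e, a in pairs if not e and a)   # unreachable, reported reachable
    fn = sum(1 for e, a in pairs if e and not a)   # reachable, reported unreachable
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```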
|
||||
|
||||
### 5.2 Lattice State Accuracy
|
||||
|
||||
```
|
||||
StateAccuracy = CorrectStates / TotalTargets
|
||||
BucketAccuracy = CorrectBuckets / TotalTargets (v0 compatibility)
|
||||
```
|
||||
|
||||
### 5.3 Gate Decision Accuracy
|
||||
|
||||
```
|
||||
GateAccuracy = CorrectDecisions / TotalGateTests
|
||||
FalseAllow = AllowedWhenShouldBlock / TotalBlocks (critical metric)
|
||||
FalseBlock = BlockedWhenShouldAllow / TotalAllows
|
||||
```
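
The gate metrics can be computed the same way from pairs of expected and actual decisions. This is a sketch under stated assumptions: the function name is hypothetical, `TotalBlocks` and `TotalAllows` are taken to mean the number of cases whose expected decision is `block` or `allow`, and a `warn` outcome counts only toward overall accuracy.

```python
def gate_metrics(cases):
    """cases: iterable of (expected_decision, actual_decision) string pairs."""
    cases = list(cases)
    correct = sum(1 for exp, act in cases if exp == act)
    total_blocks = sum(1 for exp, _ in cases if exp == "block")
    total_allows = sum(1 for exp, _ in cases if exp == "allow")
    # FalseAllow is the critical metric: a gate let through what it should have blocked
    false_allow = sum(1 for exp, act in cases if exp == "block" and act == "allow")
    false_block = sum(1 for exp, act in cases if exp == "allow" and act == "block")
    return {
        "gate_accuracy": correct / len(cases) if cases else 0.0,
        "false_allow": false_allow / total_blocks if total_blocks else 0.0,
        "false_block": false_block / total_allows if total_allows else 0.0,
    }
```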
|
||||
|
||||
---
|
||||
|
||||
## 6. Test Harness Integration
|
||||
|
||||
### 6.1 xUnit Test Pattern
|
||||
|
||||
```csharp
|
||||
[Theory]
|
||||
[MemberData(nameof(GetGroundTruthSamples))]
|
||||
public async Task ReachabilityAnalysis_MatchesGroundTruth(GroundTruthSample sample)
|
||||
{
|
||||
// Arrange
|
||||
var graph = await LoadRichGraphAsync(sample.GraphPath);
|
||||
var scorer = _serviceProvider.GetRequiredService<ReachabilityScoringService>();
|
||||
|
||||
// Act
|
||||
var result = await scorer.ComputeAsync(graph, sample.EntryPoints);
|
||||
|
||||
// Assert
|
||||
foreach (var target in sample.Targets)
|
||||
{
|
||||
var actual = result.States.First(s => s.SymbolId == target.SymbolId);
|
||||
Assert.Equal(target.Expected.LatticeState, actual.LatticeState);
|
||||
Assert.Equal(target.Expected.Reachable, actual.Reachable);
|
||||
Assert.InRange(actual.Confidence,
|
||||
target.Expected.Confidence - 0.05,
|
||||
target.Expected.Confidence + 0.05);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 Benchmark Runner
|
||||
|
||||
```bash
|
||||
# Run reachability benchmarks
|
||||
dotnet run --project src/Scanner/__Tests/StellaOps.Scanner.Reachability.Benchmarks -- \
  --dataset datasets/reachability/samples \
  --output benchmark-results.json \
  --threshold-f1 0.95 \
  --threshold-gate-accuracy 0.99
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Sample Contribution Guidelines
|
||||
|
||||
### 7.1 Adding New Samples
|
||||
|
||||
1. Create directory under `datasets/reachability/samples/{language}/{sample-name}/`
|
||||
2. Add `manifest.json` with sample metadata
|
||||
3. Add `richgraph-v1.json` (run scanner on artifacts)
|
||||
4. Create `ground-truth.json` with manual annotations
|
||||
5. Include reasoning for each expected outcome
|
||||
6. Run validation: `dotnet test --filter "GroundTruth"`
|
||||
|
||||
### 7.2 Ground Truth Validation
|
||||
|
||||
Ground truth files must pass schema validation:
|
||||
|
||||
```bash
|
||||
npx ajv-cli validate -s docs/modules/reach-graph/schemas/ground-truth.schema.json \
  -d "datasets/reachability/samples/**/ground-truth.json"
|
||||
```
|
||||
|
||||
### 7.3 Review Requirements
|
||||
|
||||
- All samples require two independent annotators
|
||||
- Contested samples require security team review
|
||||
- Changes to existing samples require regression test pass
|
||||
|
||||
---
|
||||
|
||||
## 8. Related Documents
|
||||
|
||||
- [Lattice Model](./lattice.md) — v1 formal 7-state lattice
|
||||
- [Policy Gates](./policy-gate.md) — Gate rules for VEX decisions
|
||||
- [Evidence Schema](./evidence-schema.md) — richgraph-v1 schema
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) — Full schema specification
|
||||
|
||||
---
|
||||
|
||||
## Changelog
|
||||
|
||||
| Version | Date | Author | Changes |
|
||||
|---------|------|--------|---------|
|
||||
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial design from Sprint 0401 |
|
||||
129
docs/modules/reach-graph/schemas/runtime-static-union-schema.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# Runtime + Static Reachability Union Schema (v0.1, 2025-11-23)
|
||||
|
||||
## Goals
|
||||
- Provide a single, deterministic graph shape that merges static lifter output and runtime traces across languages.
|
||||
- Keep SymbolID stable across hosts (path/location independent) so CAS lookups are reproducible and cacheable.
|
||||
- Make outputs offline-friendly: line-delimited JSON, UTF-8, sorted, with explicit content hashes.
|
||||
|
||||
## File layout (CAS)
|
||||
- Namespace root: `reachability_graphs/<analysis_id>/` (analysis_id is caller-supplied UUID or hash).
|
||||
- Files (NDJSON unless noted; UTF-8, newline terminated, sorted as noted):
|
||||
- `nodes.ndjson` (sorted by `symbol_id`)
|
||||
- `edges.ndjson` (sorted by `from` then `to` then `edge_type`)
|
||||
- `facts_runtime.ndjson` (sorted by `symbol_id`, optional)
|
||||
- `meta.json` (single JSON object; schema version, produced_by, timestamps, tool versions, hashes)
|
||||
- Hashing: SHA-256 of each file recorded in `meta.json` under `files[]` with `path`, `sha256`, `records`.
|
||||
- Compression/packaging is left to the CAS store; files must be valid uncompressed NDJSON first.
|
||||
|
||||
## SymbolID (language-agnostic envelope)
|
||||
```
|
||||
symbol_id = "sym:" + <lang> + ":" + <stable-fragment>
|
||||
```
|
||||
- `lang`: `java|dotnet|go|node|deno|rust|swift|shell|binary`
|
||||
- `stable-fragment`: SHA-256 digest (encoded as base64url without padding) of the canonical tuple per language:
|
||||
- **java**: (`package`, `class`, `method`, `descriptor`) lowercased, descriptor in JVM format.
|
||||
- **dotnet**: (`assembly_name`, `namespace`, `type`, `member_signature`) using ECMA-335 signature string.
|
||||
- **node/deno**: (`pkg_name_or_path`, `export_path`, `kind`) where `export_path` is slash-joined ESM/CJS path; `pkg_name_or_path` uses npm name or normalized absolute path with drive stripped.
|
||||
- **go**: (`module_path`, `package_path`, `receiver`, `func`), with receiver empty for functions.
|
||||
- **rust**: (`crate`, `module_path`, `item_name`, `mangled`)
|
||||
- **swift**: (`module`, `type`, `member`, `swift-mangled`)
|
||||
- **shell**: (`script_relpath`, `function_or_cmd`)
|
||||
- **binary**: (`binary_build_id`, `section`, `symbol_name`)
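
A SymbolID built from the envelope and tuples above might be derived as in the sketch below. The spec does not say how the tuple is serialized before hashing, so joining the fields with `\n` is purely an assumption for illustration, as are the function name `symbol_id` and the `lowercase` flag (used here for the Java rule).

```python
import base64
import hashlib


def symbol_id(lang, parts, lowercase=False):
    """Build "sym:<lang>:<stable-fragment>" from a canonical tuple of strings."""
    fields = [p.lower() if lowercase else p for p in parts]
    # Assumption: fields are joined with "\n"; the spec does not fix a separator
    canonical = "\n".join(fields).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # base64url without "=" padding: 32 digest bytes become 43 characters
    fragment = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sym:{lang}:{fragment}"
```

Because the inputs contain no host paths or load addresses, the same tuple yields the same identifier on any machine, which is what makes CAS lookups reproducible.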
|
||||
|
||||
## nodes.ndjson
|
||||
Each line:
|
||||
```
|
||||
{
|
||||
"symbol_id": "sym:lang:...",
|
||||
"lang": "dotnet",
|
||||
"kind": "function|method|type|module|package|binary",
|
||||
"display": "Human readable name",
|
||||
"source": {
|
||||
"file": "relative/or/pkg/path",
|
||||
"line": 123,
|
||||
"col": 1,
|
||||
"digest": "sha256:<hex>"
|
||||
},
|
||||
"attributes": {
|
||||
"visibility": "public|internal|private",
|
||||
"async": true,
|
||||
"static": false,
|
||||
"generic_arity": 2
|
||||
}
|
||||
}
|
||||
```
|
||||
Fields other than the required ones (see Validation) are optional when not applicable; omit them rather than emitting `null`. Additional language-specific fields are allowed inside `attributes` (e.g., `jvm_descriptor`, `dotnet_signature`).
|
||||
|
||||
## edges.ndjson
|
||||
Each line (static or runtime-derived; see `source`):
|
||||
```
|
||||
{
|
||||
"from": "sym:...",
|
||||
"to": "sym:...",
|
||||
"edge_type": "call|import|inherits|loads|dynamic|reflects|dlopen|ffi|wasm|spawn",
|
||||
"confidence": "certain|high|medium|low",
|
||||
"source": {
|
||||
"origin": "static|runtime",
|
||||
"provenance": "jvm-bytecode|il|ts-ast|ssa|ebpf|etw|jfr|hook",
|
||||
"evidence": "file:path:line"
|
||||
}
|
||||
}
|
||||
```
|
||||
- Ordering: primary `from`, secondary `to`, tertiary `edge_type`.
|
||||
- Duplicate edges with different provenance are allowed; consumers deduplicate by (`from`, `to`, `edge_type`, `source.provenance`).
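
The ordering and deduplication rules for edges can be sketched as follows. This is an illustration only: the function names are hypothetical, using provenance as a final tie-breaker is an assumption added to make the order total, and the compact, key-sorted JSON output is one possible serialization rather than something this spec mandates.

```python
import json


def canonical_edges(edges):
    """Deduplicate by (from, to, edge_type, provenance), then sort by that key."""
    seen = {}
    for edge in edges:
        key = (
            edge["from"],
            edge["to"],
            edge["edge_type"],
            edge.get("source", {}).get("provenance", ""),
        )
        seen.setdefault(key, edge)  # keep the first occurrence of each key
    # Primary from, secondary to, tertiary edge_type; provenance breaks ties (assumption)
    return [seen[key] for key in sorted(seen)]


def to_ndjson(records):
    """Serialize records as newline-terminated NDJSON with "\n" line endings."""
    lines = (
        json.dumps(r, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for r in records
    )
    return "".join(line + "\n" for line in lines)
```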
|
||||
|
||||
## facts_runtime.ndjson (optional)
|
||||
Runtime-only observations attached to symbols:
|
||||
```
|
||||
{
|
||||
"symbol_id": "sym:...",
|
||||
"samples": {
|
||||
"call_count": 14,
|
||||
"first_seen_utc": "2025-11-22T18:21:12Z",
|
||||
"last_seen_utc": "2025-11-22T18:23:01Z"
|
||||
},
|
||||
"env": {
|
||||
"pid": 1234,
|
||||
"image": "sha256:...",
|
||||
"entrypoint": "main",
|
||||
"tags": ["sealed","offline"]
|
||||
}
|
||||
}
|
||||
```
|
||||
Records are sorted by `symbol_id`. Time fields must be UTC ISO-8601 with a `Z` suffix.
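
A consumer could enforce the timestamp rule with a strict parser like the sketch below. It assumes second precision, as in the example record, and additionally checks that `first_seen_utc` does not come after `last_seen_utc`; that ordering check is an assumption, not a stated requirement. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse a second-precision UTC timestamp such as 2025-11-22T18:21:12Z."""
    # strptime treats the trailing "Z" as a literal, so offsets like +00:00 are rejected
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def check_samples(samples):
    """Return True when both timestamps parse and are in chronological order."""
    first = parse_utc(samples["first_seen_utc"])
    last = parse_utc(samples["last_seen_utc"])
    return first <= last
```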
|
||||
|
||||
## meta.json
|
||||
```
|
||||
{
|
||||
"schema": "reachability-union@0.1",
|
||||
"generated_at": "2025-11-23T00:00:00Z",
|
||||
"produced_by": {
|
||||
"tool": "StellaOps.Scanner.Worker",
|
||||
"version": "0.1.0",
|
||||
"analyzers": ["dotnet-11.1.0","jvm-8.0.0","node-6.2.0"]
|
||||
},
|
||||
"files": [
|
||||
{"path":"nodes.ndjson","sha256":"...","records":1234},
|
||||
{"path":"edges.ndjson","sha256":"...","records":4567},
|
||||
{"path":"facts_runtime.ndjson","sha256":"...","records":89}
|
||||
],
|
||||
"options": {
|
||||
"dedupe_edges": false,
|
||||
"include_runtime": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Determinism rules
|
||||
- Sort order as noted; no nulls; omit empty objects/arrays.
|
||||
- All strings UTF-8 NFC; booleans lower-case; edge_type enumerated list above.
|
||||
- Hash inputs use exact serialized bytes (no trailing spaces, newline `\n` only).
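
Since hashes are taken over the exact serialized bytes, a `files[]` entry for `meta.json` can be derived directly from a file's contents. The sketch below is illustrative: the function name is hypothetical, storing the digest as bare lowercase hex is an assumption (the example above elides the value), and `records` is assumed to be the number of newline-terminated lines.

```python
import hashlib


def file_entry(path, data):
    """Build one meta.json files[] entry from the exact serialized bytes."""
    if data and not data.endswith(b"\n"):
        raise ValueError("NDJSON must be newline terminated")
    if b"\r" in data:
        raise ValueError("only \\n line endings are allowed")
    return {
        "path": path,
        # Assumption: the digest is recorded as bare lowercase hex
        "sha256": hashlib.sha256(data).hexdigest(),
        # Assumption: one record per newline-terminated line
        "records": data.count(b"\n"),
    }
```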
|
||||
|
||||
## Validation
|
||||
- A JSON Schema (draft 2020-12) is planned at `docs/modules/reach-graph/schemas/runtime-static-union-schema.json`; it is to be generated from this spec, with allowable values matching the enumerations above.
|
||||
- Minimal required fields: `symbol_id`, `lang`, `kind` (nodes); `from`, `to`, `edge_type`, `source.origin` (edges).
|
||||
|
||||
## Integration guidance
|
||||
- Static lifters must emit SymbolIDs using the language rules; runtime probes must map call targets to the same SymbolID space (via demangled names + package/module resolution).
|
||||
- CAS writers store each file under the namespace path and return the root manifest path for downstream consumers (Signals, Replay, Policy).
|
||||
- Consumers should treat runtime edges as additive; when both origins exist, prefer `origin=runtime` for exploitability scoring but keep static edges for coverage.
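
One way a consumer might apply the last rule is to keep the full edge list for coverage and derive a separate view for scoring. The sketch below is a hypothetical helper, not part of any StellaOps API; it assumes every edge carries `source.origin`, which the Validation section lists as required.

```python
def scoring_edges(edges):
    """Pick one edge per (from, to, edge_type) for scoring, preferring runtime origin."""
    chosen = {}
    for edge in edges:
        key = (edge["from"], edge["to"], edge["edge_type"])
        current = chosen.get(key)
        is_runtime = edge["source"]["origin"] == "runtime"
        # Replace a static edge with a runtime one; otherwise keep the first seen
        if current is None or (is_runtime and current["source"]["origin"] != "runtime"):
            chosen[key] = edge
    return [chosen[key] for key in sorted(chosen)]
```

The input list is left untouched, so static edges remain available for coverage reporting.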
|
||||
243
docs/modules/reach-graph/schemas/slice-schema.md
Normal file
@@ -0,0 +1,243 @@
|
||||
# Reachability Slice Schema
|
||||
|
||||
_Last updated: 2025-12-22. Owner: Scanner Guild._
|
||||
|
||||
This document defines the **Reachability Slice** schema - a minimal, attestable proof unit that answers whether a vulnerable symbol is reachable from application entrypoints.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
A **slice** is a focused subgraph extracted from a full reachability graph, containing only the nodes and edges relevant to answering a specific reachability query (for example, "Is CVE-2024-1234's vulnerable function reachable?").
|
||||
|
||||
### Key Properties
|
||||
|
||||
| Property | Description |
|
||||
|----------|-------------|
|
||||
| **Minimal** | Contains only nodes/edges on paths between entrypoints and targets |
|
||||
| **Attestable** | DSSE-signed with a dedicated slice predicate |
|
||||
| **Reproducible** | Same inputs -> same bytes (deterministic) |
|
||||
| **Content-addressed** | Retrieved by BLAKE3 digest |
|
||||
|
||||
---
|
||||
|
||||
## 2. Predicate Type & Schema
|
||||
|
||||
- Predicate type: `stellaops.dev/predicates/reachability-slice@v1`
|
||||
- JSON schema: `https://stellaops.dev/schemas/stellaops-slice.v1.schema.json`
|
||||
- DSSE payload type: `application/vnd.stellaops.slice.v1+json`
|
||||
|
||||
---
|
||||
|
||||
## 3. Schema Structure
|
||||
|
||||
### 3.1 ReachabilitySlice
|
||||
|
||||
```csharp
|
||||
public sealed record ReachabilitySlice
|
||||
{
|
||||
[JsonPropertyName("_type")]
|
||||
public string Type { get; init; } = "stellaops.dev/predicates/reachability-slice@v1";
|
||||
|
||||
[JsonPropertyName("inputs")]
|
||||
public required SliceInputs Inputs { get; init; }
|
||||
|
||||
[JsonPropertyName("query")]
|
||||
public required SliceQuery Query { get; init; }
|
||||
|
||||
[JsonPropertyName("subgraph")]
|
||||
public required SliceSubgraph Subgraph { get; init; }
|
||||
|
||||
[JsonPropertyName("verdict")]
|
||||
public required SliceVerdict Verdict { get; init; }
|
||||
|
||||
[JsonPropertyName("manifest")]
|
||||
public required ScanManifest Manifest { get; init; }
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 SliceInputs
|
||||
|
||||
```csharp
|
||||
public sealed record SliceInputs
|
||||
{
|
||||
public required string GraphDigest { get; init; }
|
||||
public ImmutableArray<string> BinaryDigests { get; init; }
|
||||
public string? SbomDigest { get; init; }
|
||||
public ImmutableArray<string> LayerDigests { get; init; }
|
||||
}
|
||||
```
|
||||
|
||||
### 3.3 SliceQuery
|
||||
|
||||
```csharp
|
||||
public sealed record SliceQuery
|
||||
{
|
||||
public string? CveId { get; init; }
|
||||
public ImmutableArray<string> TargetSymbols { get; init; }
|
||||
public ImmutableArray<string> Entrypoints { get; init; }
|
||||
public string? PolicyHash { get; init; }
|
||||
}
|
||||
```
|
||||
|
||||
### 3.4 SliceSubgraph, Nodes, Edges
|
||||
|
||||
```csharp
|
||||
public sealed record SliceSubgraph
|
||||
{
|
||||
public ImmutableArray<SliceNode> Nodes { get; init; }
|
||||
public ImmutableArray<SliceEdge> Edges { get; init; }
|
||||
}
|
||||
|
||||
public sealed record SliceNode
|
||||
{
|
||||
public required string Id { get; init; }
|
||||
public required string Symbol { get; init; }
|
||||
public required SliceNodeKind Kind { get; init; } // entrypoint | intermediate | target | unknown
|
||||
public string? File { get; init; }
|
||||
public int? Line { get; init; }
|
||||
public string? Purl { get; init; }
|
||||
public IReadOnlyDictionary<string, string>? Attributes { get; init; }
|
||||
}
|
||||
|
||||
public sealed record SliceEdge
|
||||
{
|
||||
public required string From { get; init; }
|
||||
public required string To { get; init; }
|
||||
public SliceEdgeKind Kind { get; init; } // direct | plt | iat | dynamic | unknown
|
||||
public double Confidence { get; init; }
|
||||
public string? Evidence { get; init; }
|
||||
public SliceGateInfo? Gate { get; init; }
|
||||
public ObservedEdgeMetadata? Observed { get; init; }
|
||||
}
|
||||
```
|
||||
|
||||
### 3.5 SliceVerdict
|
||||
|
||||
```csharp
|
||||
public sealed record SliceVerdict
|
||||
{
|
||||
public required SliceVerdictStatus Status { get; init; }
|
||||
public required double Confidence { get; init; }
|
||||
public ImmutableArray<string> Reasons { get; init; }
|
||||
public ImmutableArray<string> PathWitnesses { get; init; }
|
||||
public int UnknownCount { get; init; }
|
||||
public ImmutableArray<GatedPath> GatedPaths { get; init; }
|
||||
}
|
||||
```
|
||||
|
||||
`SliceVerdictStatus` values (snake_case):
|
||||
- `reachable`
|
||||
- `unreachable`
|
||||
- `unknown`
|
||||
- `gated`
|
||||
- `observed_reachable`
|
||||
|
||||
### 3.6 ScanManifest
|
||||
|
||||
`ScanManifest` is imported from `StellaOps.Scanner.Core` and includes required fields for reproducibility:
|
||||
|
||||
- `scanId`
|
||||
- `createdAtUtc`
|
||||
- `artifactDigest`
|
||||
- `scannerVersion`
|
||||
- `workerVersion`
|
||||
- `concelierSnapshotHash`
|
||||
- `excititorSnapshotHash`
|
||||
- `latticePolicyHash`
|
||||
- `deterministic`
|
||||
- `seed` (base64-encoded 32-byte seed)
|
||||
- `knobs` (string map)
|
||||
|
||||
`artifactPurl` is optional.
|
||||
|
||||
---
|
||||
|
||||
## 4. Verdict Computation Rules
|
||||
|
||||
```
|
||||
reachable := path_exists AND min(path_confidence) > 0.7 AND unknown_edges == 0
|
||||
unreachable := NOT path_exists AND unknown_edges == 0
|
||||
unknown := otherwise
|
||||
```
|
||||
|
||||
`gated` and `observed_reachable` are reserved for feature-gate and runtime-observed paths (see Sprint 3830 and 3840).
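
The three base rules can be sketched in a few lines. This is an interpretation rather than the shipped implementation: `min(path_confidence)` is read as the lowest edge confidence along a path, and when several paths exist the strongest one decides, which is an assumption. The function name and return shape are hypothetical.

```python
def compute_verdict(paths, unknown_edges, threshold=0.7):
    """paths: list of paths, each a list of edge confidences in [0, 1].

    Returns (status, confidence), where confidence is None if no path exists.
    """
    # Weakest edge per path, then the strongest path overall (assumption)
    best = max((min(path, default=1.0) for path in paths), default=None)
    if unknown_edges == 0 and best is not None and best > threshold:
        return "reachable", best
    if unknown_edges == 0 and best is None:
        return "unreachable", None
    return "unknown", best
```

For a single path with edge confidences 1.0, 0.95 and 0.9 and no unknown edges, this yields `reachable` with confidence 0.9.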
|
||||
|
||||
---
|
||||
|
||||
## 5. Example Slice
|
||||
|
||||
```json
|
||||
{
|
||||
"_type": "stellaops.dev/predicates/reachability-slice@v1",
|
||||
"inputs": {
|
||||
"graphDigest": "blake3:a1b2c3d4e5f6789012345678901234567890123456789012345678901234abcd",
|
||||
"binaryDigests": ["sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef"],
|
||||
"sbomDigest": "sha256:cafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabe"
|
||||
},
|
||||
"query": {
|
||||
"cveId": "CVE-2024-1234",
|
||||
"targetSymbols": ["openssl:EVP_PKEY_decrypt"],
|
||||
"entrypoints": ["main", "http_handler"]
|
||||
},
|
||||
"subgraph": {
|
||||
"nodes": [
|
||||
{"id": "node:1", "symbol": "main", "kind": "entrypoint", "file": "/app/main.c", "line": 42},
|
||||
{"id": "node:2", "symbol": "process_request", "kind": "intermediate", "file": "/app/handler.c", "line": 100},
|
||||
{"id": "node:3", "symbol": "decrypt_data", "kind": "intermediate", "file": "/app/crypto.c", "line": 55},
|
||||
{"id": "node:4", "symbol": "EVP_PKEY_decrypt", "kind": "target", "purl": "pkg:generic/openssl@3.0.0"}
|
||||
],
|
||||
"edges": [
|
||||
{"from": "node:1", "to": "node:2", "kind": "direct", "confidence": 1.0},
|
||||
{"from": "node:2", "to": "node:3", "kind": "direct", "confidence": 0.95},
|
||||
{"from": "node:3", "to": "node:4", "kind": "plt", "confidence": 0.9}
|
||||
]
|
||||
},
|
||||
"verdict": {
|
||||
"status": "reachable",
|
||||
"confidence": 0.9,
|
||||
"reasons": ["path_exists_high_confidence"],
|
||||
"pathWitnesses": ["main -> process_request -> decrypt_data -> EVP_PKEY_decrypt"],
|
||||
"unknownCount": 0
|
||||
},
|
||||
"manifest": {
|
||||
"scanId": "scan-1234",
|
||||
"createdAtUtc": "2025-12-22T10:00:00Z",
|
||||
"artifactDigest": "sha256:00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff",
|
||||
"artifactPurl": "pkg:generic/app@1.0.0",
|
||||
"scannerVersion": "scanner.native:1.2.0",
|
||||
"workerVersion": "scanner.worker:1.2.0",
|
||||
"concelierSnapshotHash": "sha256:1111222233334444555566667777888899990000aaaabbbbccccddddeeeeffff",
|
||||
"excititorSnapshotHash": "sha256:2222333344445555666677778888999900001111aaaabbbbccccddddeeeeffff",
|
||||
"latticePolicyHash": "sha256:3333444455556666777788889999000011112222aaaabbbbccccddddeeeeffff",
|
||||
"deterministic": true,
|
||||
"seed": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
|
||||
"knobs": { "maxDepth": "20" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Determinism Requirements
|
||||
|
||||
For reproducible slices:
|
||||
|
||||
1. **Node ordering**: Sort by `id` (ordinal).
|
||||
2. **Edge ordering**: Sort by `from`, then `to`, then `kind`.
|
||||
3. **Strings**: Trim and de-duplicate lists (`targetSymbols`, `entrypoints`, `reasons`).
|
||||
4. **Timestamps**: Use UTC ISO-8601 with `Z` suffix.
|
||||
5. **JSON serialization**: Canonical JSON (sorted keys, no whitespace).
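
Rules 1, 2, 3 and 5 can be combined into a single canonicalization step before hashing or signing. The sketch below makes several assumptions: the function name is hypothetical, de-duplication keeps the first occurrence of each trimmed string, and "canonical JSON" is approximated by key-sorted, whitespace-free output (a stricter scheme such as RFC 8785 also pins number formatting). Python compares strings by code point, which can differ from .NET ordinal comparison for characters outside the Basic Multilingual Plane.

```python
import copy
import json


def canonical_slice_bytes(slice_doc):
    """Return deterministic UTF-8 bytes for a slice document (input is not modified)."""
    doc = copy.deepcopy(slice_doc)
    subgraph = doc.get("subgraph", {})
    if "nodes" in subgraph:
        subgraph["nodes"] = sorted(subgraph["nodes"], key=lambda n: n["id"])
    if "edges" in subgraph:
        subgraph["edges"] = sorted(
            subgraph["edges"],
            key=lambda e: (e["from"], e["to"], e.get("kind", "")),
        )
    for section, field in (
        ("query", "targetSymbols"),
        ("query", "entrypoints"),
        ("verdict", "reasons"),
    ):
        values = doc.get(section, {}).get(field)
        if values is not None:
            # Trim, then de-duplicate while keeping first-seen order (assumption)
            doc[section][field] = list(dict.fromkeys(v.strip() for v in values))
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")
```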
|
||||
|
||||
---
|
||||
|
||||
## 7. Related Documentation
|
||||
|
||||
- [Binary Reachability Schema](./binary-reachability-schema.md)
|
||||
- [RichGraph Contract](../contracts/richgraph-v1.md)
|
||||
- [Function-Level Evidence](./function-level-evidence.md)
|
||||
- [Replay Verification](./replay-verification.md)
|
||||
|
||||
---
|
||||
|
||||
_Created: 2025-12-22. See Sprint 3810 for implementation details._
|
||||