up
Some checks failed
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled
Signals Reachability Scoring & Events / reachability-smoke (push) Has been cancelled
Signals Reachability Scoring & Events / sign-and-upload (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Scanner Analyzers / Discover Analyzers (push) Has been cancelled
Scanner Analyzers / Build Analyzers (push) Has been cancelled
Scanner Analyzers / Test Language Analyzers (push) Has been cancelled
Scanner Analyzers / Validate Test Fixtures (push) Has been cancelled
Scanner Analyzers / Verify Deterministic Output (push) Has been cancelled
This commit is contained in:
461
docs/reachability/binary-reachability-schema.md
Normal file
@@ -0,0 +1,461 @@
# Binary Reachability Schema

_Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild._

This document defines the binary reachability schema addressing gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.

---

## 1. Overview

Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:

- **Stripped binaries:** Symbol recovery using `code_id` + `code_block_hash`
- **Build variants:** Handling multiple builds from the same source
- **Large graphs:** Chunking and size limits for DSSE/Rekor
- **Offline verification:** Air-gapped attestation workflows

---
## 2. Gap Resolutions

### BR1: Canonical DSSE/Predicate Schemas

**Binary graph predicate:**

```
stella.ops/binaryGraph@v1
```

**Predicate schema:**

```json
{
  "_type": "https://stellaops.dev/predicates/binaryGraph/v1",
  "subject": [
    {
      "name": "graph",
      "digest": {"blake3": "a1b2c3d4e5f6..."}
    }
  ],
  "predicate": {
    "analyzer": {
      "name": "scanner.native",
      "version": "1.2.0",
      "toolchain": "ghidra-11.2"
    },
    "binary": {
      "format": "ELF",
      "arch": "x86_64",
      "file_hash": "sha256:...",
      "build_id": "gnu-build-id:5f0c7c3c..."
    },
    "graph_stats": {
      "node_count": 1247,
      "edge_count": 3891,
      "root_count": 5
    },
    "evidence": {
      "symbols_source": "DWARF",
      "stripped_symbols": 58,
      "heuristic_symbols": 12
    },
    "created_at": "2025-12-13T10:00:00Z"
  }
}
```

**Edge bundle predicate:**

```
stella.ops/binaryEdgeBundle@v1
```

```json
{
  "_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
  "subject": [
    {
      "name": "edges",
      "digest": {"sha256": "..."}
    }
  ],
  "predicate": {
    "graph_hash": "blake3:a1b2c3d4...",
    "bundle_id": "bundle:001",
    "bundle_reason": "init_array",
    "edge_count": 128,
    "edges": [
      {
        "from": "sym:binary:...",
        "to": "sym:binary:...",
        "reason": "init-array",
        "confidence": 0.95
      }
    ]
  }
}
```
### BR2: Edge Hash Recipe

**Binary edge hash computation:**

```
edge_id = "edge:" + sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason,
    "binary_hash": binary.file_hash  // Binary context included
  })
)
```

**Hash includes binary context:**

Unlike managed code edges, binary edges include `binary_hash` in the hash computation to distinguish edges from different binaries with identical symbol names.

**Canonicalization:**

1. Keys: `binary_hash`, `from`, `kind`, `reason`, `to` (alphabetical)
2. No whitespace, UTF-8 encoding
3. Lowercase hex for all hashes
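
The BR2 recipe and its canonicalization rules can be sketched in Python as follows. This is a minimal illustration, not the scanner's implementation: the function name `binary_edge_id` is hypothetical, `json.dumps` with sorted keys and compact separators stands in for `canonical_json`, and the `edge:sha256:` prefix is assumed from the `edge_id` example in the edge explainability schema.

```python
import hashlib
import json


def binary_edge_id(edge: dict, binary_file_hash: str) -> str:
    """Compute a binary edge ID from the five fields named in the BR2 recipe.

    Hypothetical helper: only from/to/kind/reason/binary_hash feed the hash,
    so confidence and evidence never change the ID.
    """
    payload = {
        "from": edge["from"],
        "to": edge["to"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        # Rule 3: hashes are lowercase hex.
        "binary_hash": binary_file_hash.lower(),
    }
    # Rules 1 and 2: alphabetical keys, no whitespace, UTF-8 bytes.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Assumed ID layout: "edge:" + "sha256:" + lowercase hex digest.
    return "edge:sha256:" + digest


edge = {
    "from": "sym:binary:main",
    "to": "sym:binary:init",
    "kind": "call",
    "reason": "init-array",
    "confidence": 0.95,  # ignored by the hash
}
print(binary_edge_id(edge, "sha256:ABCDEF"))
```

Because the binary hash is part of the payload, the same symbol pair in two different binaries yields two different edge IDs, which is the distinction the section describes.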
### BR3: Required Binary Evidence with CAS Refs

**Required evidence per node:**

| Evidence Type | Required | CAS Storage |
|---------------|----------|-------------|
| File hash | Yes | N/A (inline) |
| Build ID | Conditional | N/A (inline) |
| Symbol source | Yes | N/A (inline) |
| Code block hash | For stripped | `cas://binary/blocks/{sha256}` |
| Disassembly | Optional | `cas://binary/disasm/{sha256}` |
| CFG | Optional | `cas://binary/cfg/{sha256}` |

**Evidence schema:**

```json
{
  "binary_evidence": {
    "file_hash": "sha256:...",
    "build_id": "gnu-build-id:5f0c7c3c...",
    "symbol_source": "DWARF",
    "symbol_confidence": 0.95,
    "code_block_hash": "sha256:deadbeef...",
    "code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
    "disassembly_uri": "cas://binary/disasm/sha256:...",
    "cfg_uri": "cas://binary/cfg/sha256:..."
  }
}
```

**CAS layout:**

```
cas://binary/
  blocks/{sha256}/     # Code block bytes
  disasm/{sha256}/     # Disassembly JSON
  cfg/{sha256}/        # Control flow graph
  symbols/{sha256}/    # Symbol table extract
```
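
As a rough sketch of how the evidence table and CAS layout fit together, the Python below derives the code block fields from raw bytes and checks which required fields are missing. The helper names are hypothetical, and the URI form with a `sha256:` prefix follows the evidence schema example rather than a confirmed CAS contract.

```python
import hashlib


def code_block_evidence(block_bytes: bytes) -> dict:
    """Derive code_block_hash and code_block_uri for a stripped function.

    Hypothetical helper; the URI mirrors the evidence schema example.
    """
    digest = "sha256:" + hashlib.sha256(block_bytes).hexdigest()
    return {
        "code_block_hash": digest,
        "code_block_uri": "cas://binary/blocks/" + digest,
    }


def missing_required_evidence(evidence: dict, stripped: bool) -> list:
    """Return required evidence keys that are absent or empty.

    File hash and symbol source are always required; the code block hash
    is required only for stripped binaries, per the table above.
    """
    required = ["file_hash", "symbol_source"]
    if stripped:
        required.append("code_block_hash")
    return [key for key in required if not evidence.get(key)]


evidence = {"file_hash": "sha256:...", "symbol_source": "heuristic"}
print(missing_required_evidence(evidence, stripped=True))
evidence.update(code_block_evidence(b"\x55\x48\x89\xe5"))
print(missing_required_evidence(evidence, stripped=True))
```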
### BR4: Build-ID/Variant Rules

**Build-ID sources:**

| Format | Build-ID Source | Example |
|--------|-----------------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:5f0c7c3c...` |
| PE | Debug GUID | `pe-guid:12345678-1234-...` |
| Mach-O | `LC_UUID` | `macho-uuid:12345678...` |

**Fallback when build-ID absent:**

```json
{
  "build_id": null,
  "build_id_fallback": {
    "method": "file_hash",
    "value": "sha256:...",
    "confidence": 0.7
  }
}
```

**Variant handling:**

Multiple binaries built from the same source (debug/release, different architectures) share a variant group:

```json
{
  "variant_group": "sha256:source_hash...",
  "variants": [
    {"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
    {"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
    {"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
  ]
}
```
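
The build-ID table and the fallback rule in this section can be combined into one small resolver. The sketch below is illustrative only: `build_id_record` is a hypothetical name, the raw identifier is assumed to have been extracted already by a format-specific parser, and the `0.7` confidence is copied from the fallback example rather than derived.

```python
from typing import Optional

# Prefixes taken from the build-ID sources table.
PREFIX_BY_FORMAT = {
    "ELF": "gnu-build-id",
    "PE": "pe-guid",
    "Mach-O": "macho-uuid",
}


def build_id_record(binary_format: str, raw_build_id: Optional[str], file_hash: str) -> dict:
    """Return the build_id fields for a binary, falling back to the file hash.

    Hypothetical helper: raw_build_id is whatever the parser read from
    .note.gnu.build-id, the PE debug GUID, or LC_UUID (None if absent).
    """
    if raw_build_id:
        prefix = PREFIX_BY_FORMAT[binary_format]
        return {"build_id": f"{prefix}:{raw_build_id}"}
    # No build-ID present: record the file hash with reduced confidence.
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            "confidence": 0.7,
        },
    }


print(build_id_record("ELF", "5f0c7c3c", "sha256:aaa"))
print(build_id_record("ELF", None, "sha256:aaa"))
```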
### BR5: Policy Hash Governance

**Policy version binding:**

Binary reachability graphs are bound to a policy version:

```json
{
  "policy_binding": {
    "policy_digest": "sha256:...",
    "policy_version": "P-7:v4",
    "bound_at": "2025-12-13T10:00:00Z",
    "binding_mode": "strict"
  }
}
```

**Binding modes:**

| Mode | Behavior |
|------|----------|
| `strict` | Graph invalid if policy changes |
| `forward` | Graph valid with newer policy versions |
| `any` | Graph valid with any policy version |

**Governance rules:**

1. Production graphs use `strict` binding
2. Test graphs may use `forward`
3. Policy hash computed from canonical DSL
4. Binding stored in graph metadata
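
One way to read the three binding modes is as a validity check run whenever the active policy changes. The following is a sketch under stated assumptions: the function names are hypothetical, and the version format is assumed to be `<policy-id>:v<revision>` based only on the `P-7:v4` example.

```python
def parse_policy_version(policy_version: str) -> tuple:
    """Split 'P-7:v4' into ('P-7', 4). The format is an assumption."""
    policy_id, _, revision = policy_version.rpartition(":v")
    return policy_id, int(revision)


def graph_is_valid(binding: dict, current_digest: str, current_version: str) -> bool:
    """Decide whether a bound graph is still valid under the current policy."""
    mode = binding["binding_mode"]
    if mode == "any":
        return True
    if mode == "strict":
        # Any policy change alters the digest and invalidates the graph.
        return binding["policy_digest"] == current_digest
    if mode == "forward":
        bound_id, bound_rev = parse_policy_version(binding["policy_version"])
        cur_id, cur_rev = parse_policy_version(current_version)
        # Same policy, same or newer revision.
        return bound_id == cur_id and cur_rev >= bound_rev
    raise ValueError(f"unknown binding mode: {mode}")


binding = {
    "policy_digest": "sha256:old",
    "policy_version": "P-7:v4",
    "binding_mode": "forward",
}
print(graph_is_valid(binding, "sha256:new", "P-7:v5"))
```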
### BR6: Sigstore Bundle/Log Routing

**Sigstore integration:**

```json
{
  "sigstore": {
    "bundle_type": "hashedrekord",
    "log_index": 12345678,
    "log_id": "rekor.sigstore.dev",
    "inclusion_proof": {
      "log_index": 12345678,
      "root_hash": "sha256:...",
      "tree_size": 98765432,
      "hashes": ["sha256:...", "sha256:..."]
    },
    "signed_entry_timestamp": "base64:..."
  }
}
```

**Log routing:**

| Evidence Type | Log | Notes |
|---------------|-----|-------|
| Graph DSSE | Rekor (public) | Always |
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
| Code block | No log | CAS only |
| CFG/Disasm | No log | CAS only |

**Offline mode:**

When Rekor is unavailable:

```json
{
  "sigstore": {
    "mode": "offline",
    "checkpoint": {
      "origin": "rekor.sigstore.dev",
      "checkpoint_data": "base64:...",
      "captured_at": "2025-12-13T10:00:00Z"
    },
    "deferred_submission": true
  }
}
```
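
The log routing table in this section can be read as a small decision function. The sketch below is only an interpretation: the evidence type strings, the function name, and the idea that edge bundles beyond the configured cap stay in CAS without a log entry are all assumptions, since the table states only that the limit is configurable.

```python
REKOR = "rekor"
CAS_ONLY = "cas-only"


def route_evidence(evidence_type: str, bundles_already_logged: int = 0, bundle_cap: int = 0) -> str:
    """Pick a destination for one piece of evidence (hypothetical helper)."""
    if evidence_type == "graph_dsse":
        # Graph-level DSSE envelopes are always logged.
        return REKOR
    if evidence_type == "edge_bundle_dsse":
        # Assumed cap semantics: log until the limit is reached.
        return REKOR if bundles_already_logged < bundle_cap else CAS_ONLY
    if evidence_type in ("code_block", "cfg", "disasm"):
        return CAS_ONLY
    raise ValueError(f"unknown evidence type: {evidence_type}")


print(route_evidence("graph_dsse"))
print(route_evidence("edge_bundle_dsse", bundles_already_logged=3, bundle_cap=3))
```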
### BR7: Idempotent Submission Keys

**Submission key format:**

```
submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}
```

**Idempotency rules:**

1. Same key returns existing entry (no duplicate)
2. Key includes hour-granularity timestamp for rate limiting
3. Different graphs from same binary produce different keys
4. Retry within 1 hour uses same key

**Implementation:**

```json
{
  "submission": {
    "key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
    "status": "accepted",
    "existing_entry": false,
    "log_index": 12345678
  }
}
```
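
A minimal sketch of the key format and the first idempotency rule follows. The `SubmissionLog` class is a hypothetical in-memory stand-in for the real log client, and the `%Y%m%d%H` UTC layout is inferred from the `2025121310` suffix in the example key.

```python
from datetime import datetime, timezone


def submission_key(tenant: str, binary_hash: str, graph_hash: str, when: datetime) -> str:
    """Build submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}.

    `when` must be timezone-aware; it is converted to UTC before truncation.
    """
    hour = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour}"


class SubmissionLog:
    """In-memory illustration: a repeated key returns the existing entry."""

    def __init__(self) -> None:
        self._entries = {}
        self._next_index = 0

    def submit(self, key: str) -> dict:
        existing = key in self._entries
        if not existing:
            self._entries[key] = self._next_index
            self._next_index += 1
        return {
            "key": key,
            "status": "accepted",
            "existing_entry": existing,
            "log_index": self._entries[key],
        }


log = SubmissionLog()
ts = datetime(2025, 12, 13, 10, 30, tzinfo=timezone.utc)
key = submission_key("acme", "sha256:abc", "blake3:def", ts)
print(log.submit(key))
print(log.submit(key))  # retry in the same hour: existing_entry is True
```

Note that with this format a retry that crosses a clock-hour boundary produces a new key, so "within 1 hour" effectively means "within the same UTC hour" in this sketch.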
### BR8: Size/Chunking Limits

**Size limits:**

| Element | Limit | Action on Exceed |
|---------|-------|------------------|
| Graph JSON | 10 MB | Chunk nodes/edges |
| Edge bundle | 512 edges | Split bundles |
| DSSE payload | 1 MB | Compress/chunk |
| Rekor entry | 100 KB | Reference CAS |

**Chunking strategy:**

For large graphs (>10MB):

```json
{
  "chunked_graph": {
    "chunk_count": 5,
    "chunks": [
      {"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
      {"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
    ],
    "assembly_order": ["chunk:001", "chunk:002", ...],
    "assembled_hash": "blake3:..."
  }
}
```

**Compression:**

- Graph JSON: gzip before DSSE
- CAS storage: Raw JSON (indexed)
- Rekor payload: DSSE references CAS
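
The bundle limit and chunk manifest above could be exercised with a sketch like the one below. Several things here are assumptions: the helper names are hypothetical, the payload is split on byte boundaries for simplicity (the table says node/edge chunking), and SHA-256 stands in for BLAKE3 because the Python standard library has no BLAKE3 implementation.

```python
import hashlib

MAX_EDGES_PER_BUNDLE = 512  # Edge bundle limit from the size table.


def split_edge_bundles(edges: list, max_edges: int = MAX_EDGES_PER_BUNDLE) -> list:
    """Split an edge list into bundles of at most max_edges edges."""
    return [edges[i:i + max_edges] for i in range(0, len(edges), max_edges)]


def chunk_payload(payload: bytes, max_bytes: int) -> dict:
    """Split a serialized graph into ordered, individually hashed chunks."""
    chunks = []
    for number, start in enumerate(range(0, len(payload), max_bytes), start=1):
        part = payload[start:start + max_bytes]
        chunks.append({
            "chunk_id": f"chunk:{number:03d}",
            # Stand-in digest; the document specifies blake3 here.
            "hash": "sha256:" + hashlib.sha256(part).hexdigest(),
            "data": part,
        })
    return {
        "chunk_count": len(chunks),
        "chunks": chunks,
        "assembly_order": [c["chunk_id"] for c in chunks],
        "assembled_hash": "sha256:" + hashlib.sha256(payload).hexdigest(),
    }


def assemble(manifest: dict) -> bytes:
    """Reassemble chunks in assembly_order and verify the overall hash."""
    by_id = {c["chunk_id"]: c["data"] for c in manifest["chunks"]}
    payload = b"".join(by_id[cid] for cid in manifest["assembly_order"])
    digest = "sha256:" + hashlib.sha256(payload).hexdigest()
    if digest != manifest["assembled_hash"]:
        raise ValueError("assembled hash mismatch")
    return payload


manifest = chunk_payload(b'{"nodes":[],"edges":[]}', max_bytes=8)
print(manifest["assembly_order"])
print(assemble(manifest))
```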
### BR9: API/CLI/UI Surfacing

**API endpoints:**

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/binary/graphs` | Submit binary graph |
| `GET` | `/api/binary/graphs/{hash}` | Get graph details |
| `GET` | `/api/binary/graphs/{hash}/edges` | List edges |
| `GET` | `/api/binary/symbols/{symbolId}` | Get symbol details |
| `POST` | `/api/binary/verify` | Verify graph attestation |

**CLI commands:**

```bash
# Submit binary graph
stella binary submit --graph ./richgraph.json --binary ./app

# Get graph info
stella binary info --hash blake3:a1b2c3d4...

# List symbols
stella binary symbols --hash blake3:... --stripped-only

# Verify attestation
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse
```

**UI components:**

- Binary graph visualization with zoom/pan
- Symbol table with search/filter
- Edge explorer with confidence highlighting
- Attestation status badges
- Build variant selector
### BR10: Binary Fixtures

**Fixture location:**

```
tests/Binary/
  fixtures/
    elf-x86_64-with-debug/
      binary.elf
      graph.json
      expected-hashes.txt
    elf-stripped/
      binary.elf
      graph.json
      expected-hashes.txt
    pe-x64-with-pdb/
      binary.exe
      graph.json
      expected-hashes.txt
  golden/
    elf-x86_64.golden.json
    pe-x64.golden.json

datasets/binary/
  schema/
    binary-graph.schema.json
    binary-edge.schema.json
  samples/
    openssl-1.1.1/
      libssl.so
      graph.json
      edges.ndjson
```

**Fixture requirements:**

1. Each binary format has at least one fixture
2. Stripped and debug variants for each format
3. Expected hashes verified by CI
4. Golden outputs include DSSE envelopes
5. Fixtures reproducible from source (where legal)

**Test categories:**

1. **Hash stability:** Same binary produces same graph hash
2. **Build-ID extraction:** Correct build-ID parsing per format
3. **Symbol recovery:** DWARF/PDB parsing accuracy
4. **Stripped handling:** Code block hash computation
5. **Chunking:** Large graph assembly/disassembly
6. **DSSE signing:** Envelope creation and verification
7. **Rekor integration:** Submission and verification
---

## 3. Implementation Status

| Component | Location | Status |
|-----------|----------|--------|
| ELF parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| PE parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |
| CAS storage | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability` | Partial |
| Rekor integration | `src/Attestor/StellaOps.Attestor` | Implemented |
| CLI commands | `src/Cli/StellaOps.Cli` | Planned |
| UI components | `src/UI/StellaOps.UI` | Planned |

---

## 4. Related Documentation

- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Edge Explainability](./edge-explainability-schema.md) - Edge reason codes
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
- [Native Analyzer Tests](../../src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests/Reachability/) - Test fixtures

---

_Last updated: 2025-12-13. See Sprint 0401 BINARY-GAPS-401-066 for change history._
@@ -1,45 +1,69 @@
# Reachability Corpus Plan (QA-CORPUS-401-031)

Objective
- Build a multi-runtime reachability corpus (Go/.NET/Python/Rust) with EXPECT.yaml ground truths and captured traces.
- Make fixtures CI-consumable to validate reachability scoring and VEX proofs continuously.
- Add public mini-dataset cases (PHP/JavaScript/C#) from advisory 23-Nov-2025 for ingestion/bench reuse.
- Maintain deterministic, offline reachability fixtures that validate callgraph ingestion, reachability truth-path handling, and VEX proof workflows.
- Keep the corpus small but multi-runtime (Go/.NET/Python/Rust), and keep a public-friendly mini dataset (PHP/JavaScript/C#) for docs/demos without external repos.

Scope & deliverables
- Fixture layout: `tests/reachability/corpus/<language>/<case>/`
  - `expect.yaml`: states (`reachable|conditional|unreachable`), score, evidence refs.
  - `callgraph.*.json`: static graphs per language.
  - `runtime/*.ndjson`: traces/probes when available.
  - `sbom.*.json`: CycloneDX/SPDX slices.
  - `vex.openvex.json`: expected VEX statement.
- CI integration: add corpus harness to `tests/reachability/StellaOps.Reachability.FixtureTests` to validate presence, schema, and determinism (hash manifest).
- Offline posture: all artifacts deterministic, no external downloads; hashes recorded in manifest.
- Public mini-dataset layout (PHP/JS/C#) to be mirrored under `tests/reachability/samples-public/`:
  ```
  vuln-reach-dataset/
    schema/ground-truth.schema.json
    runners/run_all.sh
    samples/
      php/php-001-phar-deserialize/...
      js/js-002-yaml-unsafe-load/...
      csharp/cs-001-binaryformatter-deserialize/...
  ```
  Each sample ships: minimal app, lockfile, SBOM (CycloneDX JSON), VEX, ground truth (EXPECT/JSON), repro script.

## Corpus Map

MVP slice (proposed)

### 1) Multi-runtime corpus (internal MVP)

Path: `tests/reachability/corpus/`

Per-case layout: `tests/reachability/corpus/<language>/<case>/`
- `callgraph.static.json`: static call graph sample (stub for MVP).
- `ground-truth.json`: expected reachability outcome and example path(s) (Reachbench truth schema v1; `schema_version=reachbench.reachgraph.truth/v1`).
- `vex.openvex.json`: expected VEX slice for the case.
- Optional (future): `runtime/*.ndjson`, `sbom.*.json`

`tests/reachability/corpus/manifest.json` records deterministic SHA-256 hashes for required files in each case directory.

### 2) Public mini dataset (PHP/JS/C#)

Path: `tests/reachability/samples-public/`

Layout:
- `schema/ground-truth.schema.json`: JSON schema for `ground-truth.json` (Reachbench truth schema v1).
- `manifest.json`: deterministic SHA-256 hashes for required files in each sample directory.
- `samples/<lang>/<case-id>/`: per-sample artifacts: `callgraph.static.json`, `ground-truth.json`, `sbom.cdx.json`, `vex.openvex.json`, `repro.sh`.
- `runners/run_all.{sh,ps1}`: deterministic manifest regeneration.

### 3) Reachbench fixture pack (expanded, dual variants)

Path: `tests/reachability/fixtures/reachbench-2025-expanded/`

Each case has two variants (reachable/unreachable) with per-variant `manifest.json` and `reachgraph.truth.json`. Fixture integrity is validated by `tests/reachability/StellaOps.Reachability.FixtureTests`.

## Ground Truth Conventions

- Corpus and public samples use the same truth schema (`reachbench.reachgraph.truth/v1`) but differ in file naming (`ground-truth.json` vs reachbench pack `reachgraph.truth.json`).
- Legacy corpus `expect.yaml` has been retired; prior `state/score` values are preserved under `legacy_expect` in `ground-truth.json`.
- Legacy `conditional` states are represented as `variant=unreachable` plus `legacy_expect.state=conditional` until the truth schema grows a dedicated conditional/contested variant.

## Determinism & Runners

Regenerate all reachability manifests (corpus + public samples + reachbench pack):
- `tests/reachability/runners/run_all.sh`
- `tests/reachability/runners/run_all.ps1`

Individual scripts:
- `python tests/reachability/scripts/update_corpus_manifest.py`
- `python tests/reachability/samples-public/scripts/update_manifest.py`
- `python tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`

## CI Gates

- `tests/reachability/StellaOps.Reachability.FixtureTests`
  - validates presence + hashes from manifests for corpus/public samples/reachbench fixtures
  - enforces minimum language-bucket coverage (Go/.NET/Python/Rust + PHP/JS/C#)

## MVP Slice (stub cases)
- Go: `go-ssh-CVE-2020-9283-keyexchange`
- .NET: `dotnet-kestrel-CVE-2023-44487-http2-rapid-reset`
- Python: `python-django-CVE-2019-19844-sqli-like`
- Rust: `rust-axum-header-parsing-TBD`

Work plan
1) Define shared manifest schema + hash manifest (NDJSON) under `tests/reachability/corpus/manifest.json`.
2) For each MVP case, add minimal static callgraph + EXPECT.yaml with score/state and evidence links. (DONE: stub versions committed)
3) Extend reachability fixture tests to cover corpus folders (presence, hashes, EXPECT.yaml schema). (DONE)
4) Wire CI job to run the extended tests in `tests/reachability/StellaOps.Reachability.FixtureTests`. (TODO)
5) Replace stubs with real callgraphs/traces and expand corpus after MVP passes CI. (TODO)

## Next Work (post-MVP)
- Wire a CI job to run `tests/reachability/StellaOps.Reachability.FixtureTests`.
- Replace stubs with real callgraphs/traces and expand the corpus once CI is stable.

Determinism rules
- Sort JSON keys; round scores to 2dp; UTC times only if needed.
- Stable ordering of files in manifests; hash with SHA-256.
- No network calls during test or generation.
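
The determinism rules (sorted keys, stable file ordering, SHA-256, no network access) can be illustrated with a short manifest generator. This is a sketch only: it is not the repository's `update_corpus_manifest.py`, the function name and the manifest shape (case path mapped to file hashes) are hypothetical, and the directory depth `<language>/<case>` is taken from the per-case layout described in the plan.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root: Path, required_files: list) -> str:
    """Return a deterministic JSON manifest of SHA-256 hashes per case.

    Raises FileNotFoundError if a required file is missing from a case.
    """
    manifest = {}
    # Sorted traversal keeps ordering independent of the filesystem.
    case_dirs = sorted(p for p in root.glob("*/*") if p.is_dir())
    for case_dir in case_dirs:
        hashes = {}
        for name in sorted(required_files):
            data = (case_dir / name).read_bytes()
            hashes[name] = hashlib.sha256(data).hexdigest()
        manifest[case_dir.relative_to(root).as_posix()] = hashes
    # Sorted keys and fixed indentation give byte-identical output per run.
    return json.dumps(manifest, sort_keys=True, indent=2) + "\n"
```

Running the function twice over an unchanged tree should produce byte-identical text, which is the property a hash-manifest CI gate relies on.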
416
docs/reachability/edge-explainability-schema.md
Normal file
@@ -0,0 +1,416 @@
# Edge Explainability Schema

_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._

This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.

---

## 1. Overview

Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:

- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
- **Confidence score:** Certainty of the edge's existence
- **Evidence sources:** Detectors and rules that contributed to edge discovery
- **Provenance:** Analyzer version, detection timestamp, and input artifacts

---
## 2. Gap Resolutions

### EG1: Reason Enum Governance

**Standard reason codes:**

| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |

**Governance rules:**

1. New reason codes require RFC + review by Scanner Guild
2. Deprecated codes remain valid for 2 major versions
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
4. Codes are case-insensitive, normalized to lowercase

**Code registry:**

```json
{
  "schema": "stellaops.edge.reason.registry@v1",
  "version": "2025-12-13",
  "reasons": [
    {
      "code": "bytecode-invoke",
      "category": "static",
      "description": "Bytecode invocation instruction",
      "languages": ["java", "dotnet"],
      "confidence_range": [0.9, 1.0],
      "deprecated": false
    }
  ]
}
```
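
Governance rules 3 and 4 (the `custom:` prefix and lowercase normalization) lend themselves to a small validation helper. The sketch below is an assumption-laden illustration: `normalize_reason` is a hypothetical name, the standard set is copied from the table in this section rather than loaded from the registry, and trimming surrounding whitespace is an extra convenience not stated in the rules.

```python
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})

CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and check it is standard or custom-prefixed."""
    normalized = code.strip().lower()
    if normalized in STANDARD_REASONS:
        return normalized
    # Custom codes need the prefix plus a non-empty name after it.
    if normalized.startswith(CUSTOM_PREFIX) and len(normalized) > len(CUSTOM_PREFIX):
        return normalized
    raise ValueError(f"unknown reason code: {code!r}")


print(normalize_reason("PLT-Stub"))
print(normalize_reason("custom:My-Analyzer"))
```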
### EG2: Canonical Edge Schema with Hash Rules

**Edge schema:**

```json
{
  "edge_id": "edge:sha256:{hex}",
  "from": "sym:java:...",
  "to": "sym:java:...",
  "kind": "call",
  "reason": "bytecode-invoke",
  "confidence": 0.95,
  "evidence": [
    {
      "source": "detector:java-bytecode-analyzer",
      "rule_id": "invoke-virtual",
      "rule_version": "1.0.0",
      "location": {
        "file": "com/example/Foo.class",
        "offset": 1234,
        "instruction": "invokevirtual #42"
      },
      "timestamp": "2025-12-13T10:00:00Z"
    }
  ],
  "attributes": {
    "virtual": true,
    "polymorphic_targets": 3
  }
}
```

**Hash computation:**

```
edge_id = "edge:" + sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason
  })
)
```

**Canonicalization:**

1. Use only `from`, `to`, `kind`, `reason` for hash (not confidence or evidence)
2. Sort JSON keys alphabetically
3. No whitespace, UTF-8 encoding
4. Hash is lowercase hex with `sha256:` prefix
### EG3: Evidence Limits/Redaction

**Evidence limits:**

| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |

**Redaction rules:**

| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |

**Truncation behavior:**

```json
{
  "evidence_truncated": true,
  "evidence_count": 15,
  "evidence_shown": 10,
  "full_evidence_uri": "cas://edges/evidence/sha256:..."
}
```
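
The default limits and the truncation record can be combined in a sketch like this one. It assumes a hypothetical `truncate_evidence` helper and treats the full-evidence URI as an input, since how that CAS reference is produced is not specified here. Only the entry-count and instruction-length limits are applied.

```python
def truncate_evidence(evidence: list, full_evidence_uri: str,
                      max_entries: int = 10, max_instruction_chars: int = 100) -> dict:
    """Apply the default evidence limits and report truncation metadata."""
    shown = []
    for entry in evidence[:max_entries]:
        entry = dict(entry)  # shallow copy so the input is left untouched
        location = dict(entry.get("location", {}))
        if "instruction" in location:
            # Instruction previews keep only the first N characters.
            location["instruction"] = location["instruction"][:max_instruction_chars]
            entry["location"] = location
        shown.append(entry)

    result = {"evidence": shown}
    if len(evidence) > max_entries:
        result.update({
            "evidence_truncated": True,
            "evidence_count": len(evidence),
            "evidence_shown": len(shown),
            "full_evidence_uri": full_evidence_uri,
        })
    return result


entries = [{"rule_id": f"r{i}", "location": {"instruction": "x" * 150}} for i in range(15)]
summary = truncate_evidence(entries, "cas://edges/evidence/sha256:...")
print(summary["evidence_count"], summary["evidence_shown"])
```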
### EG4: Confidence Rubric

**Confidence scale:**

| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |

**Confidence computation:**

```
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
```

**Base confidence by reason:**

| Reason | Base Confidence |
|--------|-----------------|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |
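
The rubric above can be sketched as two small functions: one mapping a score to its level, and one applying the multiplicative formula. The boost and resolution factors are passed in as plain numbers because their definitions are not given in this section, and clamping the result to `[0.0, 1.0]` is an assumption. Function names are hypothetical.

```python
# Base values copied from the "Base confidence by reason" table.
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98,
    "import-symbol": 0.95,
    "plt-stub": 0.92,
    "reloc-target": 0.90,
    "init-array": 0.95,
    "vtable-slot": 0.75,
    "indirect-target": 0.60,
    "reflection-invoke": 0.50,
    "runtime-observed": 0.99,
    "user-annotated": 0.80,
}


def confidence_level(score: float) -> str:
    """Map a numeric confidence to the level names in the scale table."""
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"


def edge_confidence(reason: str, evidence_boost: float = 1.0,
                    target_resolution_factor: float = 1.0) -> float:
    """Apply base * boost * resolution factor, clamped to [0, 1]."""
    score = BASE_CONFIDENCE[reason] * evidence_boost * target_resolution_factor
    return min(1.0, max(0.0, score))


score = edge_confidence("vtable-slot")
print(score, confidence_level(score))
```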
### EG5: Detector/Rule Provenance

**Provenance schema:**

```json
{
  "provenance": {
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "digest": "sha256:..."
    },
    "detector": {
      "name": "java-bytecode-analyzer",
      "version": "2.0.0",
      "rule_set": "default"
    },
    "rule": {
      "id": "invoke-virtual",
      "version": "1.0.0",
      "description": "Detect invokevirtual bytecode instructions"
    },
    "input_artifacts": [
      {"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
    ],
    "detected_at": "2025-12-13T10:00:00Z"
  }
}
```

**Provenance requirements:**

1. All edges must include analyzer provenance
2. Detector/rule provenance required for non-runtime edges
3. Input artifact digests enable reproducibility
4. Detection timestamp uses UTC ISO-8601
### EG6: API/CLI Parity

**API endpoints:**

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |

**CLI commands:**

```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...

# Get edge details
stella edge show --id edge:sha256:...

# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke

# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```

**Output parity:**

- API and CLI return identical JSON structure
- CLI supports `--json` for machine-readable output
- Both support filtering by reason, confidence, from/to
### EG7: Deterministic Fixtures

**Fixture location:**

```
tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json

datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt
```

**Fixture requirements:**

1. Each reason code has at least one fixture
2. Fixtures include expected `edge_id` hash
3. Golden outputs frozen after review
4. CI verifies hash stability
### EG8: Propagation into Explanation Graphs/VEX
|
||||
|
||||
**Explanation graph inclusion:**
|
||||
|
||||
```json
|
||||
{
|
||||
"explanation": {
|
||||
"path": [
|
||||
{
|
||||
"node": "sym:java:main...",
|
||||
"outgoing_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"to": "sym:java:handler...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.98
|
||||
}
|
||||
},
|
||||
{
|
||||
"node": "sym:java:handler...",
|
||||
"outgoing_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"to": "sym:java:log4j...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95
|
||||
}
|
||||
}
|
||||
],
|
||||
"aggregate_path_confidence": 0.93
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**VEX evidence format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"stellaops:reachability": {
|
||||
"path_edges": [
|
||||
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
|
||||
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
|
||||
],
|
||||
"weakest_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"aggregate_confidence": 0.93
|
||||
}
|
||||
}
|
||||
```
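
The sample values above are consistent with an aggregate computed as the product of per-edge confidences (0.98 x 0.95, rounded to two decimals). The schema does not state the formula, so the sketch below treats the product rule and the rounding as assumptions. The function name `summarize_path` and the `aaa`/`bbb` edge IDs are illustrative placeholders, not part of the contract.

```python
from math import prod


def summarize_path(path_edges):
    """Build a VEX-style evidence summary for one call path.

    Assumption: aggregate confidence is the product of edge confidences,
    rounded to two decimals; the schema does not specify the formula.
    """
    if not path_edges:
        raise ValueError("path must contain at least one edge")
    # The weakest edge is the one with the lowest confidence.
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    aggregate = round(prod(edge["confidence"] for edge in path_edges), 2)
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": aggregate,
    }


# Placeholder edge IDs; real IDs are edge:sha256:<hex> digests.
edges = [
    {"edge_id": "edge:sha256:aaa", "reason": "bytecode-invoke", "confidence": 0.98},
    {"edge_id": "edge:sha256:bbb", "reason": "bytecode-invoke", "confidence": 0.95},
]
summary = summarize_path(edges)
```

A product penalizes long paths, which may or may not match the scoring service; a minimum-based aggregate would be an equally plausible reading of other data.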

### EG9: Localization Guidance

**Localizable elements:**

| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |

**Message catalog structure:**

```json
{
  "locale": "en-US",
  "messages": {
    "edge.reason.bytecode-invoke": "Bytecode method call",
    "edge.reason.plt-stub": "PLT/GOT library call",
    "edge.confidence.high": "High confidence ({0:P0})",
    "edge.evidence.location": "Detected at offset {offset} in {file}"
  }
}
```

**Supported locales:**

- `en-US` (default)
- Additional locales via contribution
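
A minimal lookup sketch for the catalog above, assuming that a missing locale or key falls back to `en-US` and then to the raw key. That fallback order is an assumption, as are the names `CATALOGS` and `localize`. The `edge.confidence.high` entry is left out because its `{0:P0}` placeholder is a .NET format specifier that Python's `str.format` does not accept.

```python
DEFAULT_LOCALE = "en-US"

# Hypothetical in-memory catalog store keyed by locale.
CATALOGS = {
    "en-US": {
        "edge.reason.bytecode-invoke": "Bytecode method call",
        "edge.reason.plt-stub": "PLT/GOT library call",
        "edge.evidence.location": "Detected at offset {offset} in {file}",
    },
}


def localize(key, locale=DEFAULT_LOCALE, **params):
    """Resolve a message key, trying the requested locale, then en-US.

    Returns the raw key when no catalog contains it, so callers always
    get a printable string.
    """
    for candidate in (locale, DEFAULT_LOCALE):
        template = CATALOGS.get(candidate, {}).get(key)
        if template is not None:
            return template.format(**params)
    return key


plt_label = localize("edge.reason.plt-stub", locale="de-DE")
location = localize("edge.evidence.location", offset="0x401000", file="libssl.so")
```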

### EG10: Backfill Plan

**Backfill strategy:**

1. **Phase 1:** Add reason codes to new edges (no backfill needed)
2. **Phase 2:** Run detector upgrade on graphs without reason codes
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata

**Migration script:**

```bash
stella edge backfill --graph blake3:... --dry-run

# Output:
Graph: blake3:a1b2c3d4...
Edges without reason: 1234
Edges to update: 1234

Dry run - no changes made.

# Execute:
stella edge backfill --graph blake3:... --execute
```

**Backfill metadata:**

```json
{
  "backfill": {
    "status": "complete",
    "original_analyzer_version": "1.0.0",
    "backfill_analyzer_version": "1.2.0",
    "backfilled_at": "2025-12-13T10:00:00Z",
    "edges_updated": 1234
  }
}
```

---

## 3. Related Documentation

- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Explainability Schema](./explainability-schema.md) - Explanation format
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE

---

_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._
454
docs/reachability/explainability-schema.md
Normal file
@@ -0,0 +1,454 @@
# Explainability Schema

_Last updated: 2025-12-13. Owner: Policy Guild + Docs Guild._

This document defines the explainability schema addressing gaps EX1-EX10 from the November 2025 product findings. It specifies the canonical format for vulnerability verdict explanations, DSSE signing policy, CAS storage rules, and export/replay formats.

---

## 1. Overview

Explainability provides auditable, machine-readable rationale for every vulnerability verdict. Each explanation includes:

- **Decision chain:** Ordered list of rules/policies that contributed to the verdict
- **Evidence links:** References to graphs, runtime facts, VEX statements, and SBOM components
- **Confidence scores:** Per-rule and aggregate confidence values
- **Redaction metadata:** PII handling and data classification

---

## 2. Gap Resolutions

### EX1: Schema/Canonicalization + Hashes

**Explanation schema:**

```json
{
  "schema": "stellaops.explanation@v1",
  "explanation_id": "explain:sha256:{hex}",
  "finding_id": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
  "verdict": {
    "status": "affected",
    "severity": {"normalized": "Critical", "score": 10.0},
    "confidence": 0.92
  },
  "decision_chain": [
    {
      "rule_id": "rule:reachability_gate",
      "rule_version": "1.0.0",
      "inputs": {
        "reachability.state": "CR",
        "reachability.confidence": 0.92
      },
      "output": {"allowed": true, "contribution": 0.4},
      "evidence_refs": ["cas://reachability/graphs/blake3:..."]
    },
    {
      "rule_id": "rule:severity_baseline",
      "rule_version": "1.0.0",
      "inputs": {
        "cvss_base": 10.0,
        "epss_percentile": 0.95
      },
      "output": {"severity": "Critical", "contribution": 0.6},
      "evidence_refs": ["cas://advisories/CVE-2021-44228.json"]
    }
  ],
  "aggregate_confidence": 0.88,
  "created_at": "2025-12-13T10:00:00Z",
  "policy_version": "sha256:...",
  "graph_revision_id": "rev:blake3:..."
}
```

**Canonicalization rules:**

1. JSON keys sorted alphabetically at all levels
2. Entries in the `decision_chain` array ordered by rule execution sequence
3. `evidence_refs` arrays sorted alphabetically
4. No insignificant whitespace; UTF-8 encoding
5. Hash computed over canonical JSON: `sha256(canonical_json)`
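
The rules above can be sketched as follows. This is an illustration under stated assumptions, not the Policy Engine's implementation: it assumes `explanation_id` itself is excluded from the hashed content, that "alphabetically" means code-point order, and it does not apply any float or timestamp normalization. The function names are hypothetical.

```python
import hashlib
import json


def canonicalize(explanation):
    """Serialize an explanation dict to canonical JSON bytes."""
    doc = json.loads(json.dumps(explanation))  # cheap deep copy of JSON data
    # Assumption: the ID is derived from the content, so it is not hashed.
    doc.pop("explanation_id", None)
    for step in doc.get("decision_chain", []):
        if "evidence_refs" in step:
            # Rule 3: evidence refs are sorted; chain order (rule 2) is kept.
            step["evidence_refs"] = sorted(step["evidence_refs"])
    # Rules 1 and 4: sorted keys, no insignificant whitespace, UTF-8.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def explanation_id(explanation):
    """Rule 5: sha256 over the canonical JSON bytes."""
    return "explain:sha256:" + hashlib.sha256(canonicalize(explanation)).hexdigest()


doc_a = {
    "schema": "stellaops.explanation@v1",
    "decision_chain": [{"rule_id": "rule:a", "evidence_refs": ["cas://b", "cas://a"]}],
}
doc_b = {
    "decision_chain": [{"evidence_refs": ["cas://a", "cas://b"], "rule_id": "rule:a"}],
    "schema": "stellaops.explanation@v1",
}
same_id = explanation_id(doc_a) == explanation_id(doc_b)
```

Key order and `evidence_refs` order do not affect the ID, while reordering `decision_chain` entries does, because execution order is part of the content.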

### EX2: DSSE Predicate/Signing Policy

**DSSE predicate type:**

```
stella.ops/explanation@v1
```

**Signing policy:**

| Element | Required | Signer |
|---------|----------|--------|
| Explanation body | Yes | Policy Engine key |
| Graph DSSE reference | Yes (if reachability cited) | Scanner key |
| VEX DSSE reference | Yes (if VEX cited) | Policy Engine key |

**DSSE envelope structure:**

```json
{
  "payloadType": "application/vnd.stellaops.explanation+json",
  "payload": "<base64(canonical_explanation_json)>",
  "signatures": [
    {
      "keyid": "policy-engine-signing-2025",
      "sig": "base64:..."
    }
  ]
}
```

**Signing requirements:**

- All explanations must be signed before CAS storage
- Signing key must be registered in Authority key store
- Key rotation triggers re-signing of active explanations (configurable)
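
In DSSE, the signature covers the pre-authentication encoding (PAE) of the payload type and payload, not the raw payload. The sketch below shows PAE and envelope assembly. `demo_sign` is an HMAC stand-in used only so the example runs with the standard library; it is not a real DSSE signer, and the actual Policy Engine key type and signing call are not specified here. `build_envelope` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac

PAYLOAD_TYPE = "application/vnd.stellaops.explanation+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: 'DSSEv1 LEN(type) type LEN(body) body'."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def demo_sign(message: bytes) -> bytes:
    # Placeholder only: a real deployment signs with a key registered
    # in the Authority key store, not with a shared HMAC secret.
    return hmac.new(b"demo-key-not-for-production", message, hashlib.sha256).digest()


def build_envelope(canonical_json: bytes, keyid: str, sign=demo_sign) -> dict:
    """Wrap canonical explanation JSON in a DSSE-shaped envelope."""
    signature = sign(pae(PAYLOAD_TYPE, canonical_json))
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(canonical_json).decode("ascii"),
        "signatures": [
            {"keyid": keyid, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }


envelope = build_envelope(b'{"schema":"stellaops.explanation@v1"}', "policy-engine-signing-2025")
```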

### EX3: CAS Storage Rules for Evidence

**Storage layout:**

```
cas://explanations/
  {sha256}/                      # Explanation body
  {sha256}.dsse                  # DSSE envelope
  by-finding/{finding_id}/       # Index by finding
  by-policy/{policy_digest}/     # Index by policy version
  by-graph/{graph_revision_id}/  # Index by graph revision
```

**Storage rules:**

1. Explanations are immutable after signing
2. New verdicts create new explanation documents (no updates)
3. Previous explanations are retained per retention policy
4. Cross-references validated at write time (graphs, VEX must exist)

**Deduplication:**

- Identical canonical JSON produces identical hash
- CAS returns existing reference if content matches
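
The deduplication behaviour can be illustrated with an in-memory stand-in. `InMemoryCas` is hypothetical and only models the "same bytes, same reference" rule; the real CAS backend, its indexes, and its write-time validation are outside this sketch.

```python
import hashlib


class InMemoryCas:
    """Toy content-addressed store keyed by the SHA-256 of the stored bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, canonical_json: bytes):
        """Store bytes and return (uri, created).

        created is False when identical content was already present,
        in which case the existing reference is returned unchanged.
        """
        digest = hashlib.sha256(canonical_json).hexdigest()
        uri = f"cas://explanations/{digest}/"
        created = digest not in self._objects
        if created:
            # Objects are never overwritten, which keeps them immutable.
            self._objects[digest] = canonical_json
        return uri, created


cas = InMemoryCas()
first_uri, first_created = cas.put(b'{"a":1}')
second_uri, second_created = cas.put(b'{"a":1}')
```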

### EX4: Link to Decision/Policy and graph_revision_id

**Required links:**

```json
{
  "links": {
    "policy_version": "sha256:7e1d...",
    "policy_uri": "cas://policy/versions/sha256:7e1d...",
    "graph_revision_id": "rev:blake3:a1b2...",
    "graph_uri": "cas://reachability/revisions/blake3:a1b2...",
    "sbom_digest": "sha256:def4...",
    "sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
    "vex_digest": "sha256:e5f6...",
    "vex_uri": "cas://excititor/vex/openvex.json"
  }
}
```

**Validation:**

- All linked artifacts must exist at explanation creation time
- Links are verified during replay/audit
- Broken links cause replay verification failure

### EX5: Export/Replay Bundle Format

**Export bundle manifest:**

```json
{
  "schema": "stellaops.explanation.bundle@v1",
  "bundle_id": "bundle:explain:2025-12-13",
  "created_at": "2025-12-13T10:00:00Z",
  "explanations": [
    {
      "explanation_id": "explain:sha256:...",
      "finding_id": "...",
      "explanation_uri": "explanations/sha256:....json",
      "dsse_uri": "explanations/sha256:....dsse"
    }
  ],
  "dependencies": {
    "graphs": [
      {"revision_id": "rev:blake3:...", "uri": "graphs/blake3:....json"}
    ],
    "policies": [
      {"digest": "sha256:...", "uri": "policies/sha256:....json"}
    ],
    "vex_statements": [
      {"digest": "sha256:...", "uri": "vex/sha256:....json"}
    ]
  },
  "verification": {
    "bundle_hash": "sha256:...",
    "signature": "base64:...",
    "signed_by": "policy-engine-signing-2025"
  }
}
```

**Replay verification:**

```bash
stella explain verify --bundle ./explanation-bundle.tgz

# Output:
Bundle: bundle:explain:2025-12-13
Explanations: 42
Dependencies: 5 graphs, 2 policies, 12 VEX

Verifying explanations...
Canonical hashes: 42/42 MATCH
DSSE signatures: 42/42 VALID
Dependency links: 42/42 RESOLVED

Replay verification PASSED.
```

### EX6: PII/Redaction Rules

**Redaction categories:**

| Category | Redaction | Example |
|----------|-----------|---------|
| User identifiers | Hash | `user:alice` -> `user:sha256:a1b2...` |
| IP addresses | Mask | `192.168.1.100` -> `192.168.x.x` |
| File paths | Normalize | `/home/alice/code/...` -> `{HOME}/code/...` |
| Email addresses | Hash | `alice@example.com` -> `email:sha256:...` |
| API keys/tokens | Omit | `Authorization: Bearer xxx` -> `[REDACTED]` |
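
The redaction categories in the table above could be implemented along these lines. The helper names are hypothetical, and several details are assumptions: which part of the identifier is hashed, that hashes are unsalted, and that only IPv4 and `/home/<user>` paths are handled.

```python
import hashlib
import re


def _sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def redact_user(user_id: str) -> str:
    """'user:alice' -> 'user:sha256:<hex>' (assumption: hash the part after the prefix)."""
    prefix, _, name = user_id.partition(":")
    return f"{prefix}:sha256:{_sha256_hex(name)}"


def redact_email(email: str) -> str:
    """'alice@example.com' -> 'email:sha256:<hex>'."""
    return f"email:sha256:{_sha256_hex(email)}"


def mask_ip(ip: str) -> str:
    """Keep the first two IPv4 octets; anything else is fully redacted."""
    parts = ip.split(".")
    if len(parts) != 4:
        return "[REDACTED]"
    return ".".join(parts[:2] + ["x", "x"])


def normalize_path(path: str) -> str:
    """Replace a leading /home/<user> with the {HOME} placeholder."""
    return re.sub(r"^/home/[^/]+", "{HOME}", path)


def redact_header(_value: str) -> str:
    """API keys and tokens are omitted entirely."""
    return "[REDACTED]"
```

Unsalted hashes of low-entropy values such as usernames can be reversed by dictionary lookup, so a keyed hash may be preferable if the redaction policy allows it.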

**Redaction metadata:**

```json
{
  "redaction": {
    "applied": true,
    "level": "standard",
    "fields_redacted": ["actor.email", "evidence.file_path"],
    "redaction_policy": "stellaops.redaction.standard@v1"
  }
}
```

**Export modes:**

- `--redacted` (default): Apply standard redaction
- `--full`: Include all data (requires `explain:export:full` scope)
- `--audit`: Include redaction audit trail

### EX7: Size Budgets

**Limits:**

| Element | Default Limit | Configurable |
|---------|---------------|--------------|
| Explanation body | 256 KB | Yes |
| Decision chain entries | 100 | Yes |
| Evidence refs per rule | 20 | Yes |
| Total evidence refs | 200 | Yes |
| Path entries | 50 | No |

**Truncation behavior:**

When limits are exceeded:

1. Log warning with truncation details
2. Add `truncation` metadata to explanation
3. Store full evidence in separate CAS object
4. Include `full_evidence_uri` reference

```json
{
  "truncation": {
    "applied": true,
    "elements_truncated": ["decision_chain", "evidence_refs"],
    "full_evidence_uri": "cas://explanations/full/sha256:..."
  }
}
```
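
A minimal sketch of the truncation steps, assuming only the two count-based limits (decision chain entries and evidence refs per rule) are enforced and that the caller has already stored the full evidence and obtained its URI. `apply_size_budget` is a hypothetical name; byte-size and total-ref limits are not modelled.

```python
import copy
import logging

MAX_CHAIN_ENTRIES = 100   # default limit from the table above
MAX_REFS_PER_RULE = 20    # default limit from the table above

log = logging.getLogger("explanation.budget")


def apply_size_budget(explanation, full_evidence_uri):
    """Return a copy of the explanation trimmed to the default limits."""
    doc = copy.deepcopy(explanation)
    truncated = []

    chain = doc.get("decision_chain", [])
    if len(chain) > MAX_CHAIN_ENTRIES:
        chain = chain[:MAX_CHAIN_ENTRIES]
        doc["decision_chain"] = chain
        truncated.append("decision_chain")

    refs_cut = False
    for step in chain:
        refs = step.get("evidence_refs", [])
        if len(refs) > MAX_REFS_PER_RULE:
            step["evidence_refs"] = refs[:MAX_REFS_PER_RULE]
            refs_cut = True
    if refs_cut:
        truncated.append("evidence_refs")

    if truncated:
        # Step 1: log; steps 2 and 4: attach metadata with the full-evidence URI.
        log.warning("explanation truncated: %s", ", ".join(truncated))
        doc["truncation"] = {
            "applied": True,
            "elements_truncated": truncated,
            "full_evidence_uri": full_evidence_uri,
        }
    return doc
```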

### EX8: Versioning

**Schema versioning:**

- Schema version in `schema` field: `stellaops.explanation@v1`
- Breaking changes increment major version
- Minor changes (additive fields) use v1.x
- Backward compatibility maintained for 2 major versions

**Migration support:**

```bash
stella explain migrate --from v1 --to v2 --input ./explanations/

# Output:
Migrating 1000 explanations from v1 to v2...
Migrated: 998
Skipped (already v2): 2

Migration complete.
```

**Version compatibility matrix:**

| API Version | Schema v1 | Schema v2 |
|-------------|-----------|-----------|
| 1.0.x | Full | N/A |
| 1.1.x | Full | Full |
| 2.0.x | Read-only | Full |

### EX9: Golden Fixtures/Tests

**Test fixture location:**

```
tests/Explanation/
  fixtures/
    simple-affected.json
    simple-not-affected.json
    with-reachability-evidence.json
    multi-rule-chain.json
    truncated-evidence.json
    redacted-pii.json
  golden/
    simple-affected.golden.json
    simple-affected.golden.dsse

datasets/explanations/
  schema/
    explanation.schema.json
  samples/
    log4j-affected/
      explanation.json
      expected-hash.txt
```

**Test categories:**

1. **Canonicalization tests:** Verify hash stability across JSON reordering
2. **DSSE signing tests:** Verify signature creation and verification
3. **Redaction tests:** Verify PII handling
4. **Truncation tests:** Verify size budget enforcement
5. **Replay tests:** Verify bundle export/import cycle
6. **Migration tests:** Verify version upgrade paths

**CI integration:**

```yaml
# .gitea/workflows/explanation-tests.yml
explanation-tests:
  runs-on: ubuntu-latest
  steps:
    - name: Run explanation tests
      run: dotnet test src/Policy/__Tests/StellaOps.Policy.Explanation.Tests
    - name: Verify golden fixtures
      run: scripts/verify-golden-fixtures.sh tests/Explanation/golden/
```

### EX10: Determinism Guarantees

**Determinism requirements:**

1. Same inputs produce identical `explanation_id` hash
2. Decision chain ordering is stable (execution order)
3. Evidence refs sorted alphabetically
4. Timestamps use UTC ISO-8601 with millisecond precision
5. Floating-point values rounded to 6 decimal places
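
Requirements 4 and 5 amount to normalizing values before canonicalization. The sketch below shows one way to do that. The helper names are hypothetical, and rendering the UTC offset as a trailing `Z` is an assumption based on the timestamps used elsewhere in this document.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(moment: datetime) -> str:
    """Render an aware datetime as UTC ISO-8601 with millisecond precision."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def round_floats(value, places=6):
    """Recursively round every float in a JSON-like structure."""
    if isinstance(value, float):
        return round(value, places)
    if isinstance(value, dict):
        return {key: round_floats(item, places) for key, item in value.items()}
    if isinstance(value, list):
        return [round_floats(item, places) for item in value]
    return value


stamp = format_timestamp(datetime(2025, 12, 13, 10, 0, 0, tzinfo=timezone.utc))
shifted = format_timestamp(
    datetime(2025, 12, 13, 12, 0, 0, 500000, tzinfo=timezone(timedelta(hours=2)))
)
rounded = round_floats({"confidence": 0.1234567891, "scores": [1, 0.5]})
```

Note that a wall-clock `created_at` differs between runs, so it would need to be either supplied as an input or excluded from the hashed content for requirement 1 to hold.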

**Verification:**

```bash
# Run twice with same inputs, verify identical hashes
stella explain generate --finding "..." --output a.json
stella explain generate --finding "..." --output b.json
diff a.json b.json  # Should be empty

# Or use built-in verify
stella explain verify-determinism --finding "..." --iterations 3
```

---

## 3. API Reference

### 3.1 Generate Explanation

```http
POST /api/policy/findings/{findingId}/explain
Authorization: Bearer <token>
Content-Type: application/json

{
  "mode": "full",
  "include_evidence": true,
  "redaction_level": "standard"
}
```

### 3.2 Get Explanation

```http
GET /api/explanations/{explanationId}
Authorization: Bearer <token>
Accept: application/json
```

### 3.3 Export Explanation Bundle

```http
POST /api/explanations/export
Authorization: Bearer <token>
Content-Type: application/json

{
  "finding_ids": ["...", "..."],
  "include_dependencies": true,
  "redaction_level": "standard"
}
```

### 3.4 Verify Explanation

```http
POST /api/explanations/{explanationId}/verify
Authorization: Bearer <token>
```

---

## 4. CLI Reference

```bash
# Generate explanation for a finding
stella explain generate --finding "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228"

# Export explanation bundle
stella explain export --findings ./finding-ids.txt --output ./bundle.tgz

# Verify explanation
stella explain verify --explanation ./explanation.json --dsse ./explanation.dsse

# Verify bundle
stella explain verify --bundle ./bundle.tgz

# Check determinism
stella explain verify-determinism --finding "..." --iterations 5
```

---

## 5. Related Documentation

- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Graph Revision Schema](./graph-revision-schema.md) - Graph versioning
- [Policy API](../api/policy.md) - Policy Engine REST API
- [DSSE Predicates](../modules/attestor/architecture.md) - Signing specifications

---

_Last updated: 2025-12-13. See Sprint 0401 EXPLAIN-GAPS-401-064 for change history._
@@ -1,175 +1,535 @@
|
||||
# Function-Level Evidence Readiness (Nov 2025 Advisory)
|
||||
# Function-Level Evidence Guide
|
||||
|
||||
_Last updated: 2025-11-12. Owner: Business Analysis Guild._
|
||||
_Last updated: 2025-12-13. Owner: Docs Guild._
|
||||
|
||||
This memo captures the outstanding work required to make Stella Ops scanners emit stable, function-level evidence that matches the November 2025 advisory. It does **not** implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.
|
||||
This guide documents the cross-module function-level evidence chain that enables provable reachability claims. It covers the schema, identifiers, API usage, CLI commands, and integration patterns for Scanner, Signals, Policy, and Replay.
|
||||
|
||||
---
|
||||
|
||||
## 1. Goal & Scope
|
||||
## 1. Overview
|
||||
|
||||
**Goal.** Anchor every vulnerability finding to an immutable `{artifact_digest, code_id}` tuple plus optional symbol hints so replayers can prove reachability against stripped binaries.
|
||||
StellaOps implements a **function-level evidence chain** that anchors every vulnerability finding to immutable identifiers (`code_id`, `symbol_id`, `graph_hash`) enabling:
|
||||
|
||||
**Scope.** Scanner analyzers, runtime ingestion, Signals scoring, Replay manifests, Policy/VEX emission, CLI/UI explainers, and documentation/runbooks needed to operationalise the advisory.
|
||||
- **Provable reachability:** Deterministic call-path evidence from entry points to vulnerable functions.
|
||||
- **Stripped binary support:** `code_id` + `code_block_hash` provides identity when symbols are absent.
|
||||
- **Evidence replay:** Sealed artifacts with DSSE attestation allow offline verification.
|
||||
- **Cross-module linking:** Scanner -> Signals -> Policy -> VEX -> UI/CLI evidence chain.
|
||||
|
||||
Out of scope: implementing disassemblers or symbol servers; those will be handled inside the module-specific backlog tasks referenced below.
|
||||
### 1.1 Core Identifiers
|
||||
|
||||
| Identifier | Format | Purpose | Example |
|------------|--------|---------|---------|
| `symbol_id` | `sym:{lang}:{base64url}` | Canonical function identity | `sym:java:R3JlZXRpbmc...` |
| `code_id` | `code:{lang}:{base64url}` | Identity for name-less code blocks | `code:binary:YWJjZGVm...` |
| `graph_hash` | `blake3:{hex}` | Content-addressable graph identity | `blake3:a1b2c3d4e5f6...` |
| `symbol_digest` | `sha256:{hex}` | Hash of symbol_id for edge linking | `sha256:e5f6a7b8c9d0...` |
| `build_id` | `gnu-build-id:{hex}` | ELF/PE debug identifier | `gnu-build-id:5f0c7c3c...` |
|
||||
|
||||
### 1.2 Evidence Chain Flow
|
||||
|
||||
```
Scanner -> richgraph-v1 -> Signals -> Scoring -> Policy -> VEX -> UI/CLI
   |            |             |          |         |        |       |
   |            |             |          |         |        |       +-- stella graph explain
   |            |             |          |         |        +-- OpenVEX with call-path proofs
   |            |             |          |         +-- Policy gates + reachability.state
   |            |             |          +-- Lattice state + confidence + riskScore
   |            |             +-- Runtime facts + static paths
   |            +-- BLAKE3 graph_hash + DSSE attestation
   +-- code_id, symbol_id, build_id per node
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Advisory Requirements vs. System Gaps
|
||||
## 2. Schema Reference
|
||||
|
||||
| Requirement | Current gap | Task references | Notes |
|-------------|-------------|-----------------|-------|
| Immutable code identity (`code_id` = `{format, build_id, start, length}` + optional `code_block_hash`) | Callgraph nodes are opaque strings with no address metadata. | Sprint 401 `GRAPH-CAS-401-001`, `GAP-SCAN-001`, `GAP-SYM-007` | `code_id` should live alongside existing `SymbolID` helpers so analyzers can emit it without duplicating logic. |
| Symbol hints (demangled name, source, confidence) | No schema fields for symbol metadata; demangling is ad-hoc per analyzer. | `GAP-SYM-007` | Require deterministic casing + `symbol.source ∈ {DWARF,PDB,SYM,none}`. |
| Runtime facts mapped to code anchors | `/signals/runtime-facts` now accepts JSON and NDJSON (gzip) streams, stores symbol/code/process/container metadata. | Sprint 400 `ZASTAVA-REACH-201-001`, Sprint 401 `SIGNALS-RUNTIME-401-002`, `GAP-ZAS-002`, `GAP-SIG-003` | Provenance enrichment (process/socket/container) persisted; next step is exposing CAS URIs + context facts and emitting events for Policy/Replay. |
| Replay/DSSE coverage | Replay manifests don’t enforce hash/CAS registration for graphs/traces. | Sprint 400 `REPLAY-REACH-201-005`, Sprint 401 `REPLAY-401-004`, `GAP-REP-004` | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
| Policy/VEX/UI explainability | Policy uses coarse `reachability:*` tags; UI/CLI cannot show call paths or evidence hashes. | Sprint 401 `POLICY-VEX-401-006`, `UI-CLI-401-007`, `GAP-POL-005`, `GAP-VEX-006`, `EXPERIENCE-GAP-401-012` | Evidence blocks must cite `code_id`, graph hash, runtime CAS URI, analyzer version. |
| Operator documentation & samples | No guide shows how to replay `{build_id,start,len}` across CLI/API. | Sprint 401 `QA-DOCS-401-008`, `GAP-DOC-008` | Produce samples under `samples/reachability/**` plus CLI walkthroughs. |
| Build-id propagation | Build-id not consistently captured or threaded into `SymbolID`/`code_id`; SBOM/runtime joins are brittle. | Sprint 401 `SCANNER-BUILDID-401-035` | Capture `.note.gnu.build-id`, include in code identity, expose in SBOM exports and runtime events. |
| Load-time constructors as roots | Graph roots omit `.preinit_array`/`.init_array`/`_init`, missing load-time edges. | Sprint 401 `SCANNER-INITROOT-401-036` | Add synthetic roots with `phase=load`; include `DT_NEEDED` deps’ constructors. |
| PURL-resolved edges | Call edges do not carry `purl` or `symbol_digest`, slowing SBOM joins. | Sprint 401 `GRAPH-PURL-401-034` | Annotate edges per `docs/reachability/purl-resolved-edges.md`; keep deterministic graph hash. |
| Unknowns handling | Unresolved symbols/edges disappear silently. | Sprint 0400 `SIGNALS-UNKNOWN-201-008` | Emit Unknowns records (see `docs/signals/unknowns-registry.md`) and feed `unknowns_pressure` into scoring. |
| Patch-oracle QA | No guard-rail tests proving binary analyzers see real patch deltas. | Sprint 401 `QA-PORACLE-401-037` | Add paired vuln/fixed fixtures and expectations; wire to CI using `docs/reachability/patch-oracles.md`. |
|
||||
### 2.1 SymbolID Construction
|
||||
|
||||
---
|
||||
Per-language canonical tuple format (NUL-separated, then SHA-256 -> base64url):
|
||||
|
||||
## 3. Workstreams & Expectations
|
||||
| Language | Tuple Components | Example |
|----------|------------------|---------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` | `com.example\0Foo\0bar\0(Ljava/lang/String;)V` |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` | `MyApp\0Controllers\0UserController\0GetById(int)` |
| Go | `{module}\0{package}\0{receiver}\0{func}` | `github.com/user/repo\0handler\0*Server\0Handle` |
| Node | `{pkg_or_path}\0{export_path}\0{kind}` | `lodash\0get\0function` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` | `sha256:abc...\0.text\00x401000\0ssl3_read\0global\0` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` | `requests\0api\0get` |
| Ruby | `{gem_or_path}\0{module}\0{method}` | `rails\0ActionController::Base\0render` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` | `symfony/http-kernel\0Kernel\0handle` |
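
The construction described for `symbol_id` (NUL-separated tuple, SHA-256, base64url) can be sketched as below, using the Java row of the table as input. Stripping the base64 `=` padding and encoding components as UTF-8 are assumptions, and the function names are illustrative rather than the actual `SymbolID` helper API.

```python
import base64
import hashlib


def symbol_id(lang: str, *components: str) -> str:
    """Build sym:{lang}:{base64url(sha256(NUL-joined tuple))}."""
    tuple_bytes = "\0".join(components).encode("utf-8")
    digest = hashlib.sha256(tuple_bytes).digest()
    # Assumption: padding is stripped so the ID stays URL- and path-safe.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sym:{lang}:{encoded}"


def symbol_digest(sym_id: str) -> str:
    """Hash of the symbol_id string, used for edge linking."""
    return "sha256:" + hashlib.sha256(sym_id.encode("utf-8")).hexdigest()


java_id = symbol_id("java", "com.example", "Foo", "bar", "(Ljava/lang/String;)V")
java_digest = symbol_digest(java_id)
```

Because the separator is NUL, a component containing a dot or slash cannot collide with a different split of the same characters, which a plain concatenation would allow.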
|
||||
|
||||
### 3.1 Scanner Symbolization (GAP-SCAN-001 / GAP-SYM-007)
|
||||
### 2.2 CodeID Construction
|
||||
|
||||
* Define `SymbolID` helpers that glue together `{artifact_digest, file}` plus optional `section`, `addr`, `length`, and `code_block_hash`.
|
||||
* Update analyzer contracts so every analyzer returns both `symbol_id` and `code_id`, with demangled names stored under the new `symbol` block.
|
||||
* Persist the data into `richgraph-v1` payloads and attach CAS URIs via `StellaOps.Scanner.Reachability`.
|
||||
* Deliver fixtures in `tests/reachability/StellaOps.ScannerSignals.IntegrationTests` that prove determinism (same hash when analyzer flags reorder).
|
||||
* **Helper status (2025-12-02):** `SymbolId.ForBinaryAddressed` + `CodeId.ForBinarySegment` now encode `{file_hash, section, addr, name, linkage, length, code_block_hash}` with normalized hex addresses. Analyzers should start emitting these tuples instead of ad-hoc hashes.
|
||||
* **Binary lifter (2025-12-03):** `BinaryReachabilityLifter` emits richgraph nodes for ELF/PE/Mach-O using file SHA-256 + section/address tuples, attaches `code_id` anchors, and turns imports/load commands into `import` edges.
|
||||
* **Schema wiring (2025-12-12):** `reachability-union` + `richgraph-v1` serializers now emit `symbol {mangled,demangled,source,confidence}` and optional `code_block_hash` for stripped blocks; confidence is clamped to `[0,1]` and `source` normalized to uppercase (`DWARF|PDB|SYM|NONE`).
|
||||
For stripped binaries or name-less code blocks:
|
||||
|
||||
### 3.2 Runtime + Signals (GAP-ZAS-002 / GAP-SIG-003)
|
||||
```
|
||||
code:{lang}:{base64url_sha256(format + file_hash + addr + length + section + code_block_hash)}
|
||||
```
|
||||
|
||||
* Extend Zastava Observer NDJSON schema to emit: `symbol_id`, `code_id`, `hit_count`, `observed_at`, `loader_base`, `process.buildId`.
|
||||
* Implement `/signals/runtime-facts` ingestion (gzip + NDJSON) with CAS-backed storage under `cas://reachability/runtime/{sha256}`.
|
||||
* Update `ReachabilityScoringService` to emit lattice states and to include runtime evidence references plus CAS URIs in `ReachabilityFactDocument.Metadata`.
|
||||
Example for stripped ELF:
|
||||
```
|
||||
code:binary:YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo
|
||||
```
|
||||
|
||||
### 3.3 Replay & Evidence (GAP-REP-004)
|
||||
### 2.3 Graph Node Schema
|
||||
|
||||
* Enforce CAS registration + BLAKE3 hashing before manifest writes (graphs and traces).
|
||||
* Teach `ReachabilityReplayWriter` to require analyzer name/version, graph kind, `code_id` coverage summary.
|
||||
* Update `docs/replay/DETERMINISTIC_REPLAY.md` once schema v2 is finalized.
|
||||
|
||||
### 3.4 Policy, VEX, CLI/UI (GAP-POL-005 / GAP-VEX-006)
|
||||
|
||||
* Policy Engine: ingest new reachability facts, expose `reachability.state`, `max_path_conf`, and `evidence.graph_hash` via SPL + API.
|
||||
* CLI/UI: add `stella graph explain` and explain drawer showing call path (`SymbolID` list), code anchors, runtime hits, DSSE references.
|
||||
* Notify templates: include short evidence summary (first hop + truncated `code_id`).
|
||||
|
||||
### 3.5 Documentation & Samples (GAP-DOC-008)
|
||||
|
||||
* Publish schema diffs in `docs/data/evidence-schema.md` (new file) covering SBOM evidence nodes, runtime NDJSON, and API responses.
|
||||
* Write CLI/API walkthroughs in `docs/09_API_CLI_REFERENCE.md` and `docs/api/policy.md` showing how to request reachability evidence and verify DSSE chains.
|
||||
* Produce OpenVEX + replay samples under `samples/reachability/` showing `facts.type = "stella.reachability"` with `graph_hash` and `code_id` arrays.
|
||||
|
||||
### 3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)
|
||||
|
||||
* Stand up `Scanner.Symbols.Native` + `Scanner.CallGraph.Native` libraries that:
|
||||
* parse ELF (DWARF + `.symtab`/`.dynsym`), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
|
||||
* emit deterministic `FuncNode` + `CallEdge` records with demangled names, language hints, and `{confidence,evidence}` arrays; and
|
||||
* attach analyzer + toolchain identifiers consumed by `richgraph-v1`.
|
||||
* Introduce `Reachability.Store` collections in Mongo:
|
||||
* `func_nodes` – keyed by `func:<format>:<sha256>:<va>` with `{binDigest,name,addr,size,lang,confidence,sym}`.
|
||||
* `call_edges` – `{from,to,kind,confidence,evidence[]}` linking internal/external nodes.
|
||||
* `cve_func_hits` – `{cve,purl,func_id,match_kind,confidence,source}` for advisory alignment.
|
||||
* Build indexes (`binDigest+name`, `from→to`, `cve+func_id`) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.
|
||||
|
||||
---
|
||||
|
||||
## 4. Schema & API Touchpoints
|
||||
|
||||
Authoritative field list lives in `docs/reachability/evidence-schema.md`; use it for DTOs and CAS writers.
|
||||
|
||||
The next implementation pass must cover the following documents/files (create them if missing):
|
||||
|
||||
1. `docs/data/evidence-schema.md` – authoritative schema for `{code_id, symbol, tool}` blocks.
|
||||
2. `docs/runbooks/reachability-runtime.md` – operator steps for staging runtime ingestion bundles, retention, and troubleshooting.
|
||||
3. `docs/runbooks/replay_ops.md` – add section detailing replay verification using the new graph/runtime CAS entries.
|
||||
|
||||
API contracts to amend:
|
||||
|
||||
- `POST /signals/callgraphs` response includes `graphHash` (sha256) for the normalized callgraph; richgraph-v1 uses BLAKE3 for graph CAS hashes.
|
||||
- `POST /signals/runtime-facts` request body schema (NDJSON) with `symbol_id`, `code_id`, `hit_count`, `loader_base`.
|
||||
- `GET /policy/findings` payload must surface `reachability.evidence[]` objects.
|
||||
|
||||
### 4.1 Signals runtime ingestion snapshot (Nov 2025)
|
||||
|
||||
- `/signals/runtime-facts` (JSON) and `/signals/runtime-facts/ndjson` (streaming, optional gzip) accept the following event fields:
|
||||
- `symbolId` (required), `codeId`, `loaderBase`, `hitCount`, `processId`, `processName`, `socketAddress`, `containerId`, `evidenceUri`, `metadata`.
|
||||
- Subject context (`scanId` / `imageDigest` / `component` / `version`) plus `callgraphId` is supplied either in the JSON body or as query params for the NDJSON endpoint.
|
||||
- Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
|
||||
- Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.
|
||||
|
||||
### 4.2 Reachability store layout (SIG-STORE-401-016)
|
||||
|
||||
All producers **must** persist native function evidence using the shared collections below (names are advisory; exact names live in Mongo options):
|
||||
Each node in a richgraph-v1 document includes:
|
||||
|
||||
```json
|
||||
// func_nodes
|
||||
{
|
||||
"_id": "func:ELF:sha256:4012a0",
|
||||
"binDigest": "sha256:deadbeef...",
|
||||
"name": "ssl3_read_bytes",
|
||||
"addr": "0x4012a0",
|
||||
"size": 312,
|
||||
"lang": "c",
|
||||
"confidence": 0.92,
|
||||
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
|
||||
"sym": "present"
|
||||
}
|
||||
|
||||
// call_edges
|
||||
{
|
||||
"from": "func:ELF:sha256:4012a0",
|
||||
"to": "func:ELF:sha256:40f0ff",
|
||||
"kind": "static",
|
||||
"confidence": 0.88,
|
||||
"evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
|
||||
}
|
||||
|
||||
// cve_func_hits
|
||||
{
|
||||
"cve": "CVE-2023-XXXX",
|
||||
"purl": "pkg:generic/openssl@1.1.1u",
|
||||
"func_id": "func:ELF:sha256:4012a0",
|
||||
"match": "name+version",
|
||||
"confidence": 0.77,
|
||||
"source": "concelier:openssl-advisory"
|
||||
"id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
|
||||
"symbol_id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
|
||||
"code_id": "code:java:...",
|
||||
"lang": "java",
|
||||
"kind": "method",
|
||||
"display": "com.example.GreetingService.greet(String)",
|
||||
"purl": "pkg:maven/com.example/greeting-service@1.0.0",
|
||||
"build_id": "gnu-build-id:5f0c7c3c...",
|
||||
"symbol_digest": "sha256:e5f6a7b8...",
|
||||
"code_block_hash": "sha256:deadbeef...",
|
||||
"symbol": {
|
||||
"mangled": null,
|
||||
"demangled": "com.example.GreetingService.greet(String)",
|
||||
"source": "DWARF",
|
||||
"confidence": 0.98
|
||||
},
|
||||
"evidence": ["import", "bytecode"],
|
||||
"attributes": {}
|
||||
}
|
||||
```
|
||||
|
||||
Writers **must**:
|
||||
### 2.4 Graph Edge Schema
|
||||
|
||||
1. Upsert `func_nodes` before emitting edges/hits to ensure `_id` lookups remain stable.
|
||||
2. Serialize evidence arrays in deterministic order (`reloc`, `bb-target`, `import`, …) and normalise hex casing.
|
||||
3. Attach analyzer fingerprints (`scanner.native@sha256:...`) so Replay/Policy can enforce provenance.
|
||||
Edges carry callee `purl` and `symbol_digest` for SBOM correlation:
|
||||
|
||||
```json
|
||||
{
|
||||
"from": "sym:java:caller...",
|
||||
"to": "sym:java:callee...",
|
||||
"kind": "call",
|
||||
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
|
||||
"symbol_digest": "sha256:f1e2d3c4...",
|
||||
"confidence": 0.92,
|
||||
"evidence": ["bytecode", "import"],
|
||||
"candidates": []
|
||||
}
|
||||
```
|
||||
|
||||
### 2.5 Evidence Block Schema
|
||||
|
||||
Evidence blocks in Policy/VEX responses cite all relevant identifiers:
|
||||
|
||||
```json
|
||||
{
|
||||
"evidence": {
|
||||
"graph_hash": "blake3:a1b2c3d4e5f6...",
|
||||
"graph_cas_uri": "cas://reachability/graphs/a1b2c3d4e5f6...",
|
||||
"dsse_uri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
|
||||
"path": [
|
||||
{"symbol_id": "sym:java:...", "display": "main()"},
|
||||
{"symbol_id": "sym:java:...", "display": "processRequest()"},
|
||||
{"symbol_id": "sym:java:...", "display": "log4j.error()"}
|
||||
],
|
||||
"path_length": 3,
|
||||
"confidence": 0.85,
|
||||
"runtime_hits": ["probe:jfr:1234"],
|
||||
"analyzer": {
|
||||
"name": "scanner.java",
|
||||
"version": "1.2.0",
|
||||
"toolchain_digest": "sha256:..."
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Test & Fixture Expectations
|
||||
## 3. API Usage
|
||||
|
||||
- **Reachbench fixtures**: update golden cases with `code_id` + `symbol` metadata. Ensure both reachable/unreachable variants still pass once graphs contain the richer IDs.
|
||||
- **Signals unit tests**: add deterministic tests for lattice scoring + runtime evidence linking (`tests/reachability/StellaOps.Signals.Reachability.Tests`).
|
||||
- **Replay tests**: extend `tests/reachability/StellaOps.Replay.Core.Tests` to assert manifest v2 serialization and hash enforcement.
|
||||
### 3.1 Signals Callgraph Ingestion
|
||||
|
||||
All fixtures must remain deterministic: sort nodes/edges, normalise casing, and freeze timestamps in test data.
|
||||
Submit a callgraph and receive a deterministic `graph_hash`:
|
||||
|
||||
```http
|
||||
POST /signals/callgraphs
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"schema": "richgraph-v1",
|
||||
"analyzer": {"name": "scanner.java", "version": "1.2.0"},
|
||||
"nodes": [...],
|
||||
"edges": [...],
|
||||
"roots": [...]
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"graphHash": "blake3:a1b2c3d4e5f6...",
|
||||
"casUri": "cas://reachability/graphs/a1b2c3d4e5f6...",
|
||||
"dsseUri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
|
||||
"nodeCount": 1247,
|
||||
"edgeCount": 3891
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Signals Runtime Facts
|
||||
|
||||
Submit runtime observations with `code_id` anchors:
|
||||
|
||||
```http
|
||||
POST /signals/runtime-facts/ndjson?scanId=scan-123&imageDigest=sha256:abc123
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/x-ndjson
|
||||
Content-Encoding: gzip
|
||||
|
||||
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":47,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:00Z"}
|
||||
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":12,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:01Z"}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"accepted": 128,
|
||||
"duplicates": 2,
|
||||
"evidenceUri": "cas://reachability/runtime/sha256:xyz789..."
|
||||
}
|
||||
```
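The NDJSON request body shown above can be assembled with a few lines of Python. This is a minimal sketch, not the Signals client: the helper name `encode_runtime_facts` and the placeholder symbol/code IDs are assumptions for illustration, and only the `symbolId` requirement is taken from the field list in this document.

```python
import gzip
import json


def encode_runtime_facts(events):
    """Serialize events as gzip-compressed NDJSON (one JSON object per line)."""
    lines = []
    for event in events:
        # symbolId is the only field the document marks as required.
        if not event.get("symbolId"):
            raise ValueError("symbolId is required on every runtime fact")
        lines.append(json.dumps(event, separators=(",", ":"), sort_keys=True))
    payload = "".join(line + "\n" for line in lines)
    return gzip.compress(payload.encode("utf-8"))


# Placeholder IDs; real values come from the analyzer output.
events = [
    {"symbolId": "sym:java:example-a", "codeId": "code:java:example-a", "hitCount": 47},
    {"symbolId": "sym:java:example-b", "codeId": "code:java:example-b", "hitCount": 12},
]
body = encode_runtime_facts(events)
```

The resulting bytes would be sent with `Content-Type: application/x-ndjson` and `Content-Encoding: gzip`, as in the request above.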
|
||||
|
||||
### 3.3 Fetch Reachability Facts
|
||||
|
||||
Query reachability state for a subject:
|
||||
|
||||
```http
|
||||
GET /signals/facts/{subjectKey}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"subjectKey": "scan:123:pkg:maven/log4j:2.14.1:CVE-2021-44228",
|
||||
"metadata": {
|
||||
"fact": {
|
||||
"digest": "sha256:abc123...",
|
||||
"version": 3
|
||||
}
|
||||
},
|
||||
"states": [
|
||||
{
|
||||
"symbol": "sym:java:...",
|
||||
"latticeState": "CR",
|
||||
"bucket": "runtime",
|
||||
"confidence": 0.92,
|
||||
"score": 0.78,
|
||||
"path": ["sym:java:main...", "sym:java:process...", "sym:java:log4j..."],
|
||||
"evidence": {
|
||||
"static": {"graphHash": "blake3:...", "pathLength": 3, "confidence": 0.85},
|
||||
"runtime": {"probeId": "probe:jfr:1234", "hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"}
|
||||
}
|
||||
}
|
||||
],
|
||||
"score": 0.78,
|
||||
"aggregateTier": "T2",
|
||||
"riskScore": 0.65
|
||||
}
|
||||
```
|
||||
|
||||
### 3.4 Policy Findings with Reachability Evidence
|
||||
|
||||
```http
|
||||
GET /api/policy/findings/{policyId}/{findingId}/explain?mode=verbose
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
**Response (excerpt):**
|
||||
|
||||
```json
|
||||
{
|
||||
"findingId": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
|
||||
"reachability": {
|
||||
"state": "CR",
|
||||
"confidence": 0.92,
|
||||
"evidence": {
|
||||
"graph_hash": "blake3:a1b2c3d4...",
|
||||
"path": [
|
||||
{"symbol_id": "sym:java:...", "display": "main()"},
|
||||
{"symbol_id": "sym:java:...", "display": "Logger.error()"}
|
||||
],
|
||||
"runtime_hits": 47,
|
||||
"fact_digest": "sha256:abc123..."
|
||||
}
|
||||
},
|
||||
"steps": [
|
||||
{"rule": "reachability_gate", "state": "CR", "allowed": true},
|
||||
{"rule": "severity_baseline", "severity": {"normalized": "Critical", "score": 10.0}}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Handoff Checklist for the Next Agent
|
||||
## 4. CLI Usage
|
||||
|
||||
1. Confirm sprint entries (`SPRINT_400` and `SPRINT_401`) remain in sync when moving `GAP-*` tasks to DOING/DONE.
|
||||
2. Start with `GAP-SYM-007` (schema/helper implementation) because downstream work depends on the new `code_id` payload shape.
|
||||
3. Once schema PR merges, coordinate with Signals + Policy guilds to align on CAS naming and DSSE predicates before wiring APIs.
|
||||
4. Update the docs listed in §4 as each component lands; keep this file current with statuses and links to PRs/ADRs.
|
||||
5. Before shipping, run the reachbench fixtures end-to-end and capture hashes for inclusion in replay docs.
|
||||
### 4.1 Graph Explain Command
|
||||
|
||||
Keep this document updated as tasks change state; it is the authoritative hand-off note for the advisory.
|
||||
View the call path and evidence for a finding:
|
||||
|
||||
```bash
|
||||
stella graph explain --finding "pkg:maven/log4j@2.14.1:CVE-2021-44228" --scan-id scan-123
|
||||
|
||||
# Output:
|
||||
Finding: CVE-2021-44228 in pkg:maven/log4j@2.14.1
|
||||
Reachability: CONFIRMED_REACHABLE (CR)
|
||||
Confidence: 0.92
|
||||
Graph Hash: blake3:a1b2c3d4e5f6...
|
||||
|
||||
Call Path (3 hops):
|
||||
1. main() [sym:java:R3JlZXRpbmcuLi4=]
|
||||
-> processRequest() [direct call]
|
||||
2. processRequest() [sym:java:cHJvY2Vzcy4uLg==]
|
||||
-> Logger.error() [virtual call]
|
||||
3. Logger.error() [sym:java:bG9nNGouLi4=]
|
||||
[VULNERABLE: CVE-2021-44228]
|
||||
|
||||
Runtime Evidence:
|
||||
- JFR probe hit: 47 times
|
||||
- Last observed: 2025-12-13T10:00:00Z
|
||||
|
||||
DSSE Attestation: cas://reachability/graphs/a1b2c3d4....dsse
|
||||
```
|
||||
|
||||
### 4.2 Graph Export Command
|
||||
|
||||
Export a reachability graph for offline analysis:
|
||||
|
||||
```bash
|
||||
stella graph export --scan-id scan-123 --output ./evidence-bundle/
|
||||
|
||||
# Creates:
|
||||
# ./evidence-bundle/richgraph-v1.json # Canonical graph
|
||||
# ./evidence-bundle/richgraph-v1.json.dsse # DSSE envelope
|
||||
# ./evidence-bundle/meta.json # Metadata
|
||||
# ./evidence-bundle/runtime-facts.ndjson # Runtime observations
|
||||
```
|
||||
|
||||
### 4.3 Graph Verify Command
|
||||
|
||||
Verify a graph's DSSE signature and Rekor inclusion:
|
||||
|
||||
```bash
|
||||
stella graph verify --graph ./evidence-bundle/richgraph-v1.json \
|
||||
--dsse ./evidence-bundle/richgraph-v1.json.dsse \
|
||||
--rekor-log
|
||||
|
||||
# Output:
|
||||
Graph Hash: blake3:a1b2c3d4e5f6...
|
||||
DSSE Signature: VALID (key: scanner-signing-2025)
|
||||
Rekor Entry: 12345678 (verified)
|
||||
Timestamp: 2025-12-13T09:30:00Z
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. OpenVEX Integration
|
||||
|
||||
### 5.1 OpenVEX with Reachability Evidence
|
||||
|
||||
When Policy emits VEX decisions, reachability evidence is included:
|
||||
|
||||
```json
|
||||
{
|
||||
"@context": "https://openvex.dev/ns/v0.2.0",
|
||||
"@id": "https://stellaops.example/vex/2025-12-13/001",
|
||||
"author": "StellaOps Policy Engine",
|
||||
"timestamp": "2025-12-13T10:00:00Z",
|
||||
"version": 1,
|
||||
"statements": [
|
||||
{
|
||||
"vulnerability": {"@id": "CVE-2021-44228"},
|
||||
"products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
|
||||
"status": "affected",
|
||||
"justification": "vulnerable_code_in_container",
|
||||
"impact_statement": "Vulnerable Log4j method reachable from main entry point.",
|
||||
"action_statement": "Upgrade to log4j 2.17.1 or later.",
|
||||
"stellaops:reachability": {
|
||||
"state": "CR",
|
||||
"confidence": 0.92,
|
||||
"graph_hash": "blake3:a1b2c3d4e5f6...",
|
||||
"path_length": 3,
|
||||
"evidence_uri": "cas://reachability/graphs/a1b2c3d4..."
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### 5.2 VEX "not_affected" with Unreachability Evidence
|
||||
|
||||
When code is provably unreachable:
|
||||
|
||||
```json
|
||||
{
|
||||
"statements": [
|
||||
{
|
||||
"vulnerability": {"@id": "CVE-2023-XXXXX"},
|
||||
"products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
|
||||
"status": "not_affected",
|
||||
"justification": "vulnerable_code_not_in_execute_path",
|
||||
"impact_statement": "Vulnerable function not reachable from any entry point.",
|
||||
"stellaops:reachability": {
|
||||
"state": "CU",
|
||||
"confidence": 0.88,
|
||||
"graph_hash": "blake3:d4e5f6a7b8c9...",
|
||||
"evidence_uri": "cas://reachability/graphs/d4e5f6a7b8c9...",
|
||||
"runtime_observation_window": "72h",
|
||||
"runtime_hits": 0
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Replay Manifest v2
|
||||
|
||||
### 6.1 Manifest Structure
|
||||
|
||||
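Replay manifests now enforce BLAKE3 hashing for graph artifacts and CAS registration for every artifact; runtime facts and SBOMs keep the `sha256` digests shown below: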
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.replay.manifest@v2",
|
||||
"subject": "scan:123",
|
||||
"generatedAt": "2025-12-13T10:00:00Z",
|
||||
"hashAlg": "blake3",
|
||||
"artifacts": [
|
||||
{
|
||||
"kind": "richgraph",
|
||||
"uri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6...",
|
||||
"hash": "blake3:a1b2c3d4e5f6...",
|
||||
"dsseUri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6....dsse"
|
||||
},
|
||||
{
|
||||
"kind": "runtime-facts",
|
||||
"uri": "cas://reachability/runtime/sha256:xyz789...",
|
||||
"hash": "sha256:xyz789..."
|
||||
},
|
||||
{
|
||||
"kind": "sbom",
|
||||
"uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"hash": "sha256:def456..."
|
||||
}
|
||||
],
|
||||
"analyzer": {
|
||||
"name": "scanner.java",
|
||||
"version": "1.2.0",
|
||||
"toolchain_digest": "sha256:..."
|
||||
},
|
||||
"code_id_coverage": {
|
||||
"total_symbols": 1247,
|
||||
"with_code_id": 1189,
|
||||
"coverage_pct": 95.3
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 Determinism Verification
|
||||
|
||||
Replay a manifest to verify determinism:
|
||||
|
||||
```bash
|
||||
stella replay verify --manifest ./manifest.json --sealed
|
||||
|
||||
# Output:
|
||||
Manifest: stellaops.replay.manifest@v2
|
||||
Subject: scan:123
|
||||
Artifacts: 3
|
||||
|
||||
Verifying richgraph...
|
||||
Computed: blake3:a1b2c3d4e5f6...
|
||||
Expected: blake3:a1b2c3d4e5f6...
|
||||
Status: MATCH
|
||||
|
||||
Verifying runtime-facts...
|
||||
Computed: sha256:xyz789...
|
||||
Expected: sha256:xyz789...
|
||||
Status: MATCH
|
||||
|
||||
Verifying sbom...
|
||||
Computed: sha256:def456...
|
||||
Expected: sha256:def456...
|
||||
Status: MATCH
|
||||
|
||||
All artifacts verified. Determinism check PASSED.
|
||||
```
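The hash comparison that the output above reports can be approximated as follows. This is a sketch under stated assumptions, not the `stella replay verify` implementation: `verify_artifact`, `verify_manifest`, and the `load_artifact` callback are hypothetical names, and it only compares stored bytes against the manifest rather than re-running any analysis. Only SHA-256 is wired in because it ships with the Python standard library; a BLAKE3 hasher would have to be supplied by the caller.

```python
import hashlib

# Maps the algorithm prefix of a digest ("sha256:...") to a hex-digest function.
# BLAKE3 is not in the standard library, so it is deliberately absent here.
DEFAULT_HASHERS = {"sha256": lambda data: hashlib.sha256(data).hexdigest()}


def verify_artifact(expected, data, hashers=DEFAULT_HASHERS):
    """Compare data against an 'algorithm:hex' digest string."""
    algorithm, _, expected_hex = expected.partition(":")
    hasher = hashers.get(algorithm)
    if hasher is None:
        return "UNSUPPORTED"
    return "MATCH" if hasher(data) == expected_hex else "MISMATCH"


def verify_manifest(manifest, load_artifact, hashers=DEFAULT_HASHERS):
    """Return {kind: status}; load_artifact(uri) must return the artifact bytes."""
    return {
        artifact["kind"]: verify_artifact(
            artifact["hash"], load_artifact(artifact["uri"]), hashers
        )
        for artifact in manifest["artifacts"]
    }
```

Passing an extra `"blake3"` entry in `hashers` is enough to cover the `richgraph` artifact; without it, that artifact is reported as `UNSUPPORTED` rather than silently skipped.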
|
||||
|
||||
---
|
||||
|
||||
## 7. Module Integration Guide
|
||||
|
||||
### 7.1 Scanner -> Signals
|
||||
|
||||
Scanner emits richgraph-v1 with `code_id` and `symbol_id`:
|
||||
|
||||
1. Scanner analyzes container/artifact
|
||||
2. Callgraph generators emit nodes with `symbol_id`, `code_id`, `build_id`
|
||||
3. RichGraphWriter canonicalizes (sorted arrays/keys) and computes `graph_hash` (BLAKE3)
|
||||
4. DSSE signer wraps canonical JSON
|
||||
5. CAS store persists body + envelope
|
||||
6. Signals ingestion API receives URI reference
|
||||
|
||||
### 7.2 Signals -> Policy
|
||||
|
||||
Signals provides reachability facts to Policy:
|
||||
|
||||
1. Policy queries `/signals/facts/{subjectKey}`
|
||||
2. Response includes `metadata.fact.digest`, `states[]`, `score`
|
||||
3. Policy gates check `latticeState` (U, SR, SU, RO, RU, CR, CU, X)
|
||||
4. Evidence blocks in findings reference `graph_hash`, `path[]`, `runtime_hits[]`
|
||||
|
||||
### 7.3 Policy -> VEX/UI
|
||||
|
||||
Policy emits OpenVEX with evidence:
|
||||
|
||||
1. VexDecisionEmitter serializes OpenVEX with `stellaops:reachability` extension
|
||||
2. UI explain drawer fetches evidence via `/api/policy/findings/{id}/explain`
|
||||
3. CLI `stella graph explain` renders call path and attestation refs
|
||||
|
||||
---
|
||||
|
||||
## 8. CAS Layout Reference
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
graphs/
|
||||
{blake3}/ # Graph body (canonical JSON)
|
||||
{blake3}.dsse # DSSE envelope
|
||||
edges/
|
||||
{graph_hash}/{bundle_id} # Edge bundle body (optional)
|
||||
{graph_hash}/{bundle_id}.dsse
|
||||
runtime/
|
||||
{sha256}/ # Runtime facts NDJSON
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 9. Related Documentation
|
||||
|
||||
- [Reachability Lattice Model](./lattice.md) - State definitions and join rules
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Schema specification
|
||||
- [Evidence Schema](./evidence-schema.md) - Detailed field definitions
|
||||
- [Signals API Contract](../api/signals/reachability-contract.md) - API reference
|
||||
- [Policy Gates](./policy-gate.md) - Gate configuration
|
||||
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
|
||||
- [Ground Truth Schema](./ground-truth-schema.md) - Test fixture format
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 GAP-DOC-008 for change history._
|
||||
|
||||
377
docs/reachability/graph-revision-schema.md
Normal file
@@ -0,0 +1,377 @@
|
||||
# Graph Revision Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Platform Guild._
|
||||
|
||||
This document defines the graph revision schema addressing gaps GR1-GR10 from the November 2025 product findings. It specifies manifest structure, hash algorithms, storage layout, lineage tracking, and governance rules for deterministic, auditable reachability graphs.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Graph revisions provide content-addressable, append-only versioning for `richgraph-v1` documents. Every graph mutation produces a new immutable revision with:
|
||||
|
||||
- **Deterministic hash:** BLAKE3-256 of canonical JSON
|
||||
- **Lineage metadata:** Parent revision + diff summary
|
||||
- **Cross-artifact digests:** Links to SBOM, VEX, policy, and tool versions
|
||||
- **Audit trail:** Timestamp, actor, tenant, and operation type
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### GR1: Manifest Schema + Canonical Hash Rules
|
||||
|
||||
**Manifest schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.graph.revision@v1",
|
||||
"revision_id": "rev:blake3:a1b2c3d4e5f6...",
|
||||
"graph_hash": "blake3:a1b2c3d4e5f6...",
|
||||
"parent_revision_id": "rev:blake3:9f8e7d6c5b4a...",
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"created_by": "service:scanner",
|
||||
"tenant_id": "tenant:acme",
|
||||
"shard_id": "shard:01",
|
||||
"operation": "create",
|
||||
"lineage": {
|
||||
"depth": 3,
|
||||
"root_revision_id": "rev:blake3:1a2b3c4d5e6f..."
|
||||
},
|
||||
"cross_artifacts": {
|
||||
"sbom_digest": "sha256:...",
|
||||
"vex_digest": "sha256:...",
|
||||
"policy_digest": "sha256:...",
|
||||
"analyzer_digest": "sha256:..."
|
||||
},
|
||||
"diff_summary": {
|
||||
"nodes_added": 12,
|
||||
"nodes_removed": 3,
|
||||
"edges_added": 24,
|
||||
"edges_removed": 8,
|
||||
"roots_changed": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Canonical hash rules:**
|
||||
|
||||
1. JSON keys sorted alphabetically at all nesting levels
|
||||
2. No whitespace/indentation (compact JSON)
|
||||
3. UTF-8 encoding, no BOM
|
||||
4. Arrays sorted by deterministic key (nodes by `id`, edges by `from,to,kind`)
|
||||
5. Null/empty values omitted
|
||||
6. Numeric values without trailing zeros
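The six rules above can be sketched in Python as a single canonicalization pass. This is an illustrative reading of the rules, not the RichGraphWriter code: the function names are hypothetical, "empty" is assumed to cover empty strings, arrays, and objects as well as nulls, and whole-number floats are assumed to serialize as integers. Hashing the returned bytes with BLAKE3-256 is left to whichever BLAKE3 binding the project uses.

```python
import json


def _is_empty(value):
    # Assumption: rule 5 covers null, "", [] and {}.
    return value is None or value == "" or value == [] or value == {}


def _prune(value):
    """Drop null/empty values and normalise whole-number floats (rules 5 and 6)."""
    if isinstance(value, dict):
        pruned = {key: _prune(item) for key, item in value.items()}
        return {key: item for key, item in pruned.items() if not _is_empty(item)}
    if isinstance(value, list):
        return [item for item in (_prune(v) for v in value) if not _is_empty(item)]
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def canonicalize(graph):
    """Return the canonical UTF-8 bytes of a richgraph-style document."""
    doc = _prune(graph)
    if "nodes" in doc:  # rule 4: nodes by id
        doc["nodes"] = sorted(doc["nodes"], key=lambda n: n["id"])
    if "edges" in doc:  # rule 4: edges by from, to, kind
        doc["edges"] = sorted(
            doc["edges"], key=lambda e: (e["from"], e["to"], e["kind"])
        )
    # Rules 1-3: sorted keys, compact separators, UTF-8 without BOM.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


example = {
    "schema": "richgraph-v1",
    "nodes": [{"id": "b", "display": None}, {"id": "a"}],
    "edges": [{"from": "a", "to": "b", "kind": "call", "confidence": 1.0}],
}
canonical = canonicalize(example)
```

Two inputs that differ only in key order, array order, or null fields produce identical bytes, which is the property the graph hash depends on.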
|
||||
|
||||
### GR2: Mandated BLAKE3-256 Encoding
|
||||
|
||||
All graph-level hashes use BLAKE3-256 with the following format:
|
||||
|
||||
```
|
||||
blake3:{64_hex_chars}
|
||||
```
|
||||
|
||||
Example:
|
||||
```
|
||||
blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
|
||||
```
|
||||
|
||||
**Rationale:**
|
||||
- BLAKE3 is 3x+ faster than SHA-256 on modern CPUs
|
||||
- Parallelizable for large graphs (>100K nodes)
|
||||
- Cryptographically secure (256-bit digest, 128-bit security level)
|
||||
- Algorithm prefix enables future migration
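A consumer can check the `blake3:{64_hex_chars}` format before trusting a value. The sketch below assumes lowercase hex, which matches every example in this document but is not stated as a rule; `is_graph_hash` is a hypothetical helper name.

```python
import re

# "blake3:" prefix followed by exactly 64 lowercase hex characters (256 bits).
_GRAPH_HASH = re.compile(r"blake3:[0-9a-f]{64}")


def is_graph_hash(value):
    """Return True when value has the blake3:{64_hex_chars} shape."""
    return _GRAPH_HASH.fullmatch(value) is not None
```

A format check like this only validates the shape of the string; it does not prove the digest matches any graph body.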
|
||||
|
||||
### GR3: Append-Only Storage
|
||||
|
||||
Graph revisions are immutable. Operations:
|
||||
|
||||
| Operation | Creates New Revision | Modifies Existing |
|
||||
|-----------|---------------------|-------------------|
|
||||
| `create` | Yes | No |
|
||||
| `update` | Yes | No |
|
||||
| `merge` | Yes | No |
|
||||
| `tombstone` | Yes | No |
|
||||
| `read` | No | No |
|
||||
|
||||
**Storage layout:**
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
revisions/
|
||||
{blake3}/ # Revision manifest
|
||||
{blake3}.graph # Graph body
|
||||
{blake3}.dsse # DSSE envelope
|
||||
indices/
|
||||
by-tenant/{tenant_id}/ # Tenant index
|
||||
by-sbom/{sbom_digest}/ # SBOM correlation
|
||||
by-root/{root_revision_id}/ # Lineage tree
|
||||
```
|
||||
|
||||
### GR4: Lineage/Diff Metadata
|
||||
|
||||
Every revision tracks its lineage:
|
||||
|
||||
```json
|
||||
{
|
||||
"lineage": {
|
||||
"depth": 5,
|
||||
"root_revision_id": "rev:blake3:...",
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"merge_parents": []
|
||||
},
|
||||
"diff_summary": {
|
||||
"nodes_added": 12,
|
||||
"nodes_removed": 3,
|
||||
"nodes_modified": 0,
|
||||
"edges_added": 24,
|
||||
"edges_removed": 8,
|
||||
"edges_modified": 0,
|
||||
"roots_added": 0,
|
||||
"roots_removed": 0
|
||||
},
|
||||
"diff_detail_uri": "cas://reachability/diffs/{parent_hash}_{child_hash}.ndjson"
|
||||
}
|
||||
```
|
||||
|
||||
**Diff detail format (NDJSON):**
|
||||
|
||||
```ndjson
|
||||
{"op":"add","path":"nodes","value":{"id":"sym:java:...","display":"..."}}
|
||||
{"op":"remove","path":"edges","from":"sym:java:a","to":"sym:java:b"}
|
||||
```
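The `diff_summary` counters can be derived from a diff detail file in this format. The sketch below is an assumption-laden illustration: `summarize_diff` is a hypothetical helper, and it only counts the `add` and `remove` operations on `nodes` and `edges` that appear in the example records, since no other operations are documented here.

```python
import json
from collections import Counter

_SUFFIX = {"add": "added", "remove": "removed"}
_KEYS = ("nodes_added", "nodes_removed", "edges_added", "edges_removed")


def summarize_diff(ndjson_text):
    """Count add/remove operations per collection in an NDJSON diff."""
    counts = Counter()
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        suffix = _SUFFIX.get(record["op"])
        if suffix and record["path"] in ("nodes", "edges"):
            counts[f"{record['path']}_{suffix}"] += 1
    return {key: counts[key] for key in _KEYS}
```

Comparing the result against the `diff_summary` stored in the revision manifest gives a cheap consistency check between the summary and the detail file.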
|
||||
|
||||
### GR5: Cross-Artifact Digests (SBOM/VEX/Policy/Tool)
|
||||
|
||||
Every revision links to related artifacts:
|
||||
|
||||
```json
|
||||
{
|
||||
"cross_artifacts": {
|
||||
"sbom_digest": "sha256:...",
|
||||
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"sbom_format": "cyclonedx-1.6",
|
||||
"vex_digest": "sha256:...",
|
||||
"vex_uri": "cas://excititor/vex/openvex.json",
|
||||
"policy_digest": "sha256:...",
|
||||
"policy_version": "P-7:v4",
|
||||
"analyzer_digest": "sha256:...",
|
||||
"analyzer_name": "scanner.java",
|
||||
"analyzer_version": "1.2.0"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GR6: UI/CLI Surfacing of Full/Short IDs
|
||||
|
||||
**Full ID format:**
|
||||
```
|
||||
rev:blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
|
||||
```
|
||||
|
||||
**Short ID format (for display):**
|
||||
```
|
||||
rev:a1b2c3d4
|
||||
```
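Judging by the two examples above, the short ID drops the algorithm segment and keeps the first eight hex characters of the digest. The helper below is a sketch of that reading; the name `short_revision_id` and the `length` parameter are assumptions, not part of the CLI.

```python
def short_revision_id(full_id, length=8):
    """Turn 'rev:blake3:<hex>' into the display form 'rev:<first hex chars>'."""
    # Unpacking raises ValueError if the ID has fewer than three segments.
    prefix, algorithm, digest = full_id.split(":", 2)
    if prefix != "rev" or algorithm != "blake3" or len(digest) < length:
        raise ValueError(f"not a full revision ID: {full_id!r}")
    return f"rev:{digest[:length]}"
```

Short IDs are for display only: two revisions can share a prefix, so anything that resolves a short ID back to a revision needs to handle ambiguous matches.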
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
|
||||
# List revisions
|
||||
stella graph revisions --scan-id scan-123
|
||||
|
||||
# Show full ID
|
||||
stella graph revisions --scan-id scan-123 --full
|
||||
|
||||
# Output:
|
||||
REVISION CREATED NODES EDGES PARENT
|
||||
rev:a1b2c3d4 2025-12-13T10:00:00 1247 3891 rev:9f8e7d6c
|
||||
rev:9f8e7d6c 2025-12-12T15:30:00 1235 3867 rev:1a2b3c4d
|
||||
```
|
||||
|
||||
**UI display:**
|
||||
|
||||
- Revision chips show short ID with copy-to-clipboard for full ID
|
||||
- Hover tooltip shows full ID and creation timestamp
|
||||
- Lineage tree visualization available in "Revision History" drawer
|
||||
|
||||
### GR7: Shard/Tenant Context
|
||||
|
||||
Every revision includes partition context:
|
||||
|
||||
```json
|
||||
{
|
||||
"tenant_id": "tenant:acme",
|
||||
"shard_id": "shard:01",
|
||||
"namespace": "prod",
|
||||
"workspace_id": "ws:default"
|
||||
}
|
||||
```
|
||||
|
||||
**Tenant isolation:**
|
||||
|
||||
- Revisions are tenant-scoped; cross-tenant access requires explicit grants
|
||||
- Shard ID enables horizontal scaling and data locality
|
||||
- Namespace supports multi-environment deployments
|
||||
|
||||
### GR8: Pin/Audit Governance
|
||||
|
||||
**Pinned revisions:**
|
||||
|
||||
Revisions can be pinned to prevent automatic retention cleanup:
|
||||
|
||||
```json
|
||||
{
|
||||
"pinned": true,
|
||||
"pinned_at": "2025-12-13T10:00:00Z",
|
||||
"pinned_by": "user:alice",
|
||||
"pin_reason": "Audit retention for CVE-2021-44228 investigation",
|
||||
"pin_expires_at": "2026-12-13T10:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
**Audit events:**
|
||||
|
||||
All revision operations emit audit events:
|
||||
|
||||
```json
|
||||
{
|
||||
"event_type": "graph.revision.created",
|
||||
"revision_id": "rev:blake3:...",
|
||||
"actor": "service:scanner",
|
||||
"tenant_id": "tenant:acme",
|
||||
"timestamp": "2025-12-13T10:00:00Z",
|
||||
"metadata": {
|
||||
"operation": "create",
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"graph_hash": "blake3:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GR9: Retention/Tombstones
|
||||
|
||||
**Retention policy:**
|
||||
|
||||
| Category | Default Retention | Configurable |
|
||||
|----------|-------------------|--------------|
|
||||
| Latest revision | Forever | No |
|
||||
| Intermediate revisions | 90 days | Yes |
|
||||
| Tombstoned revisions | 30 days | Yes |
|
||||
| Pinned revisions | Until unpin + 7 days | No |
|
||||
|
||||
**Tombstone format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.graph.revision@v1",
|
||||
"revision_id": "rev:blake3:...",
|
||||
"tombstone": true,
|
||||
"tombstoned_at": "2025-12-13T10:00:00Z",
|
||||
"tombstoned_by": "service:retention-worker",
|
||||
"tombstone_reason": "retention_policy",
|
||||
"successor_revision_id": "rev:blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
### GR10: Inclusion in Offline Kits
|
||||
|
||||
Offline kits include graph revisions for air-gapped deployments:
|
||||
|
||||
**Offline bundle manifest:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.offline.bundle@v1",
|
||||
"bundle_id": "bundle:2025-12-13",
|
||||
"graph_revisions": [
|
||||
{
|
||||
"revision_id": "rev:blake3:...",
|
||||
"graph_hash": "blake3:...",
|
||||
"included_artifacts": ["graph", "dsse", "diff"]
|
||||
}
|
||||
],
|
||||
"rekor_checkpoints": [
|
||||
{
|
||||
"log_id": "rekor.sigstore.dev",
|
||||
"checkpoint": "...",
|
||||
"verified_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
],
|
||||
"signature": {
|
||||
"algorithm": "ecdsa-p256",
|
||||
"value": "base64:...",
|
||||
"public_key_id": "key:offline-signing-2025"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Import verification:**
|
||||
|
||||
```bash
|
||||
stella offline import --bundle ./offline-bundle.tgz --verify
|
||||
|
||||
# Output:
|
||||
Bundle: bundle:2025-12-13
|
||||
Graph Revisions: 5
|
||||
Rekor Checkpoints: 2
|
||||
|
||||
Verifying signatures...
|
||||
Bundle signature: VALID
|
||||
DSSE envelopes: 5/5 VALID
|
||||
Rekor checkpoints: 2/2 VERIFIED
|
||||
|
||||
Import complete.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. API Reference
|
||||
|
||||
### 3.1 Create Revision
|
||||
|
||||
```http
|
||||
POST /api/graph/revisions
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"graph": { ... richgraph-v1 ... },
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"cross_artifacts": { ... }
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Get Revision
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions/{revision_id}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
### 3.3 List Revisions
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions?tenant_id=acme&sbom_digest=sha256:...&limit=20
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
### 3.4 Diff Revisions
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions/diff?from={rev_a}&to={rev_b}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [CAS Infrastructure](../contracts/cas-infrastructure.md) - Content-addressable storage
|
||||
- [Offline Kit](../24_OFFLINE_KIT.md) - Air-gap deployment
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 GRAPHREV-GAPS-401-063 for change history._
|
||||
@@ -84,7 +84,93 @@ Stella Ops provides **true hybrid reachability** by combining:
|
||||
|
||||
**Evidence linking:** Each edge in the graph or bundle includes `evidenceRefs` pointing to the underlying proof artifacts (static analysis artifacts, runtime traces), enabling **evidence-linked VEX decisions**.
|
||||
|
||||
## 8. Open decisions (tracked in Sprint 0401 tasks 53–56)
|
||||
- Rekor publish defaults per deployment tier (regulated vs standard).
|
||||
- CLI UX for selective bundle verification.
|
||||
- Bench coverage for edge-bundle verification time/size.
|
||||
## 8. Decisions (Frozen 2025-12-13)
|
||||
|
||||
### 8.1 DSSE/Rekor Budget by Deployment Tier
|
||||
|
||||
| Tier | Graph DSSE | Edge-Bundle DSSE | Rekor Publish | Max Bundles/Graph |
|
||||
|------|------------|------------------|---------------|-------------------|
|
||||
| **Regulated** (SOC2, FedRAMP, PCI) | Required | Required for runtime/contested | Required | 10 |
|
||||
| **Standard** | Required | Optional (criteria-based) | Graph only | 5 |
|
||||
| **Air-gapped** | Required | Optional | Offline checkpoint | 5 |
|
||||
| **Dev/Test** | Optional | Optional | Disabled | Unlimited |
|
||||
|
||||
**Budget enforcement:**
|
||||
- Graph DSSE: Always submit digest to Rekor (or offline checkpoint for air-gapped)
|
||||
- Edge-bundle DSSE: Submit to Rekor only when `bundle_reason` is `disputed`, `runtime-hit`, or `security-critical`
|
||||
- Cap enforced by `reachability.edgeBundles.maxRekorPublishes` config (per tier defaults above)
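The edge-bundle rule above reduces to a reason check plus a cap. The sketch below is a simplified, tier-agnostic reading: `should_publish_edge_bundle` and its parameters are hypothetical, the cap is passed in by the caller (it would come from `reachability.edgeBundles.maxRekorPublishes`), and `None` is assumed to mean "no cap".

```python
# Reasons that qualify an edge bundle for Rekor publication, per the list above.
REKOR_ELIGIBLE_REASONS = frozenset({"disputed", "runtime-hit", "security-critical"})


def should_publish_edge_bundle(bundle_reason, published_so_far, max_rekor_publishes):
    """Decide whether one more edge bundle may be submitted to Rekor."""
    if bundle_reason not in REKOR_ELIGIBLE_REASONS:
        return False
    if max_rekor_publishes is None:  # assumption: None means unlimited
        return True
    return published_so_far < max_rekor_publishes
```

Tier-specific behaviour, such as disabling Rekor for Dev/Test or using an offline checkpoint for air-gapped deployments, would sit in front of this check and is not modelled here.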
|
||||
|
||||
### 8.2 Signing Layout and CAS Paths
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
graphs/
|
||||
{blake3}/ # richgraph-v1 body (JSON)
|
||||
{blake3}.dsse # Graph DSSE envelope
|
||||
{blake3}.rekor # Rekor inclusion proof (optional)
|
||||
edges/
|
||||
{graph_hash}/
|
||||
{bundle_id}.json # Edge bundle body
|
||||
{bundle_id}.dsse # Edge bundle DSSE envelope
|
||||
{bundle_id}.rekor # Rekor inclusion proof (if published)
|
||||
revisions/
|
||||
{revision_id}/ # Revision manifest + lineage
|
||||
```
|
||||
|
||||
**Signing workflow:**
|
||||
1. Canonicalize richgraph-v1 JSON (sorted keys, arrays by deterministic key)
|
||||
2. Compute BLAKE3-256 hash -> `graph_hash`
|
||||
3. Create DSSE envelope with `stella.ops/graph@v1` predicate
|
||||
4. Submit digest to Rekor (online) or cache checkpoint (offline)
|
||||
5. Store graph body + envelope + proof in CAS
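Step 3 relies on DSSE, which signs a pre-authentication encoding (PAE) of the payload rather than the raw bytes. The function below follows the PAE definition from the DSSE specification; how the Attestor module actually builds the envelope, and which payload type it uses, is not stated in this document and is not assumed here.

```python
def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the byte string that gets signed.

    PAE = "DSSEv1" SP len(type) SP type SP len(payload) SP payload,
    with lengths written as ASCII decimal byte counts.
    """
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes),
        type_bytes,
        len(payload),
        payload,
    )
```

In this workflow the `payload` argument would be the serialized statement that carries the `stella.ops/graph@v1` predicate and the `graph_hash` from step 2; the signature over the PAE bytes goes into the envelope's `signatures` array.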
|
||||
|
||||
### 8.3 CLI UX for Selective Bundle Verification
|
||||
|
||||
```bash
|
||||
# Verify graph DSSE only (default)
|
||||
stella graph verify --hash blake3:a1b2c3d4...
|
||||
|
||||
# Verify graph + all edge bundles
|
||||
stella graph verify --hash blake3:a1b2c3d4... --include-bundles
|
||||
|
||||
# Verify specific edge bundle
|
||||
stella graph verify --hash blake3:a1b2c3d4... --bundle bundle:001
|
||||
|
||||
# Offline verification with local CAS
|
||||
stella graph verify --hash blake3:a1b2c3d4... --cas-root ./offline-cas/
|
||||
|
||||
# Verify Rekor inclusion
|
||||
stella graph verify --hash blake3:a1b2c3d4... --rekor-proof
|
||||
|
||||
# Output formats
|
||||
stella graph verify --hash blake3:a1b2c3d4... --format json   # or: table, summary
|
||||
```
|
||||
|
||||
### 8.4 Golden Fixture Plan
|
||||
|
||||
**Fixture location:** `tests/Reachability/Hybrid/`
|
||||
|
||||
**Required fixtures:**
|
||||
| Fixture | Description | Expected Verification Time |
|
||||
|---------|-------------|---------------------------|
|
||||
| `graph-only.golden.json` | Minimal richgraph-v1 with DSSE | < 100ms |
|
||||
| `graph-with-runtime.golden.json` | Graph + 1 runtime edge bundle | < 200ms |
|
||||
| `graph-with-contested.golden.json` | Graph + 1 contested/revoked edge bundle | < 200ms |
|
||||
| `large-graph.golden.json` | 10K nodes, 50K edges, 5 bundles | < 2s |
|
||||
| `offline-bundle.golden.tgz` | Complete offline replay pack | < 5s |
|
||||
|
||||
**CI integration:**
|
||||
- `.gitea/workflows/hybrid-attestation.yml` runs verification fixtures
|
||||
- Size gate: Graph body < 10MB, individual bundle < 1MB
|
||||
- Time gate: Full verification < 5s for standard tier
|
||||
|
||||
### 8.5 Implementation Status
|
||||
|
||||
| Component | Status | Notes |
|
||||
|-----------|--------|-------|
|
||||
| Graph DSSE predicate | Done | `stella.ops/graph@v1` in PredicateTypes.cs |
|
||||
| Edge-bundle DSSE predicate | Planned | `stella.ops/edgeBundle@v1` |
|
||||
| CAS layout | Done | Per section 8.2 |
|
||||
| CLI verify command | Planned | Per section 8.3 |
|
||||
| Golden fixtures | Planned | Per section 8.4 |
|
||||
| Rekor integration | Done | Via Attestor module |
|
||||
|
||||