up

StellaOps Bot
2025-12-13 09:37:15 +02:00
parent e00f6365da
commit 6e45066e37
349 changed files with 17160 additions and 1867 deletions


@@ -0,0 +1,461 @@
# Binary Reachability Schema
_Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild._
This document defines the binary reachability schema addressing gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.
---
## 1. Overview
Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:
- **Stripped binaries:** Symbol recovery using `code_id` + `code_block_hash`
- **Build variants:** Handling multiple builds from the same source
- **Large graphs:** Chunking and size limits for DSSE/Rekor
- **Offline verification:** Air-gapped attestation workflows
---
## 2. Gap Resolutions
### BR1: Canonical DSSE/Predicate Schemas
**Binary graph predicate:**
```
stella.ops/binaryGraph@v1
```
**Predicate schema:**
```json
{
"_type": "https://stellaops.dev/predicates/binaryGraph/v1",
"subject": [
{
"name": "graph",
"digest": {"blake3": "a1b2c3d4e5f6..."}
}
],
"predicate": {
"analyzer": {
"name": "scanner.native",
"version": "1.2.0",
"toolchain": "ghidra-11.2"
},
"binary": {
"format": "ELF",
"arch": "x86_64",
"file_hash": "sha256:...",
"build_id": "gnu-build-id:5f0c7c3c..."
},
"graph_stats": {
"node_count": 1247,
"edge_count": 3891,
"root_count": 5
},
"evidence": {
"symbols_source": "DWARF",
"stripped_symbols": 58,
"heuristic_symbols": 12
},
"created_at": "2025-12-13T10:00:00Z"
}
}
```
**Edge bundle predicate:**
```
stella.ops/binaryEdgeBundle@v1
```
```json
{
"_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
"subject": [
{
"name": "edges",
"digest": {"sha256": "..."}
}
],
"predicate": {
"graph_hash": "blake3:a1b2c3d4...",
"bundle_id": "bundle:001",
"bundle_reason": "init_array",
"edge_count": 128,
"edges": [
{
"from": "sym:binary:...",
"to": "sym:binary:...",
"reason": "init-array",
"confidence": 0.95
}
]
}
}
```
### BR2: Edge Hash Recipe
**Binary edge hash computation:**
```
edge_id = "edge:" + sha256(
canonical_json({
"from": edge.from,
"to": edge.to,
"kind": edge.kind,
"reason": edge.reason,
"binary_hash": binary.file_hash // Binary context included
})
)
```
**Hash includes binary context:**
Unlike managed code edges, binary edges include `binary_hash` in the hash computation to distinguish edges from different binaries with identical symbol names.
**Canonicalization:**
1. Keys: `binary_hash`, `from`, `kind`, `reason`, `to` (alphabetical)
2. No whitespace, UTF-8 encoding
3. Lowercase hex for all hashes
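The recipe above can be sketched in Python as follows. This is a minimal illustration, not the analyzer's implementation: the function name `binary_edge_id` is hypothetical, the `edge:sha256:{hex}` output form is assumed from the Edge Explainability Schema, and leaving non-ASCII characters unescaped is one assumed reading of "UTF-8 encoding".

```python
import hashlib
import json


def binary_edge_id(edge: dict, binary_file_hash: str) -> str:
    """Hypothetical helper: hash the five BR2 fields in canonical form."""
    fields = {
        "binary_hash": binary_file_hash.lower(),  # rule 3: lowercase hex
        "from": edge["from"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        "to": edge["to"],
    }
    # sort_keys gives alphabetical key order; compact separators drop whitespace.
    canonical = json.dumps(
        fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # ASSUMPTION: the "sha256:" prefix mirrors the edge_id form used for managed edges.
    return "edge:sha256:" + digest
```

Because `binary_hash` is part of the hashed object, the same `from`/`to` pair taken from two different binaries yields two different edge IDs.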
### BR3: Required Binary Evidence with CAS Refs
**Required evidence per node:**
| Evidence Type | Required | CAS Storage |
|---------------|----------|-------------|
| File hash | Yes | N/A (inline) |
| Build ID | Conditional | N/A (inline) |
| Symbol source | Yes | N/A (inline) |
| Code block hash | For stripped | `cas://binary/blocks/{sha256}` |
| Disassembly | Optional | `cas://binary/disasm/{sha256}` |
| CFG | Optional | `cas://binary/cfg/{sha256}` |
**Evidence schema:**
```json
{
"binary_evidence": {
"file_hash": "sha256:...",
"build_id": "gnu-build-id:5f0c7c3c...",
"symbol_source": "DWARF",
"symbol_confidence": 0.95,
"code_block_hash": "sha256:deadbeef...",
"code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
"disassembly_uri": "cas://binary/disasm/sha256:...",
"cfg_uri": "cas://binary/cfg/sha256:..."
}
}
```
**CAS layout:**
```
cas://binary/
blocks/{sha256}/ # Code block bytes
disasm/{sha256}/ # Disassembly JSON
cfg/{sha256}/ # Control flow graph
symbols/{sha256}/ # Symbol table extract
```
### BR4: Build-ID/Variant Rules
**Build-ID sources:**
| Format | Build-ID Source | Example |
|--------|-----------------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:5f0c7c3c...` |
| PE | Debug GUID | `pe-guid:12345678-1234-...` |
| Mach-O | `LC_UUID` | `macho-uuid:12345678...` |
**Fallback when build-ID absent:**
```json
{
"build_id": null,
"build_id_fallback": {
"method": "file_hash",
"value": "sha256:...",
"confidence": 0.7
}
}
```
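A minimal sketch of how a producer might emit the fallback shape shown above. The helper name `build_identity` is hypothetical, and the `0.7` confidence is simply copied from the sample rather than taken from a documented rule.

```python
from typing import Optional

# ASSUMPTION: value copied from the sample document above, not a specified constant.
FILE_HASH_FALLBACK_CONFIDENCE = 0.7


def build_identity(build_id: Optional[str], file_hash: str) -> dict:
    """Hypothetical helper: prefer the native build-ID, else fall back to the file hash."""
    if build_id:
        return {"build_id": build_id}
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            "confidence": FILE_HASH_FALLBACK_CONFIDENCE,
        },
    }
```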
**Variant handling:**
Multiple binaries built from the same source (debug/release, different architectures) share a `variant_group`:
```json
{
"variant_group": "sha256:source_hash...",
"variants": [
{"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
{"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
{"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
]
}
```
### BR5: Policy Hash Governance
**Policy version binding:**
Binary reachability graphs are bound to a policy version:
```json
{
"policy_binding": {
"policy_digest": "sha256:...",
"policy_version": "P-7:v4",
"bound_at": "2025-12-13T10:00:00Z",
"binding_mode": "strict"
}
}
```
**Binding modes:**
| Mode | Behavior |
|------|----------|
| `strict` | Graph invalid if policy changes |
| `forward` | Graph valid with newer policy versions |
| `any` | Graph valid with any policy version |
**Governance rules:**
1. Production graphs use `strict` binding
2. Test graphs may use `forward`
3. Policy hash computed from canonical DSL
4. Binding stored in graph metadata
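The three binding modes can be read as a validity check along these lines. This is a sketch under stated assumptions: the function names are hypothetical, and the document does not define how "newer" is determined, so the code assumes `policy_version` has the form `<policy-id>:v<N>` (as in `P-7:v4`) and compares `N` within the same policy ID.

```python
def _parse_version(policy_version: str) -> tuple:
    # ASSUMPTION: "<policy-id>:v<N>" layout inferred from the "P-7:v4" sample.
    policy_id, sep, revision = policy_version.rpartition(":v")
    if not sep or not revision.isdigit():
        raise ValueError(f"unrecognized policy version: {policy_version!r}")
    return policy_id, int(revision)


def graph_is_valid(binding: dict, current_digest: str, current_version: str) -> bool:
    """Hypothetical check of a graph's policy binding against the active policy."""
    mode = binding["binding_mode"]
    if mode == "any":
        return True
    if current_digest == binding["policy_digest"]:
        return True  # policy unchanged, valid in every mode
    if mode == "strict":
        return False  # any policy change invalidates the graph
    if mode == "forward":
        bound_id, bound_rev = _parse_version(binding["policy_version"])
        cur_id, cur_rev = _parse_version(current_version)
        return cur_id == bound_id and cur_rev > bound_rev
    raise ValueError(f"unknown binding mode: {mode!r}")
```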
### BR6: Sigstore Bundle/Log Routing
**Sigstore integration:**
```json
{
"sigstore": {
"bundle_type": "hashedrekord",
"log_index": 12345678,
"log_id": "rekor.sigstore.dev",
"inclusion_proof": {
"log_index": 12345678,
"root_hash": "sha256:...",
"tree_size": 98765432,
"hashes": ["sha256:...", "sha256:..."]
},
"signed_entry_timestamp": "base64:..."
}
}
```
**Log routing:**
| Evidence Type | Log | Notes |
|---------------|-----|-------|
| Graph DSSE | Rekor (public) | Always |
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
| Code block | No log | CAS only |
| CFG/Disasm | No log | CAS only |
**Offline mode:**
When Rekor is unavailable:
```json
{
"sigstore": {
"mode": "offline",
"checkpoint": {
"origin": "rekor.sigstore.dev",
"checkpoint_data": "base64:...",
"captured_at": "2025-12-13T10:00:00Z"
},
"deferred_submission": true
}
}
```
### BR7: Idempotent Submission Keys
**Submission key format:**
```
submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}
```
**Idempotency rules:**
1. Same key returns existing entry (no duplicate)
2. Key includes hour-granularity timestamp for rate limiting
3. Different graphs from same binary produce different keys
4. A retry within the same clock hour reuses the same key; once the hour rolls over, `timestamp_hour` changes and a new key is produced
**Implementation:**
```json
{
"submission": {
"key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
"status": "accepted",
"existing_entry": false,
"log_index": 12345678
}
}
```
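A minimal sketch of key construction and lookup, for illustration only. The names `submission_key` and `SubmissionLog` are hypothetical, the in-memory dictionary stands in for whatever store the service uses, and bucketing the hour in UTC is an assumption (the sample key `2025121310` is consistent with it but the document does not say so).

```python
from datetime import datetime, timezone


def submission_key(tenant: str, binary_hash: str, graph_hash: str, when: datetime) -> str:
    """Build submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # ASSUMPTION: the hour bucket is computed in UTC.
    hour_bucket = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour_bucket}"


class SubmissionLog:
    """Hypothetical in-memory store showing the 'same key returns existing entry' rule."""

    def __init__(self) -> None:
        self._index_by_key = {}

    def submit(self, key: str) -> dict:
        existing = key in self._index_by_key
        if not existing:
            self._index_by_key[key] = len(self._index_by_key)
        return {"key": key, "existing_entry": existing, "log_index": self._index_by_key[key]}
```

Two submissions at 10:05 and 10:55 share a key and a log index; a submission at 11:00 gets a new key.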
### BR8: Size/Chunking Limits
**Size limits:**
| Element | Limit | Action on Exceed |
|---------|-------|------------------|
| Graph JSON | 10 MB | Chunk nodes/edges |
| Edge bundle | 512 edges | Split bundles |
| DSSE payload | 1 MB | Compress/chunk |
| Rekor entry | 100 KB | Reference CAS |
**Chunking strategy:**
For large graphs (>10MB):
```json
{
"chunked_graph": {
"chunk_count": 5,
"chunks": [
{"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
{"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
],
"assembly_order": ["chunk:001", "chunk:002", ...],
"assembled_hash": "blake3:..."
}
}
```
**Compression:**
- Graph JSON: gzip before DSSE
- CAS storage: Raw JSON (indexed)
- Rekor payload: DSSE references CAS
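The 512-edge bundle limit implies a splitting step such as the one sketched below. The function name is hypothetical, and numbering bundles `bundle:001`, `bundle:002`, and so on is assumed from the sample `bundle_id` rather than specified.

```python
from typing import Iterator, List

MAX_EDGES_PER_BUNDLE = 512  # "Edge bundle" limit from the size table


def split_into_bundles(edges: List[dict], limit: int = MAX_EDGES_PER_BUNDLE) -> Iterator[dict]:
    """Yield bundle bodies that each carry at most `limit` edges, in input order."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for number, start in enumerate(range(0, len(edges), limit), start=1):
        chunk = edges[start:start + limit]
        # ASSUMPTION: zero-padded sequential IDs, modeled on "bundle:001".
        yield {"bundle_id": f"bundle:{number:03d}", "edge_count": len(chunk), "edges": chunk}
```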
### BR9: API/CLI/UI Surfacing
**API endpoints:**
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/binary/graphs` | Submit binary graph |
| `GET` | `/api/binary/graphs/{hash}` | Get graph details |
| `GET` | `/api/binary/graphs/{hash}/edges` | List edges |
| `GET` | `/api/binary/symbols/{symbolId}` | Get symbol details |
| `POST` | `/api/binary/verify` | Verify graph attestation |
**CLI commands:**
```bash
# Submit binary graph
stella binary submit --graph ./richgraph.json --binary ./app
# Get graph info
stella binary info --hash blake3:a1b2c3d4...
# List symbols
stella binary symbols --hash blake3:... --stripped-only
# Verify attestation
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse
```
**UI components:**
- Binary graph visualization with zoom/pan
- Symbol table with search/filter
- Edge explorer with confidence highlighting
- Attestation status badges
- Build variant selector
### BR10: Binary Fixtures
**Fixture location:**
```
tests/Binary/
fixtures/
elf-x86_64-with-debug/
binary.elf
graph.json
expected-hashes.txt
elf-stripped/
binary.elf
graph.json
expected-hashes.txt
pe-x64-with-pdb/
binary.exe
graph.json
expected-hashes.txt
golden/
elf-x86_64.golden.json
pe-x64.golden.json
datasets/binary/
schema/
binary-graph.schema.json
binary-edge.schema.json
samples/
openssl-1.1.1/
libssl.so
graph.json
edges.ndjson
```
**Fixture requirements:**
1. Each binary format has at least one fixture
2. Stripped and debug variants for each format
3. Expected hashes verified by CI
4. Golden outputs include DSSE envelopes
5. Fixtures reproducible from source (where legal)
**Test categories:**
1. **Hash stability:** Same binary produces same graph hash
2. **Build-ID extraction:** Correct build-ID parsing per format
3. **Symbol recovery:** DWARF/PDB parsing accuracy
4. **Stripped handling:** Code block hash computation
5. **Chunking:** Large graph splitting and reassembly
6. **DSSE signing:** Envelope creation and verification
7. **Rekor integration:** Submission and verification
---
## 3. Implementation Status
| Component | Location | Status |
|-----------|----------|--------|
| ELF parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| PE parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |
| CAS storage | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability` | Partial |
| Rekor integration | `src/Attestor/StellaOps.Attestor` | Implemented |
| CLI commands | `src/Cli/StellaOps.Cli` | Planned |
| UI components | `src/UI/StellaOps.UI` | Planned |
---
## 4. Related Documentation
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Edge Explainability](./edge-explainability-schema.md) - Edge reason codes
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
- [Native Analyzer Tests](../../src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests/Reachability/) - Test fixtures
---
_Last updated: 2025-12-13. See Sprint 0401 BINARY-GAPS-401-066 for change history._


@@ -1,45 +1,69 @@
# Reachability Corpus Plan (QA-CORPUS-401-031)
Objective
- Build a multi-runtime reachability corpus (Go/.NET/Python/Rust) with EXPECT.yaml ground truths and captured traces.
- Make fixtures CI-consumable to validate reachability scoring and VEX proofs continuously.
- Add public mini-dataset cases (PHP/JavaScript/C#) from advisory 23-Nov-2025 for ingestion/bench reuse.
- Maintain deterministic, offline reachability fixtures that validate callgraph ingestion, reachability truth-path handling, and VEX proof workflows.
- Keep the corpus small but multi-runtime (Go/.NET/Python/Rust), and keep a public-friendly mini dataset (PHP/JavaScript/C#) for docs/demos without external repos.
Scope & deliverables
- Fixture layout: `tests/reachability/corpus/<language>/<case>/`
- `expect.yaml` — states (`reachable|conditional|unreachable`), score, evidence refs.
- `callgraph.*.json` — static graphs per language.
- `runtime/*.ndjson` — traces/probes when available.
- `sbom.*.json` — CycloneDX/SPDX slices.
- `vex.openvex.json` — expected VEX statement.
- CI integration: add corpus harness to `tests/reachability/StellaOps.Reachability.FixtureTests` to validate presence, schema, and determinism (hash manifest).
- Offline posture: all artifacts deterministic, no external downloads; hashes recorded in manifest.
- Public mini-dataset layout (PHP/JS/C#) to be mirrored under `tests/reachability/samples-public/`:
```
vuln-reach-dataset/
schema/ground-truth.schema.json
runners/run_all.sh
samples/
php/php-001-phar-deserialize/...
js/js-002-yaml-unsafe-load/...
csharp/cs-001-binaryformatter-deserialize/...
```
Each sample ships: minimal app, lockfile, SBOM (CycloneDX JSON), VEX, ground truth (EXPECT/JSON), repro script.
## Corpus Map
MVP slice (proposed)
### 1) Multi-runtime corpus (internal MVP)
Path: `tests/reachability/corpus/`
Per-case layout: `tests/reachability/corpus/<language>/<case>/`
- `callgraph.static.json` — static call graph sample (stub for MVP).
- `ground-truth.json` — expected reachability outcome and example path(s) (Reachbench truth schema v1; `schema_version=reachbench.reachgraph.truth/v1`).
- `vex.openvex.json` — expected VEX slice for the case.
- Optional (future): `runtime/*.ndjson`, `sbom.*.json`
`tests/reachability/corpus/manifest.json` records deterministic SHA-256 hashes for required files in each case directory.
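As an illustration of what such a manifest could contain, the sketch below hashes the three required files of every `<language>/<case>/` directory. It is not the repository's `update_corpus_manifest.py`; the function names and the manifest shape (case path mapped to file name mapped to hex digest) are assumptions.

```python
import hashlib
import json
from pathlib import Path

REQUIRED_FILES = ("callgraph.static.json", "ground-truth.json", "vex.openvex.json")


def build_manifest(corpus_root: Path) -> dict:
    """Hypothetical manifest builder: SHA-256 of each required file, per case."""
    manifest = {}
    # Sorted traversal keeps the output independent of filesystem ordering.
    for case_dir in sorted(p for p in corpus_root.glob("*/*") if p.is_dir()):
        manifest[case_dir.relative_to(corpus_root).as_posix()] = {
            name: hashlib.sha256((case_dir / name).read_bytes()).hexdigest()
            for name in REQUIRED_FILES
        }
    return manifest


def render_manifest(manifest: dict) -> str:
    # Sorted keys and fixed indentation make repeated runs byte-identical.
    return json.dumps(manifest, sort_keys=True, indent=2) + "\n"
```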
### 2) Public mini dataset (PHP/JS/C#)
Path: `tests/reachability/samples-public/`
Layout:
- `schema/ground-truth.schema.json` — JSON schema for `ground-truth.json` (Reachbench truth schema v1).
- `manifest.json` — deterministic SHA-256 hashes for required files in each sample directory.
- `samples/<lang>/<case-id>/` — per-sample artifacts: `callgraph.static.json`, `ground-truth.json`, `sbom.cdx.json`, `vex.openvex.json`, `repro.sh`.
- `runners/run_all.{sh,ps1}` — deterministic manifest regeneration.
### 3) Reachbench fixture pack (expanded, dual variants)
Path: `tests/reachability/fixtures/reachbench-2025-expanded/`
Each case has two variants (reachable/unreachable) with per-variant `manifest.json` and `reachgraph.truth.json`. Fixture integrity is validated by `tests/reachability/StellaOps.Reachability.FixtureTests`.
## Ground Truth Conventions
- Corpus and public samples use the same truth schema as the reachbench pack (`reachbench.reachgraph.truth/v1`) but name the file `ground-truth.json`, whereas the reachbench pack uses `reachgraph.truth.json`.
- Legacy corpus `expect.yaml` has been retired; prior `state/score` values are preserved under `legacy_expect` in `ground-truth.json`.
- Legacy `conditional` states are represented as `variant=unreachable` plus `legacy_expect.state=conditional` until the truth schema grows a dedicated conditional/contested variant.
## Determinism & Runners
Regenerate all reachability manifests (corpus + public samples + reachbench pack):
- `tests/reachability/runners/run_all.sh`
- `tests/reachability/runners/run_all.ps1`
Individual scripts:
- `python tests/reachability/scripts/update_corpus_manifest.py`
- `python tests/reachability/samples-public/scripts/update_manifest.py`
- `python tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`
## CI Gates
- `tests/reachability/StellaOps.Reachability.FixtureTests`
- validates presence + hashes from manifests for corpus/public samples/reachbench fixtures
- enforces minimum language-bucket coverage (Go/.NET/Python/Rust + PHP/JS/C#)
## MVP Slice (stub cases)
- Go: `go-ssh-CVE-2020-9283-keyexchange`
- .NET: `dotnet-kestrel-CVE-2023-44487-http2-rapid-reset`
- Python: `python-django-CVE-2019-19844-sqli-like`
- Rust: `rust-axum-header-parsing-TBD`
Work plan
1) Define shared manifest schema + hash manifest (NDJSON) under `tests/reachability/corpus/manifest.json`.
2) For each MVP case, add minimal static callgraph + EXPECT.yaml with score/state and evidence links. (DONE: stub versions committed)
3) Extend reachability fixture tests to cover corpus folders (presence, hashes, EXPECT.yaml schema). (DONE)
4) Wire CI job to run the extended tests in `tests/reachability/StellaOps.Reachability.FixtureTests`. (TODO)
5) Replace stubs with real callgraphs/traces and expand corpus after MVP passes CI. (TODO)
## Next Work (post-MVP)
- Wire a CI job to run `tests/reachability/StellaOps.Reachability.FixtureTests`.
- Replace stubs with real callgraphs/traces and expand the corpus once CI is stable.
Determinism rules
- Sort JSON keys; round scores to 2dp; UTC times only if needed.
- Stable ordering of files in manifests; hash with SHA-256.
- No network calls during test or generation.


@@ -0,0 +1,416 @@
# Edge Explainability Schema
_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._
This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.
---
## 1. Overview
Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:
- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
- **Confidence score:** Certainty of the edge's existence
- **Evidence sources:** Detectors and rules that contributed to edge discovery
- **Provenance:** Analyzer version, detection timestamp, and input artifacts
---
## 2. Gap Resolutions
### EG1: Reason Enum Governance
**Standard reason codes:**
| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |
**Governance rules:**
1. New reason codes require RFC + review by Scanner Guild
2. Deprecated codes remain valid for 2 major versions
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
4. Codes are case-insensitive, normalized to lowercase
**Code registry:**
```json
{
"schema": "stellaops.edge.reason.registry@v1",
"version": "2025-12-13",
"reasons": [
{
"code": "bytecode-invoke",
"category": "static",
"description": "Bytecode invocation instruction",
"languages": ["java", "dotnet"],
"confidence_range": [0.9, 1.0],
"deprecated": false
}
]
}
```
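Governance rules 3 and 4 suggest a normalization step like the following sketch. The function name is hypothetical, and rejecting codes that are neither standard nor `custom:`-prefixed is an assumption; the document does not say how unknown codes are handled.

```python
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})
CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and accept it if standard or custom-prefixed."""
    normalized = code.strip().lower()
    if normalized in STANDARD_REASONS:
        return normalized
    if normalized.startswith(CUSTOM_PREFIX) and len(normalized) > len(CUSTOM_PREFIX):
        return normalized
    # ASSUMPTION: unregistered codes are rejected rather than passed through.
    raise ValueError(f"unregistered reason code: {code!r}")
```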
### EG2: Canonical Edge Schema with Hash Rules
**Edge schema:**
```json
{
"edge_id": "edge:sha256:{hex}",
"from": "sym:java:...",
"to": "sym:java:...",
"kind": "call",
"reason": "bytecode-invoke",
"confidence": 0.95,
"evidence": [
{
"source": "detector:java-bytecode-analyzer",
"rule_id": "invoke-virtual",
"rule_version": "1.0.0",
"location": {
"file": "com/example/Foo.class",
"offset": 1234,
"instruction": "invokevirtual #42"
},
"timestamp": "2025-12-13T10:00:00Z"
}
],
"attributes": {
"virtual": true,
"polymorphic_targets": 3
}
}
```
**Hash computation:**
```
edge_id = "edge:" + sha256(
canonical_json({
"from": edge.from,
"to": edge.to,
"kind": edge.kind,
"reason": edge.reason
})
)
```
**Canonicalization:**
1. Use only `from`, `to`, `kind`, `reason` for hash (not confidence or evidence)
2. Sort JSON keys alphabetically
3. No whitespace, UTF-8 encoding
4. Hash is lowercase hex with `sha256:` prefix
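A short Python sketch of the hash rules above, assuming Python's `json.dumps` with sorted keys and compact separators is an acceptable canonical form. The function name `edge_id` is hypothetical.

```python
import hashlib
import json

HASHED_FIELDS = ("from", "kind", "reason", "to")


def edge_id(edge: dict) -> str:
    """Hash only the identity fields; confidence and evidence are ignored."""
    identity = {name: edge[name] for name in HASHED_FIELDS}
    canonical = json.dumps(
        identity, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "edge:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Since confidence and evidence are excluded, re-analysis that only raises an edge's confidence leaves its `edge_id` unchanged.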
### EG3: Evidence Limits/Redaction
**Evidence limits:**
| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |
**Redaction rules:**
| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |
**Truncation behavior:**
```json
{
"evidence_truncated": true,
"evidence_count": 15,
"evidence_shown": 10,
"full_evidence_uri": "cas://edges/evidence/sha256:..."
}
```
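The limits and the truncation record can be combined as in this sketch. Function names are hypothetical, and the caller is assumed to have already stored the full evidence in CAS and to pass its URI in.

```python
from typing import List

DEFAULT_MAX_EVIDENCE = 10          # "Evidence entries per edge"
DEFAULT_INSTRUCTION_PREVIEW = 100  # "Instruction preview length"


def truncate_instruction(text: str, limit: int = DEFAULT_INSTRUCTION_PREVIEW) -> str:
    """Keep only the first `limit` characters of an instruction preview."""
    return text[:limit]


def limit_evidence(evidence: List[dict], full_evidence_uri: str,
                   max_entries: int = DEFAULT_MAX_EVIDENCE) -> dict:
    """Return the evidence to embed plus truncation metadata when the limit is hit."""
    if len(evidence) <= max_entries:
        return {"evidence": list(evidence), "evidence_truncated": False}
    return {
        "evidence": list(evidence[:max_entries]),
        "evidence_truncated": True,
        "evidence_count": len(evidence),
        "evidence_shown": max_entries,
        "full_evidence_uri": full_evidence_uri,
    }
```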
### EG4: Confidence Rubric
**Confidence scale:**
| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |
**Confidence computation:**
```
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
```
**Base confidence by reason:**
| Reason | Base Confidence |
|--------|-----------------|
| `bytecode-invoke` | 0.98 |
| `import-symbol` | 0.95 |
| `plt-stub` | 0.92 |
| `reloc-target` | 0.90 |
| `init-array` | 0.95 |
| `vtable-slot` | 0.75 |
| `indirect-target` | 0.60 |
| `reflection-invoke` | 0.50 |
| `runtime-observed` | 0.99 |
| `user-annotated` | 0.80 |
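The formula and tables above translate into a sketch like the one below. `evidence_boost` and `target_resolution_factor` are not defined in this document, so they are taken as caller-supplied multipliers defaulting to 1.0, and clamping the result to [0, 1] is an assumption. The level thresholds follow the lower bounds of the confidence scale.

```python
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98, "import-symbol": 0.95, "plt-stub": 0.92,
    "reloc-target": 0.90, "init-array": 0.95, "vtable-slot": 0.75,
    "indirect-target": 0.60, "reflection-invoke": 0.50,
    "runtime-observed": 0.99, "user-annotated": 0.80,
}


def edge_confidence(reason: str, evidence_boost: float = 1.0,
                    target_resolution_factor: float = 1.0) -> float:
    """base_confidence(reason) * evidence_boost * target_resolution_factor."""
    raw = BASE_CONFIDENCE[reason] * evidence_boost * target_resolution_factor
    return max(0.0, min(1.0, raw))  # ASSUMPTION: result is clamped to [0, 1]


def confidence_level(score: float) -> str:
    """Map a score onto the named levels of the confidence scale."""
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"
```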
### EG5: Detector/Rule Provenance
**Provenance schema:**
```json
{
"provenance": {
"analyzer": {
"name": "scanner.java",
"version": "1.2.0",
"digest": "sha256:..."
},
"detector": {
"name": "java-bytecode-analyzer",
"version": "2.0.0",
"rule_set": "default"
},
"rule": {
"id": "invoke-virtual",
"version": "1.0.0",
"description": "Detect invokevirtual bytecode instructions"
},
"input_artifacts": [
{"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
],
"detected_at": "2025-12-13T10:00:00Z"
}
}
```
**Provenance requirements:**
1. All edges must include analyzer provenance
2. Detector/rule provenance required for non-runtime edges
3. Input artifact digests enable reproducibility
4. Detection timestamp uses UTC ISO-8601
### EG6: API/CLI Parity
**API endpoints:**
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |
**CLI commands:**
```bash
# List edges for a graph
stella edge list --graph blake3:a1b2c3d4...
# Get edge details
stella edge show --id edge:sha256:...
# Search edges
stella edge search --from "sym:java:..." --reason bytecode-invoke
# Export edges
stella edge export --graph blake3:... --output ./edges.ndjson
```
**Output parity:**
- API and CLI return identical JSON structure
- CLI supports `--json` for machine-readable output
- Both support filtering by reason, confidence, from/to
### EG7: Deterministic Fixtures
**Fixture location:**
```
tests/Edge/
fixtures/
bytecode-invoke.json
plt-stub.json
vtable-dispatch.json
init-array-constructor.json
runtime-observed.json
golden/
bytecode-invoke.golden.json
graph-with-edges.golden.json
datasets/edges/
schema/
edge.schema.json
reason-registry.json
samples/
java-spring-boot/
edges.ndjson
expected-hashes.txt
```
**Fixture requirements:**
1. Each reason code has at least one fixture
2. Fixtures include expected `edge_id` hash
3. Golden outputs frozen after review
4. CI verifies hash stability
### EG8: Propagation into Explanation Graphs/VEX
**Explanation graph inclusion:**
```json
{
"explanation": {
"path": [
{
"node": "sym:java:main...",
"outgoing_edge": {
"edge_id": "edge:sha256:...",
"to": "sym:java:handler...",
"reason": "bytecode-invoke",
"confidence": 0.98
}
},
{
"node": "sym:java:handler...",
"outgoing_edge": {
"edge_id": "edge:sha256:...",
"to": "sym:java:log4j...",
"reason": "bytecode-invoke",
"confidence": 0.95
}
}
],
"aggregate_path_confidence": 0.93
}
}
```
**VEX evidence format:**
```json
{
"stellaops:reachability": {
"path_edges": [
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
],
"weakest_edge": {
"edge_id": "edge:sha256:...",
"reason": "bytecode-invoke",
"confidence": 0.95
},
"aggregate_confidence": 0.93
}
}
```
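The sample values are consistent with the aggregate being the product of the per-edge confidences rounded to two decimals (0.98 × 0.95 = 0.931). Assuming that rule, which this document does not state explicitly, a path summary could be computed as in this sketch; the function name is hypothetical.

```python
from math import prod
from typing import List


def summarize_path(path_edges: List[dict]) -> dict:
    """Build a stellaops:reachability-style summary for a non-empty path."""
    if not path_edges:
        raise ValueError("path must contain at least one edge")
    weakest = min(path_edges, key=lambda e: e["confidence"])
    # ASSUMPTION: aggregate = product of edge confidences, rounded to 2 decimals.
    aggregate = round(prod(e["confidence"] for e in path_edges), 2)
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": aggregate,
    }
```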
### EG9: Localization Guidance
**Localizable elements:**
| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |
**Message catalog structure:**
```json
{
"locale": "en-US",
"messages": {
"edge.reason.bytecode-invoke": "Bytecode method call",
"edge.reason.plt-stub": "PLT/GOT library call",
"edge.confidence.high": "High confidence ({0:P0})",
"edge.evidence.location": "Detected at offset {offset} in {file}"
}
}
```
**Supported locales:**
- `en-US` (default)
- Additional locales via contribution
### EG10: Backfill Plan
**Backfill strategy:**
1. **Phase 1:** Add reason codes to new edges (no backfill needed)
2. **Phase 2:** Run detector upgrade on graphs without reason codes
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata
**Migration script:**
```bash
stella edge backfill --graph blake3:... --dry-run
# Output:
# Graph: blake3:a1b2c3d4...
# Edges without reason: 1234
# Edges to update: 1234
# Dry run - no changes made.
# Execute:
stella edge backfill --graph blake3:... --execute
```
**Backfill metadata:**
```json
{
"backfill": {
"status": "complete",
"original_analyzer_version": "1.0.0",
"backfill_analyzer_version": "1.2.0",
"backfilled_at": "2025-12-13T10:00:00Z",
"edges_updated": 1234
}
}
```
---
## 3. Related Documentation
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Explainability Schema](./explainability-schema.md) - Explanation format
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE
---
_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._


@@ -0,0 +1,454 @@
# Explainability Schema
_Last updated: 2025-12-13. Owner: Policy Guild + Docs Guild._
This document defines the explainability schema addressing gaps EX1-EX10 from the November 2025 product findings. It specifies the canonical format for vulnerability verdict explanations, DSSE signing policy, CAS storage rules, and export/replay formats.
---
## 1. Overview
Explainability provides auditable, machine-readable rationale for every vulnerability verdict. Each explanation includes:
- **Decision chain:** Ordered list of rules/policies that contributed to the verdict
- **Evidence links:** References to graphs, runtime facts, VEX statements, and SBOM components
- **Confidence scores:** Per-rule and aggregate confidence values
- **Redaction metadata:** PII handling and data classification
---
## 2. Gap Resolutions
### EX1: Schema/Canonicalization + Hashes
**Explanation schema:**
```json
{
"schema": "stellaops.explanation@v1",
"explanation_id": "explain:sha256:{hex}",
"finding_id": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
"verdict": {
"status": "affected",
"severity": {"normalized": "Critical", "score": 10.0},
"confidence": 0.92
},
"decision_chain": [
{
"rule_id": "rule:reachability_gate",
"rule_version": "1.0.0",
"inputs": {
"reachability.state": "CR",
"reachability.confidence": 0.92
},
"output": {"allowed": true, "contribution": 0.4},
"evidence_refs": ["cas://reachability/graphs/blake3:..."]
},
{
"rule_id": "rule:severity_baseline",
"rule_version": "1.0.0",
"inputs": {
"cvss_base": 10.0,
"epss_percentile": 0.95
},
"output": {"severity": "Critical", "contribution": 0.6},
"evidence_refs": ["cas://advisories/CVE-2021-44228.json"]
}
],
"aggregate_confidence": 0.88,
"created_at": "2025-12-13T10:00:00Z",
"policy_version": "sha256:...",
"graph_revision_id": "rev:blake3:..."
}
```
**Canonicalization rules:**
1. JSON keys sorted alphabetically at all levels
2. Arrays in `decision_chain` ordered by rule execution sequence
3. `evidence_refs` arrays sorted alphabetically
4. No whitespace, UTF-8 encoding
5. Hash computed over canonical JSON: `sha256(canonical_json)`
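The canonicalization rules can be sketched as follows. Function names are hypothetical. One point is an assumption rather than something the rules state: `explanation_id` is dropped before hashing, because the ID contains the hash and cannot be part of its own input.

```python
import copy
import hashlib
import json


def canonical_explanation_bytes(explanation: dict) -> bytes:
    """Serialize an explanation with sorted keys, sorted evidence_refs, no whitespace."""
    doc = copy.deepcopy(explanation)  # leave the caller's object untouched
    doc.pop("explanation_id", None)   # ASSUMPTION: the ID is excluded from its own hash
    for step in doc.get("decision_chain", []):
        if "evidence_refs" in step:
            step["evidence_refs"] = sorted(step["evidence_refs"])
    # decision_chain keeps its execution order; only object keys are sorted.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def explanation_id(explanation: dict) -> str:
    digest = hashlib.sha256(canonical_explanation_bytes(explanation)).hexdigest()
    return "explain:sha256:" + digest
```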
### EX2: DSSE Predicate/Signing Policy
**DSSE predicate type:**
```
stella.ops/explanation@v1
```
**Signing policy:**
| Element | Required | Signer |
|---------|----------|--------|
| Explanation body | Yes | Policy Engine key |
| Graph DSSE reference | Yes (if reachability cited) | Scanner key |
| VEX DSSE reference | Yes (if VEX cited) | Policy Engine key |
**DSSE envelope structure:**
```json
{
"payloadType": "application/vnd.stellaops.explanation+json",
"payload": "<base64(canonical_explanation_json)>",
"signatures": [
{
"keyid": "policy-engine-signing-2025",
"sig": "base64:..."
}
]
}
```
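For orientation, the sketch below builds the unsigned envelope and the byte string that a DSSE signer signs. The pre-authentication encoding (PAE) follows the DSSE specification; the function names are hypothetical, and the actual signing call is omitted because it depends on the Policy Engine's key infrastructure.

```python
import base64

PAYLOAD_TYPE = "application/vnd.stellaops.explanation+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE PAE: 'DSSEv1' SP LEN(type) SP type SP LEN(body) SP body."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(canonical_json: bytes) -> dict:
    """Envelope skeleton; a signer appends {'keyid', 'sig'} entries to 'signatures'."""
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(canonical_json).decode("ascii"),
        "signatures": [],
    }
```

The signature covers `pae(PAYLOAD_TYPE, canonical_json)`, not the base64 text, so verification does not depend on how the payload is encoded in transit.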
**Signing requirements:**
- All explanations must be signed before CAS storage
- Signing key must be registered in Authority key store
- Key rotation triggers re-signing of active explanations (configurable)
### EX3: CAS Storage Rules for Evidence
**Storage layout:**
```
cas://explanations/
{sha256}/ # Explanation body
{sha256}.dsse # DSSE envelope
by-finding/{finding_id}/ # Index by finding
by-policy/{policy_digest}/ # Index by policy version
by-graph/{graph_revision_id}/ # Index by graph revision
```
**Storage rules:**
1. Explanations are immutable after signing
2. New verdicts create new explanation documents (no updates)
3. Previous explanations are retained per retention policy
4. Cross-references validated at write time (graphs, VEX must exist)
**Deduplication:**
- Identical canonical JSON produces identical hash
- CAS returns existing reference if content matches
### EX4: Link to Decision/Policy and graph_revision_id
**Required links:**
```json
{
"links": {
"policy_version": "sha256:7e1d...",
"policy_uri": "cas://policy/versions/sha256:7e1d...",
"graph_revision_id": "rev:blake3:a1b2...",
"graph_uri": "cas://reachability/revisions/blake3:a1b2...",
"sbom_digest": "sha256:def4...",
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
"vex_digest": "sha256:e5f6...",
"vex_uri": "cas://excititor/vex/openvex.json"
}
}
```
**Validation:**
- All linked artifacts must exist at explanation creation time
- Links are verified during replay/audit
- Broken links cause replay verification failure
### EX5: Export/Replay Bundle Format
**Export bundle manifest:**
```json
{
"schema": "stellaops.explanation.bundle@v1",
"bundle_id": "bundle:explain:2025-12-13",
"created_at": "2025-12-13T10:00:00Z",
"explanations": [
{
"explanation_id": "explain:sha256:...",
"finding_id": "...",
"explanation_uri": "explanations/sha256:....json",
"dsse_uri": "explanations/sha256:....dsse"
}
],
"dependencies": {
"graphs": [
{"revision_id": "rev:blake3:...", "uri": "graphs/blake3:....json"}
],
"policies": [
{"digest": "sha256:...", "uri": "policies/sha256:....json"}
],
"vex_statements": [
{"digest": "sha256:...", "uri": "vex/sha256:....json"}
]
},
"verification": {
"bundle_hash": "sha256:...",
"signature": "base64:...",
"signed_by": "policy-engine-signing-2025"
}
}
```
**Replay verification:**
```bash
stella explain verify --bundle ./explanation-bundle.tgz
# Output:
Bundle: bundle:explain:2025-12-13
Explanations: 42
Dependencies: 5 graphs, 2 policies, 12 VEX
Verifying explanations...
Canonical hashes: 42/42 MATCH
DSSE signatures: 42/42 VALID
Dependency links: 42/42 RESOLVED
Replay verification PASSED.
```
### EX6: PII/Redaction Rules
**Redaction categories:**
| Category | Redaction | Example |
|----------|-----------|---------|
| User identifiers | Hash | `user:alice` -> `user:sha256:a1b2...` |
| IP addresses | Mask | `192.168.1.100` -> `192.168.x.x` |
| File paths | Normalize | `/home/alice/code/...` -> `{HOME}/code/...` |
| Email addresses | Hash | `alice@example.com` -> `email:sha256:...` |
| API keys/tokens | Omit | `Authorization: Bearer xxx` -> `[REDACTED]` |
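
The table can be read as one small transformation per category. The sketch below is illustrative only: the function names are hypothetical, the 12-character hash truncation is an assumption, and a production redactor would likely use a keyed hash so that low-entropy identifiers cannot be recovered by dictionary lookup.

```python
import hashlib
import re


def _short_hash(value: str) -> str:
    """Truncated SHA-256 so the same input always maps to the same token."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def redact_user(identifier: str) -> str:
    # `user:alice` -> `user:sha256:...`; only the part after the prefix is hashed.
    name = identifier.split(":", 1)[1]
    return "user:" + _short_hash(name)


def redact_email(address: str) -> str:
    # Lower-casing first is an assumption, so that case variants collapse to one token.
    return "email:" + _short_hash(address.lower())


def mask_ipv4(address: str) -> str:
    # Keep the first two octets and mask the rest.
    octets = address.split(".")
    return ".".join(octets[:2] + ["x", "x"])


def normalize_path(path: str) -> str:
    # Replace a leading /home/<user> with the {HOME} placeholder.
    return re.sub(r"^/home/[^/]+", "{HOME}", path)


def redact_header(line: str) -> str:
    # Credentials are dropped entirely rather than hashed.
    if line.lower().startswith("authorization:"):
        return "[REDACTED]"
    return line
```
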
**Redaction metadata:**
```json
{
"redaction": {
"applied": true,
"level": "standard",
"fields_redacted": ["actor.email", "evidence.file_path"],
"redaction_policy": "stellaops.redaction.standard@v1"
}
}
```
**Export modes:**
- `--redacted` (default): Apply standard redaction
- `--full`: Include all data (requires `explain:export:full` scope)
- `--audit`: Include redaction audit trail
### EX7: Size Budgets
**Limits:**
| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Explanation body | 256 KB | Yes |
| Decision chain entries | 100 | Yes |
| Evidence refs per rule | 20 | Yes |
| Total evidence refs | 200 | Yes |
| Path entries | 50 | No |
**Truncation behavior:**
When limits are exceeded:
1. Log warning with truncation details
2. Add `truncation` metadata to explanation
3. Store full evidence in separate CAS object
4. Include `full_evidence_uri` reference
```json
{
"truncation": {
"applied": true,
"elements_truncated": ["decision_chain", "evidence_refs"],
"full_evidence_uri": "cas://explanations/full/sha256:..."
}
}
```
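
A rough sketch of the truncation behaviour, assuming the explanation keeps `decision_chain` and `evidence_refs` as top-level arrays (the field names are taken from `elements_truncated` above; their position in the document is an assumption). `apply_size_budget` is a hypothetical helper, and the caller is assumed to have already stored the untrimmed document at `full_evidence_uri`.

```python
import copy

# Default limits from the table above; treated as configurable here.
LIMITS = {"decision_chain": 100, "evidence_refs": 200}


def apply_size_budget(explanation: dict, full_evidence_uri: str, limits: dict = LIMITS) -> dict:
    """Return a trimmed copy, adding `truncation` metadata only when something was cut."""
    trimmed = copy.deepcopy(explanation)
    truncated = []
    for field, limit in limits.items():
        items = trimmed.get(field, [])
        if len(items) > limit:
            trimmed[field] = items[:limit]
            truncated.append(field)
    if truncated:
        trimmed["truncation"] = {
            "applied": True,
            "elements_truncated": truncated,
            "full_evidence_uri": full_evidence_uri,
        }
    return trimmed
```
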
### EX8: Versioning
**Schema versioning:**
- Schema version in `schema` field: `stellaops.explanation@v1`
- Breaking changes increment major version
- Minor changes (additive fields) use v1.x
- Backward compatibility maintained for 2 major versions
**Migration support:**
```bash
stella explain migrate --from v1 --to v2 --input ./explanations/
# Output:
Migrating 1000 explanations from v1 to v2...
Migrated: 998
Skipped (already v2): 2
Migration complete.
```
**Version compatibility matrix:**
| API Version | Schema v1 | Schema v2 |
|-------------|-----------|-----------|
| 1.0.x | Full | N/A |
| 1.1.x | Full | Full |
| 2.0.x | Read-only | Full |
### EX9: Golden Fixtures/Tests
**Test fixture location:**
```
tests/Explanation/
  fixtures/
    simple-affected.json
    simple-not-affected.json
    with-reachability-evidence.json
    multi-rule-chain.json
    truncated-evidence.json
    redacted-pii.json
  golden/
    simple-affected.golden.json
    simple-affected.golden.dsse
datasets/explanations/
  schema/
    explanation.schema.json
  samples/
    log4j-affected/
      explanation.json
      expected-hash.txt
```
**Test categories:**
1. **Canonicalization tests:** Verify hash stability across JSON reordering
2. **DSSE signing tests:** Verify signature creation and verification
3. **Redaction tests:** Verify PII handling
4. **Truncation tests:** Verify size budget enforcement
5. **Replay tests:** Verify bundle export/import cycle
6. **Migration tests:** Verify version upgrade paths
**CI integration:**
```yaml
# .gitea/workflows/explanation-tests.yml
explanation-tests:
  runs-on: ubuntu-latest
  steps:
    - name: Run explanation tests
      run: dotnet test src/Policy/__Tests/StellaOps.Policy.Explanation.Tests
    - name: Verify golden fixtures
      run: scripts/verify-golden-fixtures.sh tests/Explanation/golden/
```
### EX10: Determinism Guarantees
**Determinism requirements:**
1. Same inputs produce identical `explanation_id` hash
2. Decision chain ordering is stable (execution order)
3. Evidence refs sorted alphabetically
4. Timestamps use UTC ISO-8601 with millisecond precision
5. Floating-point values rounded to 6 decimal places
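
The five requirements can be sketched as a normalization pass followed by a hash. This is only an illustration: the canonical serialization (sorted keys, compact separators) and the assumption that `explanation_id` is computed over a body that does not contain the id itself are not stated in this section, and the helper names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize(value):
    """Recursively round floats to 6 decimal places; leave other values unchanged."""
    if isinstance(value, float):
        return round(value, 6)
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


def utc_millis(ts: datetime) -> str:
    """Format a timezone-aware timestamp as UTC ISO-8601 with millisecond precision."""
    text = ts.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


def explanation_id(doc: dict) -> str:
    body = normalize(doc)
    # Evidence refs are sorted; the decision chain keeps its execution order.
    if "evidence_refs" in body:
        body["evidence_refs"] = sorted(body["evidence_refs"])
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "explain:sha256:" + hashlib.sha256(canonical).hexdigest()
```
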
**Verification:**
```bash
# Run twice with same inputs, verify identical hashes
stella explain generate --finding "..." --output a.json
stella explain generate --finding "..." --output b.json
diff a.json b.json # Should be empty
# Or use built-in verify
stella explain verify-determinism --finding "..." --iterations 3
```
---
## 3. API Reference
### 3.1 Generate Explanation
```http
POST /api/policy/findings/{findingId}/explain
Authorization: Bearer <token>
Content-Type: application/json
{
"mode": "full",
"include_evidence": true,
"redaction_level": "standard"
}
```
### 3.2 Get Explanation
```http
GET /api/explanations/{explanationId}
Authorization: Bearer <token>
Accept: application/json
```
### 3.3 Export Explanation Bundle
```http
POST /api/explanations/export
Authorization: Bearer <token>
Content-Type: application/json
{
"finding_ids": ["...", "..."],
"include_dependencies": true,
"redaction_level": "standard"
}
```
### 3.4 Verify Explanation
```http
POST /api/explanations/{explanationId}/verify
Authorization: Bearer <token>
```
---
## 4. CLI Reference
```bash
# Generate explanation for a finding
stella explain generate --finding "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228"
# Export explanation bundle
stella explain export --findings ./finding-ids.txt --output ./bundle.tgz
# Verify explanation
stella explain verify --explanation ./explanation.json --dsse ./explanation.dsse
# Verify bundle
stella explain verify --bundle ./bundle.tgz
# Check determinism
stella explain verify-determinism --finding "..." --iterations 5
```
---
## 5. Related Documentation
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Graph Revision Schema](./graph-revision-schema.md) - Graph versioning
- [Policy API](../api/policy.md) - Policy Engine REST API
- [DSSE Predicates](../modules/attestor/architecture.md) - Signing specifications
---
_Last updated: 2025-12-13. See Sprint 0401 EXPLAIN-GAPS-401-064 for change history._

@@ -1,175 +1,535 @@
# Function-Level Evidence Readiness (Nov 2025 Advisory)
# Function-Level Evidence Guide
_Last updated: 2025-11-12. Owner: Business Analysis Guild._
_Last updated: 2025-12-13. Owner: Docs Guild._
This memo captures the outstanding work required to make StellaOps scanners emit stable, function-level evidence that matches the November 2025 advisory. It does **not** implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.
This guide documents the cross-module function-level evidence chain that enables provable reachability claims. It covers the schema, identifiers, API usage, CLI commands, and integration patterns for Scanner, Signals, Policy, and Replay.
---
## 1. Goal & Scope
## 1. Overview
**Goal.** Anchor every vulnerability finding to an immutable `{artifact_digest, code_id}` tuple plus optional symbol hints so replayers can prove reachability against stripped binaries.
StellaOps implements a **function-level evidence chain** that anchors every vulnerability finding to immutable identifiers (`code_id`, `symbol_id`, `graph_hash`) enabling:
**Scope.** Scanner analyzers, runtime ingestion, Signals scoring, Replay manifests, Policy/VEX emission, CLI/UI explainers, and documentation/runbooks needed to operationalise the advisory.
- **Provable reachability:** Deterministic call-path evidence from entry points to vulnerable functions.
- **Stripped binary support:** `code_id` + `code_block_hash` provides identity when symbols are absent.
- **Evidence replay:** Sealed artifacts with DSSE attestation allow offline verification.
- **Cross-module linking:** Scanner -> Signals -> Policy -> VEX -> UI/CLI evidence chain.
Out of scope: implementing disassemblers or symbol servers; those will be handled inside the module-specific backlog tasks referenced below.
### 1.1 Core Identifiers
| Identifier | Format | Purpose | Example |
|------------|--------|---------|---------|
| `symbol_id` | `sym:{lang}:{base64url}` | Canonical function identity | `sym:java:R3JlZXRpbmc...` |
| `code_id` | `code:{lang}:{base64url}` | Identity for name-less code blocks | `code:binary:YWJjZGVm...` |
| `graph_hash` | `blake3:{hex}` | Content-addressable graph identity | `blake3:a1b2c3d4e5f6...` |
| `symbol_digest` | `sha256:{hex}` | Hash of symbol_id for edge linking | `sha256:e5f6a7b8c9d0...` |
| `build_id` | `gnu-build-id:{hex}` | ELF/PE debug identifier | `gnu-build-id:5f0c7c3c...` |
### 1.2 Evidence Chain Flow
```
Scanner -> richgraph-v1 -> Signals -> Scoring -> Policy -> VEX -> UI/CLI
   |            |             |          |         |        |       |
   |            |             |          |         |        |       +-- stella graph explain
   |            |             |          |         |        +-- OpenVEX with call-path proofs
   |            |             |          |         +-- Policy gates + reachability.state
   |            |             |          +-- Lattice state + confidence + riskScore
   |            |             +-- Runtime facts + static paths
   |            +-- BLAKE3 graph_hash + DSSE attestation
   +-- code_id, symbol_id, build_id per node
```
---
## 2. Advisory Requirements vs. System Gaps
## 2. Schema Reference
| Requirement | Current gap | Task references | Notes |
|-------------|-------------|-----------------|-------|
| Immutable code identity (`code_id` = `{format, build_id, start, length}` + optional `code_block_hash`) | Callgraph nodes are opaque strings with no address metadata. | Sprint 401 `GRAPH-CAS-401-001`, `GAP-SCAN-001`, `GAP-SYM-007` | `code_id` should live alongside existing `SymbolID` helpers so analyzers can emit it without duplicating logic. |
| Symbol hints (demangled name, source, confidence) | No schema fields for symbol metadata; demangling is ad-hoc per analyzer. | `GAP-SYM-007` | Require deterministic casing + `symbol.source ∈ {DWARF,PDB,SYM,none}`. |
| Runtime facts mapped to code anchors | `/signals/runtime-facts` now accepts JSON and NDJSON (gzip) streams, stores symbol/code/process/container metadata. | Sprint 400 `ZASTAVA-REACH-201-001`, Sprint 401 `SIGNALS-RUNTIME-401-002`, `GAP-ZAS-002`, `GAP-SIG-003` | Provenance enrichment (process/socket/container) persisted; next step is exposing CAS URIs + context facts and emitting events for Policy/Replay. |
| Replay/DSSE coverage | Replay manifests don't enforce hash/CAS registration for graphs/traces. | Sprint 400 `REPLAY-REACH-201-005`, Sprint 401 `REPLAY-401-004`, `GAP-REP-004` | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
| Policy/VEX/UI explainability | Policy uses coarse `reachability:*` tags; UI/CLI cannot show call paths or evidence hashes. | Sprint 401 `POLICY-VEX-401-006`, `UI-CLI-401-007`, `GAP-POL-005`, `GAP-VEX-006`, `EXPERIENCE-GAP-401-012` | Evidence blocks must cite `code_id`, graph hash, runtime CAS URI, analyzer version. |
| Operator documentation & samples | No guide shows how to replay `{build_id,start,len}` across CLI/API. | Sprint 401 `QA-DOCS-401-008`, `GAP-DOC-008` | Produce samples under `samples/reachability/**` plus CLI walkthroughs. |
| Build-id propagation | Build-id not consistently captured or threaded into `SymbolID`/`code_id`; SBOM/runtime joins are brittle. | Sprint 401 `SCANNER-BUILDID-401-035` | Capture `.note.gnu.build-id`, include in code identity, expose in SBOM exports and runtime events. |
| Load-time constructors as roots | Graph roots omit `.preinit_array`/`.init_array`/`_init`, missing load-time edges. | Sprint 401 `SCANNER-INITROOT-401-036` | Add synthetic roots with `phase=load`; include `DT_NEEDED` deps' constructors. |
| PURL-resolved edges | Call edges do not carry `purl` or `symbol_digest`, slowing SBOM joins. | Sprint 401 `GRAPH-PURL-401-034` | Annotate edges per `docs/reachability/purl-resolved-edges.md`; keep deterministic graph hash. |
| Unknowns handling | Unresolved symbols/edges disappear silently. | Sprint 0400 `SIGNALS-UNKNOWN-201-008` | Emit Unknowns records (see `docs/signals/unknowns-registry.md`) and feed `unknowns_pressure` into scoring. |
| Patch-oracle QA | No guard-rail tests proving binary analyzers see real patch deltas. | Sprint 401 `QA-PORACLE-401-037` | Add paired vuln/fixed fixtures and expectations; wire to CI using `docs/reachability/patch-oracles.md`. |
### 2.1 SymbolID Construction
---
Per-language canonical tuple format (NUL-separated, then SHA-256 -> base64url):
## 3. Workstreams & Expectations
| Language | Tuple Components | Example |
|----------|------------------|---------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` | `com.example\0Foo\0bar\0(Ljava/lang/String;)V` |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` | `MyApp\0Controllers\0UserController\0GetById(int)` |
| Go | `{module}\0{package}\0{receiver}\0{func}` | `github.com/user/repo\0handler\0*Server\0Handle` |
| Node | `{pkg_or_path}\0{export_path}\0{kind}` | `lodash\0get\0function` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` | `sha256:abc...\0.text\00x401000\0ssl3_read\0global\0` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` | `requests\0api\0get` |
| Ruby | `{gem_or_path}\0{module}\0{method}` | `rails\0ActionController::Base\0render` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` | `symfony/http-kernel\0Kernel\0handle` |
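
The construction in the table (NUL-separated tuple, SHA-256, base64url) can be sketched as below. Stripping the `=` padding is an assumption, since the table does not say whether padding is kept, and `symbol_digest` follows the Core Identifiers description of "hash of symbol_id". Both function names are hypothetical.

```python
import base64
import hashlib


def symbol_id(lang: str, *components: str) -> str:
    """Join the tuple with NUL bytes, hash with SHA-256, and base64url-encode the digest."""
    tuple_bytes = "\0".join(components).encode("utf-8")
    digest = hashlib.sha256(tuple_bytes).digest()
    # Padding removal is an assumption; the document does not specify it.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"sym:{lang}:{encoded}"


def symbol_digest(sym_id: str) -> str:
    """SHA-256 over the symbol_id string, used for edge linking."""
    return "sha256:" + hashlib.sha256(sym_id.encode("utf-8")).hexdigest()


java_id = symbol_id("java", "com.example", "Foo", "bar", "(Ljava/lang/String;)V")
```

Because the components are NUL-separated, `("ab", "c")` and `("a", "bc")` hash to different identifiers, which plain concatenation would not guarantee.
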
### 3.1 Scanner Symbolization (GAP-SCAN-001 / GAP-SYM-007)
### 2.2 CodeID Construction
* Define `SymbolID` helpers that glue together `{artifact_digest, file}` with optional `section`, `addr`, `length`, and `code_block_hash`.
* Update analyzer contracts so every analyzer returns both `symbol_id` and `code_id`, with demangled names stored under the new `symbol` block.
* Persist the data into `richgraph-v1` payloads and attach CAS URIs via `StellaOps.Scanner.Reachability`.
* Deliver fixtures in `tests/reachability/StellaOps.ScannerSignals.IntegrationTests` that prove determinism (same hash when analyzer flags reorder).
* **Helper status (2025-12-02):** `SymbolId.ForBinaryAddressed` + `CodeId.ForBinarySegment` now encode `{file_hash, section, addr, name, linkage, length, code_block_hash}` with normalized hex addresses. Analyzers should start emitting these tuples instead of ad-hoc hashes.
* **Binary lifter (2025-12-03):** `BinaryReachabilityLifter` emits richgraph nodes for ELF/PE/Mach-O using file SHA-256 + section/address tuples, attaches `code_id` anchors, and turns imports/load commands into `import` edges.
* **Schema wiring (2025-12-12):** `reachability-union` + `richgraph-v1` serializers now emit `symbol {mangled,demangled,source,confidence}` and optional `code_block_hash` for stripped blocks; confidence is clamped to `[0,1]` and `source` normalized to uppercase (`DWARF|PDB|SYM|NONE`).
For stripped binaries or name-less code blocks:
### 3.2 Runtime + Signals (GAP-ZAS-002 / GAP-SIG-003)
```
code:{lang}:{base64url_sha256(format + file_hash + addr + length + section + code_block_hash)}
```
* Extend Zastava Observer NDJSON schema to emit: `symbol_id`, `code_id`, `hit_count`, `observed_at`, `loader_base`, `process.buildId`.
* Implement `/signals/runtime-facts` ingestion (gzip + NDJSON) with CAS-backed storage under `cas://reachability/runtime/{sha256}`.
* Update `ReachabilityScoringService` to emit lattice states and include runtime evidence references plus CAS URIs in `ReachabilityFactDocument.Metadata`.
Example for stripped ELF:
```
code:binary:YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo
```
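
A sketch of the `code_id` formula, under two assumptions that the formula itself leaves open: the fields are joined with NUL bytes (the `+` does not name a separator), and the address is rendered as lowercase hex. `code_id` here is a hypothetical helper, not the `CodeId.ForBinarySegment` implementation.

```python
import base64
import hashlib


def code_id(lang: str, fmt: str, file_hash: str, addr: int, length: int,
            section: str, code_block_hash: str = "") -> str:
    """Hash the fields in the order given by the formula and base64url-encode the digest."""
    fields = [fmt, file_hash, f"0x{addr:x}", str(length), section, code_block_hash]
    # NUL-joining is an assumption made for this sketch.
    digest = hashlib.sha256("\0".join(fields).encode("utf-8")).digest()
    return f"code:{lang}:" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


stripped = code_id("binary", "elf", "sha256:abc", 0x401000, 312, ".text", "sha256:deadbeef")
```
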
### 3.3 Replay & Evidence (GAP-REP-004)
### 2.3 Graph Node Schema
* Enforce CAS registration + BLAKE3 hashing before manifest writes (graphs and traces).
* Teach `ReachabilityReplayWriter` to require analyzer name/version, graph kind, `code_id` coverage summary.
* Update `docs/replay/DETERMINISTIC_REPLAY.md` once schema v2 is finalized.
### 3.4 Policy, VEX, CLI/UI (GAP-POL-005 / GAP-VEX-006)
* Policy Engine: ingest new reachability facts, expose `reachability.state`, `max_path_conf`, and `evidence.graph_hash` via SPL + API.
* CLI/UI: add `stella graph explain` and explain drawer showing call path (`SymbolID` list), code anchors, runtime hits, DSSE references.
* Notify templates: include short evidence summary (first hop + truncated `code_id`).
### 3.5 Documentation & Samples (GAP-DOC-008)
* Publish schema diffs in `docs/data/evidence-schema.md` (new file) covering SBOM evidence nodes, runtime NDJSON, and API responses.
* Write CLI/API walkthroughs in `docs/09_API_CLI_REFERENCE.md` and `docs/api/policy.md` showing how to request reachability evidence and verify DSSE chains.
* Produce OpenVEX + replay samples under `samples/reachability/` showing `facts.type = "stella.reachability"` with `graph_hash` and `code_id` arrays.
### 3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)
* Stand up `Scanner.Symbols.Native` + `Scanner.CallGraph.Native` libraries that:
* parse ELF (DWARF + `.symtab`/`.dynsym`), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
* emit deterministic `FuncNode` + `CallEdge` records with demangled names, language hints, and `{confidence,evidence}` arrays; and
* attach analyzer + toolchain identifiers consumed by `richgraph-v1`.
* Introduce `Reachability.Store` collections in Mongo:
* `func_nodes` keyed by `func:<format>:<sha256>:<va>` with `{binDigest,name,addr,size,lang,confidence,sym}`.
* `call_edges` `{from,to,kind,confidence,evidence[]}` linking internal/external nodes.
* `cve_func_hits` `{cve,purl,func_id,match_kind,confidence,source}` for advisory alignment.
* Build indexes (`binDigest+name`, `from→to`, `cve+func_id`) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.
---
## 4. Schema & API Touchpoints
Authoritative field list lives in `docs/reachability/evidence-schema.md`; use it for DTOs and CAS writers.
The next implementation pass must cover the following documents/files (create them if missing):
1. `docs/data/evidence-schema.md`: authoritative schema for `{code_id, symbol, tool}` blocks.
2. `docs/runbooks/reachability-runtime.md`: operator steps for staging runtime ingestion bundles, retention, and troubleshooting.
3. `docs/runbooks/replay_ops.md`: add a section detailing replay verification using the new graph/runtime CAS entries.
API contracts to amend:
- `POST /signals/callgraphs` response includes `graphHash` (sha256) for the normalized callgraph; richgraph-v1 uses BLAKE3 for graph CAS hashes.
- `POST /signals/runtime-facts` request body schema (NDJSON) with `symbol_id`, `code_id`, `hit_count`, `loader_base`.
- `GET /policy/findings` payload must surface `reachability.evidence[]` objects.
### 4.1 Signals runtime ingestion snapshot (Nov 2025)
- `/signals/runtime-facts` (JSON) and `/signals/runtime-facts/ndjson` (streaming, optional gzip) accept the following event fields:
- `symbolId` (required), `codeId`, `loaderBase`, `hitCount`, `processId`, `processName`, `socketAddress`, `containerId`, `evidenceUri`, `metadata`.
- Subject context (`scanId` / `imageDigest` / `component` / `version`) plus `callgraphId` is supplied either in the JSON body or as query params for the NDJSON endpoint.
- Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
- Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.
### 4.2 Reachability store layout (SIG-STORE-401-016)
All producers **must** persist native function evidence using the shared collections below (names are advisory; exact names live in Mongo options):
Each node in a richgraph-v1 document includes:
```json
// func_nodes
{
"_id": "func:ELF:sha256:4012a0",
"binDigest": "sha256:deadbeef...",
"name": "ssl3_read_bytes",
"addr": "0x4012a0",
"size": 312,
"lang": "c",
"confidence": 0.92,
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
"sym": "present"
}
// call_edges
{
"from": "func:ELF:sha256:4012a0",
"to": "func:ELF:sha256:40f0ff",
"kind": "static",
"confidence": 0.88,
"evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
}
// cve_func_hits
{
"cve": "CVE-2023-XXXX",
"purl": "pkg:generic/openssl@1.1.1u",
"func_id": "func:ELF:sha256:4012a0",
"match": "name+version",
"confidence": 0.77,
"source": "concelier:openssl-advisory"
"id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
"symbol_id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
"code_id": "code:java:...",
"lang": "java",
"kind": "method",
"display": "com.example.GreetingService.greet(String)",
"purl": "pkg:maven/com.example/greeting-service@1.0.0",
"build_id": "gnu-build-id:5f0c7c3c...",
"symbol_digest": "sha256:e5f6a7b8...",
"code_block_hash": "sha256:deadbeef...",
"symbol": {
"mangled": null,
"demangled": "com.example.GreetingService.greet(String)",
"source": "DWARF",
"confidence": 0.98
},
"evidence": ["import", "bytecode"],
"attributes": {}
}
```
Writers **must**:
### 2.4 Graph Edge Schema
1. Upsert `func_nodes` before emitting edges/hits to ensure `_id` lookups remain stable.
2. Serialize evidence arrays in deterministic order (`reloc`, `bb-target`, `import`, …) and normalise hex casing.
3. Attach analyzer fingerprints (`scanner.native@sha256:...`) so Replay/Policy can enforce provenance.
Edges carry callee `purl` and `symbol_digest` for SBOM correlation:
```json
{
"from": "sym:java:caller...",
"to": "sym:java:callee...",
"kind": "call",
"purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
"symbol_digest": "sha256:f1e2d3c4...",
"confidence": 0.92,
"evidence": ["bytecode", "import"],
"candidates": []
}
```
### 2.5 Evidence Block Schema
Evidence blocks in Policy/VEX responses cite all relevant identifiers:
```json
{
"evidence": {
"graph_hash": "blake3:a1b2c3d4e5f6...",
"graph_cas_uri": "cas://reachability/graphs/a1b2c3d4e5f6...",
"dsse_uri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
"path": [
{"symbol_id": "sym:java:...", "display": "main()"},
{"symbol_id": "sym:java:...", "display": "processRequest()"},
{"symbol_id": "sym:java:...", "display": "log4j.error()"}
],
"path_length": 3,
"confidence": 0.85,
"runtime_hits": ["probe:jfr:1234"],
"analyzer": {
"name": "scanner.java",
"version": "1.2.0",
"toolchain_digest": "sha256:..."
}
}
}
```
---
## 5. Test & Fixture Expectations
## 3. API Usage
- **Reachbench fixtures**: update golden cases with `code_id` + `symbol` metadata. Ensure both reachable/unreachable variants still pass once graphs contain the richer IDs.
- **Signals unit tests**: add deterministic tests for lattice scoring + runtime evidence linking (`tests/reachability/StellaOps.Signals.Reachability.Tests`).
- **Replay tests**: extend `tests/reachability/StellaOps.Replay.Core.Tests` to assert manifest v2 serialization and hash enforcement.
### 3.1 Signals Callgraph Ingestion
All fixtures must remain deterministic: sort nodes/edges, normalise casing, and freeze timestamps in test data.
Submit a callgraph and receive a deterministic `graph_hash`:
```http
POST /signals/callgraphs
Authorization: Bearer <token>
Content-Type: application/json
{
"schema": "richgraph-v1",
"analyzer": {"name": "scanner.java", "version": "1.2.0"},
"nodes": [...],
"edges": [...],
"roots": [...]
}
```
**Response:**
```json
{
"graphHash": "blake3:a1b2c3d4e5f6...",
"casUri": "cas://reachability/graphs/a1b2c3d4e5f6...",
"dsseUri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
"nodeCount": 1247,
"edgeCount": 3891
}
```
### 3.2 Signals Runtime Facts
Submit runtime observations with `code_id` anchors:
```http
POST /signals/runtime-facts/ndjson?scanId=scan-123&imageDigest=sha256:abc123
Authorization: Bearer <token>
Content-Type: application/x-ndjson
Content-Encoding: gzip
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":47,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:00Z"}
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":12,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:01Z"}
```
**Response:**
```json
{
"accepted": 128,
"duplicates": 2,
"evidenceUri": "cas://reachability/runtime/sha256:xyz789..."
}
```
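
On the client side, the request body for the NDJSON endpoint could be prepared roughly like this. The sketch only builds the gzip payload and headers; it does not send anything. The symbol IDs are placeholders and `encode_runtime_facts` is a hypothetical helper.

```python
import gzip
import json


def encode_runtime_facts(events: list) -> bytes:
    """Serialize events as one JSON object per line, then gzip the result."""
    lines = [json.dumps(event, separators=(",", ":")) for event in events]
    ndjson = ("\n".join(lines) + "\n").encode("utf-8")
    # mtime=0 keeps the gzip header free of the current time.
    return gzip.compress(ndjson, mtime=0)


events = [
    {"symbolId": "sym:java:example-a", "hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"},
    {"symbolId": "sym:java:example-b", "hitCount": 12, "observedAt": "2025-12-13T10:00:01Z"},
]
body = encode_runtime_facts(events)
headers = {"Content-Type": "application/x-ndjson", "Content-Encoding": "gzip"}
```
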
### 3.3 Fetch Reachability Facts
Query reachability state for a subject:
```http
GET /signals/facts/{subjectKey}
Authorization: Bearer <token>
```
**Response:**
```json
{
"subjectKey": "scan:123:pkg:maven/log4j:2.14.1:CVE-2021-44228",
"metadata": {
"fact": {
"digest": "sha256:abc123...",
"version": 3
}
},
"states": [
{
"symbol": "sym:java:...",
"latticeState": "CR",
"bucket": "runtime",
"confidence": 0.92,
"score": 0.78,
"path": ["sym:java:main...", "sym:java:process...", "sym:java:log4j..."],
"evidence": {
"static": {"graphHash": "blake3:...", "pathLength": 3, "confidence": 0.85},
"runtime": {"probeId": "probe:jfr:1234", "hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"}
}
}
],
"score": 0.78,
"aggregateTier": "T2",
"riskScore": 0.65
}
```
### 3.4 Policy Findings with Reachability Evidence
```http
GET /api/policy/findings/{policyId}/{findingId}/explain?mode=verbose
Authorization: Bearer <token>
```
**Response (excerpt):**
```json
{
"findingId": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
"reachability": {
"state": "CR",
"confidence": 0.92,
"evidence": {
"graph_hash": "blake3:a1b2c3d4...",
"path": [
{"symbol_id": "sym:java:...", "display": "main()"},
{"symbol_id": "sym:java:...", "display": "Logger.error()"}
],
"runtime_hits": 47,
"fact_digest": "sha256:abc123..."
}
},
"steps": [
{"rule": "reachability_gate", "state": "CR", "allowed": true},
{"rule": "severity_baseline", "severity": {"normalized": "Critical", "score": 10.0}}
]
}
```
---
## 6. Handoff Checklist for the Next Agent
## 4. CLI Usage
1. Confirm sprint entries (`SPRINT_400` and `SPRINT_401`) remain in sync when moving `GAP-*` tasks to DOING/DONE.
2. Start with `GAP-SYM-007` (schema/helper implementation) because downstream work depends on the new `code_id` payload shape.
3. Once schema PR merges, coordinate with Signals + Policy guilds to align on CAS naming and DSSE predicates before wiring APIs.
4. Update the docs listed in §4 as each component lands; keep this file current with statuses and links to PRs/ADRs.
5. Before shipping, run the reachbench fixtures end-to-end and capture hashes for inclusion in replay docs.
### 4.1 Graph Explain Command
Keep this document updated as tasks change state; it is the authoritative hand-off note for the advisory.
View the call path and evidence for a finding:
```bash
stella graph explain --finding "pkg:maven/log4j@2.14.1:CVE-2021-44228" --scan-id scan-123
# Output:
Finding: CVE-2021-44228 in pkg:maven/log4j@2.14.1
Reachability: CONFIRMED_REACHABLE (CR)
Confidence: 0.92
Graph Hash: blake3:a1b2c3d4e5f6...
Call Path (3 hops):
1. main() [sym:java:R3JlZXRpbmcuLi4=]
-> processRequest() [direct call]
2. processRequest() [sym:java:cHJvY2Vzcy4uLg==]
-> Logger.error() [virtual call]
3. Logger.error() [sym:java:bG9nNGouLi4=]
[VULNERABLE: CVE-2021-44228]
Runtime Evidence:
- JFR probe hit: 47 times
- Last observed: 2025-12-13T10:00:00Z
DSSE Attestation: cas://reachability/graphs/a1b2c3d4....dsse
```
### 4.2 Graph Export Command
Export a reachability graph for offline analysis:
```bash
stella graph export --scan-id scan-123 --output ./evidence-bundle/
# Creates:
# ./evidence-bundle/richgraph-v1.json # Canonical graph
# ./evidence-bundle/richgraph-v1.json.dsse # DSSE envelope
# ./evidence-bundle/meta.json # Metadata
# ./evidence-bundle/runtime-facts.ndjson # Runtime observations
```
### 4.3 Graph Verify Command
Verify a graph's DSSE signature and Rekor inclusion:
```bash
stella graph verify --graph ./evidence-bundle/richgraph-v1.json \
--dsse ./evidence-bundle/richgraph-v1.json.dsse \
--rekor-log
# Output:
Graph Hash: blake3:a1b2c3d4e5f6...
DSSE Signature: VALID (key: scanner-signing-2025)
Rekor Entry: 12345678 (verified)
Timestamp: 2025-12-13T09:30:00Z
```
---
## 5. OpenVEX Integration
### 5.1 OpenVEX with Reachability Evidence
When Policy emits VEX decisions, reachability evidence is included:
```json
{
"@context": "https://openvex.dev/ns/v0.2.0",
"@id": "https://stellaops.example/vex/2025-12-13/001",
"author": "StellaOps Policy Engine",
"timestamp": "2025-12-13T10:00:00Z",
"version": 1,
"statements": [
{
"vulnerability": {"@id": "CVE-2021-44228"},
"products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
"status": "affected",
"impact_statement": "Vulnerable Log4j method reachable from main entry point.",
"action_statement": "Upgrade to log4j 2.17.1 or later.",
"stellaops:reachability": {
"state": "CR",
"confidence": 0.92,
"graph_hash": "blake3:a1b2c3d4e5f6...",
"path_length": 3,
"evidence_uri": "cas://reachability/graphs/a1b2c3d4..."
}
}
]
}
```
### 5.2 VEX "not_affected" with Unreachability Evidence
When code is provably unreachable:
```json
{
"statements": [
{
"vulnerability": {"@id": "CVE-2023-XXXXX"},
"products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
"status": "not_affected",
"justification": "vulnerable_code_not_in_execute_path",
"impact_statement": "Vulnerable function not reachable from any entry point.",
"stellaops:reachability": {
"state": "CU",
"confidence": 0.88,
"graph_hash": "blake3:d4e5f6a7b8c9...",
"evidence_uri": "cas://reachability/graphs/d4e5f6a7b8c9...",
"runtime_observation_window": "72h",
"runtime_hits": 0
}
}
]
}
```
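
The relationship between lattice state and VEX status in the two examples can be sketched as a small mapping. Only `CR` and `CU` are grounded in the examples; sending every other state to `under_investigation` is an assumption, and `vex_fields` is a hypothetical helper rather than the `VexDecisionEmitter` logic. OpenVEX defines `justification` for `not_affected` statements, so the sketch sets it only there.

```python
def vex_fields(reachability: dict) -> dict:
    """Map a reachability block to OpenVEX status fields."""
    state = reachability["state"]
    if state == "CR":
        return {"status": "affected"}
    if state == "CU":
        return {
            "status": "not_affected",
            "justification": "vulnerable_code_not_in_execute_path",
        }
    # Assumption: any other lattice state stays open until more evidence arrives.
    return {"status": "under_investigation"}
```
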
---
## 6. Replay Manifest v2
### 6.1 Manifest Structure
Replay manifests now enforce BLAKE3 hashing and CAS registration:
```json
{
"schema": "stellaops.replay.manifest@v2",
"subject": "scan:123",
"generatedAt": "2025-12-13T10:00:00Z",
"hashAlg": "blake3",
"artifacts": [
{
"kind": "richgraph",
"uri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6...",
"hash": "blake3:a1b2c3d4e5f6...",
"dsseUri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6....dsse"
},
{
"kind": "runtime-facts",
"uri": "cas://reachability/runtime/sha256:xyz789...",
"hash": "sha256:xyz789..."
},
{
"kind": "sbom",
"uri": "cas://scanner-artifacts/sbom.cdx.json",
"hash": "sha256:def456..."
}
],
"analyzer": {
"name": "scanner.java",
"version": "1.2.0",
"toolchain_digest": "sha256:..."
},
"code_id_coverage": {
"total_symbols": 1247,
"with_code_id": 1189,
"coverage_pct": 95.3
}
}
```
### 6.2 Determinism Verification
Replay a manifest to verify determinism:
```bash
stella replay verify --manifest ./manifest.json --sealed
# Output:
Manifest: stellaops.replay.manifest@v2
Subject: scan:123
Artifacts: 3
Verifying richgraph...
Computed: blake3:a1b2c3d4e5f6...
Expected: blake3:a1b2c3d4e5f6...
Status: MATCH
Verifying runtime-facts...
Computed: sha256:xyz789...
Expected: sha256:xyz789...
Status: MATCH
Verifying sbom...
Computed: sha256:def456...
Expected: sha256:def456...
Status: MATCH
All artifacts verified. Determinism check PASSED.
```
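
The per-artifact check in that output amounts to recomputing each digest and comparing it with the manifest entry. A minimal sketch follows, with the caveat that BLAKE3 is not in the Python standard library, so only `sha256:` digests are verified here and anything else is reported as unsupported. `verify_artifact`, `verify_manifest`, and the `fetch` callback are hypothetical.

```python
import hashlib


def verify_artifact(expected: str, content: bytes) -> str:
    """Compare content against an `alg:hex` digest; return MATCH, MISMATCH, or UNSUPPORTED."""
    alg, _, hex_digest = expected.partition(":")
    if alg != "sha256":
        # A real verifier would plug a BLAKE3 implementation in here.
        return "UNSUPPORTED"
    computed = hashlib.sha256(content).hexdigest()
    return "MATCH" if computed == hex_digest else "MISMATCH"


def verify_manifest(manifest: dict, fetch) -> dict:
    """Verify each artifact; `fetch` is a caller-supplied function mapping a CAS URI to bytes."""
    return {
        artifact["uri"]: verify_artifact(artifact["hash"], fetch(artifact["uri"]))
        for artifact in manifest["artifacts"]
    }
```
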
---
## 7. Module Integration Guide
### 7.1 Scanner -> Signals
Scanner emits richgraph-v1 with `code_id` and `symbol_id`:
1. Scanner analyzes container/artifact
2. Callgraph generators emit nodes with `symbol_id`, `code_id`, `build_id`
3. RichGraphWriter canonicalizes (sorted arrays/keys) and computes `graph_hash` (BLAKE3)
4. DSSE signer wraps canonical JSON
5. CAS store persists body + envelope
6. Signals ingestion API receives URI reference
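
Step 3 (canonicalize, then hash) can be sketched as below. The sort keys for nodes and edges are assumptions, and SHA-256 is swapped in for BLAKE3 because the latter is not in the Python standard library, so the result carries a `sha256:` prefix and is not a real `graph_hash`. The function names are hypothetical and do not describe `RichGraphWriter` itself.

```python
import hashlib
import json


def canonical_graph(graph: dict) -> bytes:
    """Sort nodes and edges, then serialize with sorted keys and compact separators."""
    ordered = dict(graph)
    ordered["nodes"] = sorted(graph.get("nodes", []), key=lambda n: n["id"])
    ordered["edges"] = sorted(
        graph.get("edges", []),
        key=lambda e: (e["from"], e["to"], e.get("kind", "")),
    )
    return json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode("utf-8")


def graph_digest(graph: dict) -> str:
    # SHA-256 stands in for BLAKE3 in this sketch.
    return "sha256:" + hashlib.sha256(canonical_graph(graph)).hexdigest()
```
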
### 7.2 Signals -> Policy
Signals provides reachability facts to Policy:
1. Policy queries `/signals/facts/{subjectKey}`
2. Response includes `metadata.fact.digest`, `states[]`, `score`
3. Policy gates check `latticeState` (U, SR, SU, RO, RU, CR, CU, X)
4. Evidence blocks in findings reference `graph_hash`, `path[]`, `runtime_hits[]`
### 7.3 Policy -> VEX/UI
Policy emits OpenVEX with evidence:
1. VexDecisionEmitter serializes OpenVEX with `stellaops:reachability` extension
2. UI explain drawer fetches evidence via `/api/policy/findings/{id}/explain`
3. CLI `stella graph explain` renders call path and attestation refs
---
## 8. CAS Layout Reference
```
cas://reachability/
  graphs/
    {blake3}/                       # Graph body (canonical JSON)
    {blake3}.dsse                   # DSSE envelope
  edges/
    {graph_hash}/{bundle_id}        # Edge bundle body (optional)
    {graph_hash}/{bundle_id}.dsse
  runtime/
    {sha256}/                       # Runtime facts NDJSON
```
---
## 9. Related Documentation
- [Reachability Lattice Model](./lattice.md) - State definitions and join rules
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Schema specification
- [Evidence Schema](./evidence-schema.md) - Detailed field definitions
- [Signals API Contract](../api/signals/reachability-contract.md) - API reference
- [Policy Gates](./policy-gate.md) - Gate configuration
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
- [Ground Truth Schema](./ground-truth-schema.md) - Test fixture format
---
_Last updated: 2025-12-13. See Sprint 0401 GAP-DOC-008 for change history._


@@ -0,0 +1,377 @@
# Graph Revision Schema
_Last updated: 2025-12-13. Owner: Platform Guild._
This document defines the graph revision schema addressing gaps GR1-GR10 from the November 2025 product findings. It specifies manifest structure, hash algorithms, storage layout, lineage tracking, and governance rules for deterministic, auditable reachability graphs.
---
## 1. Overview
Graph revisions provide content-addressable, append-only versioning for `richgraph-v1` documents. Every graph mutation produces a new immutable revision with:
- **Deterministic hash:** BLAKE3-256 of canonical JSON
- **Lineage metadata:** Parent revision + diff summary
- **Cross-artifact digests:** Links to SBOM, VEX, policy, and tool versions
- **Audit trail:** Timestamp, actor, tenant, and operation type
---
## 2. Gap Resolutions
### GR1: Manifest Schema + Canonical Hash Rules
**Manifest schema:**
```json
{
"schema": "stellaops.graph.revision@v1",
"revision_id": "rev:blake3:a1b2c3d4e5f6...",
"graph_hash": "blake3:a1b2c3d4e5f6...",
"parent_revision_id": "rev:blake3:9f8e7d6c5b4a...",
"created_at": "2025-12-13T10:00:00Z",
"created_by": "service:scanner",
"tenant_id": "tenant:acme",
"shard_id": "shard:01",
"operation": "create",
"lineage": {
"depth": 3,
"root_revision_id": "rev:blake3:1a2b3c4d5e6f..."
},
"cross_artifacts": {
"sbom_digest": "sha256:...",
"vex_digest": "sha256:...",
"policy_digest": "sha256:...",
"analyzer_digest": "sha256:..."
},
"diff_summary": {
"nodes_added": 12,
"nodes_removed": 3,
"edges_added": 24,
"edges_removed": 8,
"roots_changed": false
}
}
```
**Canonical hash rules:**
1. JSON keys sorted alphabetically at all nesting levels
2. No whitespace/indentation (compact JSON)
3. UTF-8 encoding, no BOM
4. Arrays sorted by deterministic key (nodes by `id`, edges by `from,to,kind`)
5. Null/empty values omitted
6. Numeric values without trailing zeros
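The six rules above could be applied roughly as in the sketch below. It is an illustration under stated assumptions, not the RichGraphWriter implementation: it assumes top-level `nodes` and `edges` arrays, treats "empty" as `null`, `""`, `[]`, and `{}`, and takes the hash function as a parameter so the block has no third-party dependency.

```python
import json
from typing import Any, Callable


def _prune(value: Any) -> Any:
    """Drop null/empty members and turn integral floats (2.0) into ints (2)."""
    if isinstance(value, dict):
        pruned = {key: _prune(item) for key, item in value.items()}
        return {
            k: v for k, v in pruned.items()
            if v is not None and v != "" and v != [] and v != {}
        }
    if isinstance(value, list):
        return [_prune(item) for item in value]
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def canonicalize(graph: dict) -> bytes:
    """Return compact, key-sorted UTF-8 JSON (no BOM) with sorted arrays."""
    doc = _prune(graph)
    if "nodes" in doc:
        doc["nodes"] = sorted(doc["nodes"], key=lambda n: n["id"])
    if "edges" in doc:
        doc["edges"] = sorted(
            doc["edges"],
            key=lambda e: (e["from"], e["to"], e.get("kind", "")),
        )
    text = json.dumps(
        doc, sort_keys=True, separators=(",", ":"),
        ensure_ascii=False, allow_nan=False,
    )
    return text.encode("utf-8")


def graph_hash(graph: dict, hexdigest: Callable[[bytes], str]) -> str:
    """Prefix the caller-supplied BLAKE3 hex digest of the canonical bytes."""
    return "blake3:" + hexdigest(canonicalize(graph))
```

With the third-party `blake3` package, the digest callable would be `lambda b: blake3.blake3(b).hexdigest()`.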
### GR2: Mandated BLAKE3-256 Encoding
All graph-level hashes use BLAKE3-256 with the following format:
```
blake3:{64_hex_chars}
```
Example:
```
blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
```
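A validator for this format might look like the sketch below. The function name is hypothetical, and accepting only lowercase hex is an assumption based on the example above.

```python
import re

# 'blake3:' followed by exactly 64 lowercase hex characters (256 bits).
_GRAPH_HASH = re.compile(r"blake3:[0-9a-f]{64}")


def parse_graph_hash(value: str) -> bytes:
    """Validate a 'blake3:{64_hex_chars}' string and return the 32 raw bytes."""
    if not _GRAPH_HASH.fullmatch(value):
        raise ValueError(f"not a valid graph hash: {value!r}")
    return bytes.fromhex(value.split(":", 1)[1])
```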
**Rationale:**
- BLAKE3 is 3x+ faster than SHA-256 on modern CPUs
- Parallelizable for large graphs (>100K nodes)
- Cryptographically secure (256-bit digest; BLAKE3 targets a 128-bit security level)
- Algorithm prefix enables future migration
### GR3: Append-Only Storage
Graph revisions are immutable. Operations:
| Operation | Creates New Revision | Modifies Existing |
|-----------|---------------------|-------------------|
| `create` | Yes | No |
| `update` | Yes | No |
| `merge` | Yes | No |
| `tombstone` | Yes | No |
| `read` | No | No |
**Storage layout:**
```
cas://reachability/
revisions/
{blake3}/ # Revision manifest
{blake3}.graph # Graph body
{blake3}.dsse # DSSE envelope
indices/
by-tenant/{tenant_id}/ # Tenant index
by-sbom/{sbom_digest}/ # SBOM correlation
by-root/{root_revision_id}/ # Lineage tree
```
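The append-only rule can be illustrated with a small in-memory store that never replaces an existing key. This is a hypothetical sketch of the invariant only; the real CAS backend, its key format, and its API are not described here.

```python
class AppendOnlyStore:
    """In-memory stand-in for a CAS prefix that never overwrites content."""

    def __init__(self) -> None:
        self._blobs: dict = {}

    def put(self, key: str, data: bytes) -> bool:
        """Store data under key; return False if identical content exists."""
        existing = self._blobs.get(key)
        if existing is None:
            self._blobs[key] = data
            return True
        if existing != data:
            # A content-addressed key must always map to the same bytes.
            raise ValueError(f"refusing to overwrite immutable entry: {key}")
        return False

    def get(self, key: str) -> bytes:
        return self._blobs[key]
```

Under this model, `update`, `merge`, and `tombstone` each write a new key rather than changing an existing one.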
### GR4: Lineage/Diff Metadata
Every revision tracks its lineage:
```json
{
"lineage": {
"depth": 5,
"root_revision_id": "rev:blake3:...",
"parent_revision_id": "rev:blake3:...",
"merge_parents": []
},
"diff_summary": {
"nodes_added": 12,
"nodes_removed": 3,
"nodes_modified": 0,
"edges_added": 24,
"edges_removed": 8,
"edges_modified": 0,
"roots_added": 0,
"roots_removed": 0
},
"diff_detail_uri": "cas://reachability/diffs/{parent_hash}_{child_hash}.ndjson"
}
```
**Diff detail format (NDJSON):**
```ndjson
{"op":"add","path":"nodes","value":{"id":"sym:java:...","display":"..."}}
{"op":"remove","path":"edges","from":"sym:java:a","to":"sym:java:b"}
```
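The `diff_summary` counters could be derived from two graph bodies as sketched below. This assumes nodes are keyed by `id` and edges by `(from, to, kind)`, mirroring the canonical sort keys; root counters are omitted, and the function name is hypothetical.

```python
def _edge_key(edge: dict) -> tuple:
    return (edge["from"], edge["to"], edge.get("kind", ""))


def _count(old: dict, new: dict) -> tuple:
    """Return (added, removed, modified) between two keyed collections."""
    shared = old.keys() & new.keys()
    added = len(new.keys() - old.keys())
    removed = len(old.keys() - new.keys())
    modified = sum(1 for key in shared if old[key] != new[key])
    return added, removed, modified


def diff_summary(parent: dict, child: dict) -> dict:
    """Summarise node and edge changes from parent graph to child graph."""
    p_nodes = {n["id"]: n for n in parent.get("nodes", [])}
    c_nodes = {n["id"]: n for n in child.get("nodes", [])}
    p_edges = {_edge_key(e): e for e in parent.get("edges", [])}
    c_edges = {_edge_key(e): e for e in child.get("edges", [])}
    n_add, n_rem, n_mod = _count(p_nodes, c_nodes)
    e_add, e_rem, e_mod = _count(p_edges, c_edges)
    return {
        "nodes_added": n_add, "nodes_removed": n_rem, "nodes_modified": n_mod,
        "edges_added": e_add, "edges_removed": e_rem, "edges_modified": e_mod,
    }
```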
### GR5: Cross-Artifact Digests (SBOM/VEX/Policy/Tool)
Every revision links to related artifacts:
```json
{
"cross_artifacts": {
"sbom_digest": "sha256:...",
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
"sbom_format": "cyclonedx-1.6",
"vex_digest": "sha256:...",
"vex_uri": "cas://excititor/vex/openvex.json",
"policy_digest": "sha256:...",
"policy_version": "P-7:v4",
"analyzer_digest": "sha256:...",
"analyzer_name": "scanner.java",
"analyzer_version": "1.2.0"
}
}
```
### GR6: UI/CLI Surfacing of Full/Short IDs
**Full ID format:**
```
rev:blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
```
**Short ID format (for display):**
```
rev:a1b2c3d4
```
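Deriving the short form from a full ID might look like this sketch. The eight-character length is inferred from the example above, and the function name is hypothetical.

```python
def short_revision_id(full_id: str, length: int = 8) -> str:
    """Turn 'rev:blake3:<hex>' into 'rev:<first hex characters>'."""
    parts = full_id.split(":", 2)
    if len(parts) != 3 or parts[0] != "rev" or parts[1] != "blake3":
        raise ValueError(f"not a full revision ID: {full_id!r}")
    return f"rev:{parts[2][:length]}"
```

Short IDs are for display only; lookups should use the full ID, since a prefix can become ambiguous as the number of revisions grows.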
**CLI commands:**
```bash
# List revisions (short IDs by default)
stella graph revisions --scan-id scan-123
# Output:
REVISION CREATED NODES EDGES PARENT
rev:a1b2c3d4 2025-12-13T10:00:00 1247 3891 rev:9f8e7d6c
rev:9f8e7d6c 2025-12-12T15:30:00 1235 3867 rev:1a2b3c4d
# Show full IDs instead of short IDs
stella graph revisions --scan-id scan-123 --full
```
**UI display:**
- Revision chips show short ID with copy-to-clipboard for full ID
- Hover tooltip shows full ID and creation timestamp
- Lineage tree visualization available in "Revision History" drawer
### GR7: Shard/Tenant Context
Every revision includes partition context:
```json
{
"tenant_id": "tenant:acme",
"shard_id": "shard:01",
"namespace": "prod",
"workspace_id": "ws:default"
}
```
**Tenant isolation:**
- Revisions are tenant-scoped; cross-tenant access requires explicit grants
- Shard ID enables horizontal scaling and data locality
- Namespace supports multi-environment deployments
### GR8: Pin/Audit Governance
**Pinned revisions:**
Revisions can be pinned to prevent automatic retention cleanup:
```json
{
"pinned": true,
"pinned_at": "2025-12-13T10:00:00Z",
"pinned_by": "user:alice",
"pin_reason": "Audit retention for CVE-2021-44228 investigation",
"pin_expires_at": "2026-12-13T10:00:00Z"
}
```
**Audit events:**
All revision operations emit audit events:
```json
{
"event_type": "graph.revision.created",
"revision_id": "rev:blake3:...",
"actor": "service:scanner",
"tenant_id": "tenant:acme",
"timestamp": "2025-12-13T10:00:00Z",
"metadata": {
"operation": "create",
"parent_revision_id": "rev:blake3:...",
"graph_hash": "blake3:..."
}
}
```
### GR9: Retention/Tombstones
**Retention policy:**
| Category | Default Retention | Configurable |
|----------|-------------------|--------------|
| Latest revision | Forever | No |
| Intermediate revisions | 90 days | Yes |
| Tombstoned revisions | 30 days | Yes |
| Pinned revisions | Until unpin + 7 days | No |
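The default retention windows in the table can be read as the decision function sketched below. The `is_latest` and `unpinned_at` fields are hypothetical (the schema above does not name them), `pin_expires_at` handling is left out, and the windows use the defaults rather than configured values.

```python
from datetime import datetime, timedelta, timezone

INTERMEDIATE_RETENTION = timedelta(days=90)
TOMBSTONE_RETENTION = timedelta(days=30)
UNPIN_GRACE = timedelta(days=7)


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as 2025-12-13T10:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def eligible_for_cleanup(rev: dict, now: datetime) -> bool:
    """Return True when the retention policy allows removing this revision."""
    if rev.get("is_latest") or rev.get("pinned"):
        return False  # latest and pinned revisions are always kept
    unpinned_at = rev.get("unpinned_at")
    if unpinned_at and now < parse_ts(unpinned_at) + UNPIN_GRACE:
        return False  # still inside the 7-day grace period after unpinning
    if rev.get("tombstone"):
        return now >= parse_ts(rev["tombstoned_at"]) + TOMBSTONE_RETENTION
    return now >= parse_ts(rev["created_at"]) + INTERMEDIATE_RETENTION
```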
**Tombstone format:**
```json
{
"schema": "stellaops.graph.revision@v1",
"revision_id": "rev:blake3:...",
"tombstone": true,
"tombstoned_at": "2025-12-13T10:00:00Z",
"tombstoned_by": "service:retention-worker",
"tombstone_reason": "retention_policy",
"successor_revision_id": "rev:blake3:..."
}
```
### GR10: Inclusion in Offline Kits
Offline kits include graph revisions for air-gapped deployments:
**Offline bundle manifest:**
```json
{
"schema": "stellaops.offline.bundle@v1",
"bundle_id": "bundle:2025-12-13",
"graph_revisions": [
{
"revision_id": "rev:blake3:...",
"graph_hash": "blake3:...",
"included_artifacts": ["graph", "dsse", "diff"]
}
],
"rekor_checkpoints": [
{
"log_id": "rekor.sigstore.dev",
"checkpoint": "...",
"verified_at": "2025-12-13T10:00:00Z"
}
],
"signature": {
"algorithm": "ecdsa-p256",
"value": "base64:...",
"public_key_id": "key:offline-signing-2025"
}
}
```
**Import verification:**
```bash
stella offline import --bundle ./offline-bundle.tgz --verify
# Output:
Bundle: bundle:2025-12-13
Graph Revisions: 5
Rekor Checkpoints: 2
Verifying signatures...
Bundle signature: VALID
DSSE envelopes: 5/5 VALID
Rekor checkpoints: 2/2 VERIFIED
Import complete.
```
---
## 3. API Reference
### 3.1 Create Revision
```http
POST /api/graph/revisions
Authorization: Bearer <token>
Content-Type: application/json
{
"graph": { ... richgraph-v1 ... },
"parent_revision_id": "rev:blake3:...",
"cross_artifacts": { ... }
}
```
### 3.2 Get Revision
```http
GET /api/graph/revisions/{revision_id}
Authorization: Bearer <token>
```
### 3.3 List Revisions
```http
GET /api/graph/revisions?tenant_id=tenant:acme&sbom_digest=sha256:...&limit=20
Authorization: Bearer <token>
```
### 3.4 Diff Revisions
```http
GET /api/graph/revisions/diff?from={rev_a}&to={rev_b}
Authorization: Bearer <token>
```
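Request URLs for the list and diff endpoints above could be built as in this sketch. The base URL is a placeholder and the helper names are hypothetical; only the paths and query parameters come from the examples above.

```python
from typing import Optional
from urllib.parse import urlencode


def list_revisions_url(base_url: str, tenant_id: str,
                       sbom_digest: Optional[str] = None,
                       limit: int = 20) -> str:
    """Build the List Revisions URL, keeping ':' readable in digests."""
    params = {"tenant_id": tenant_id}
    if sbom_digest:
        params["sbom_digest"] = sbom_digest
    params["limit"] = limit
    return f"{base_url}/api/graph/revisions?{urlencode(params, safe=':')}"


def diff_revisions_url(base_url: str, rev_a: str, rev_b: str) -> str:
    """Build the Diff Revisions URL for two revision IDs."""
    query = urlencode({"from": rev_a, "to": rev_b}, safe=":")
    return f"{base_url}/api/graph/revisions/diff?{query}"
```

Each request also needs the `Authorization: Bearer <token>` header shown in the examples.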
---
## 4. Related Documentation
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [CAS Infrastructure](../contracts/cas-infrastructure.md) - Content-addressable storage
- [Offline Kit](../24_OFFLINE_KIT.md) - Air-gap deployment
---
_Last updated: 2025-12-13. See Sprint 0401 GRAPHREV-GAPS-401-063 for change history._


@@ -84,7 +84,93 @@ Stella Ops provides **true hybrid reachability** by combining:
**Evidence linking:** Each edge in the graph or bundle includes `evidenceRefs` pointing to the underlying proof artifacts (static analysis artifacts, runtime traces), enabling **evidence-linked VEX decisions**.
## 8. Open decisions (tracked in Sprint 0401 tasks 53-56)
- Rekor publish defaults per deployment tier (regulated vs standard).
- CLI UX for selective bundle verification.
- Bench coverage for edge-bundle verification time/size.
## 8. Decisions (Frozen 2025-12-13)
### 8.1 DSSE/Rekor Budget by Deployment Tier
| Tier | Graph DSSE | Edge-Bundle DSSE | Rekor Publish | Max Bundles/Graph |
|------|------------|------------------|---------------|-------------------|
| **Regulated** (SOC2, FedRAMP, PCI) | Required | Required for runtime/contested | Required | 10 |
| **Standard** | Required | Optional (criteria-based) | Graph only | 5 |
| **Air-gapped** | Required | Optional | Offline checkpoint | 5 |
| **Dev/Test** | Optional | Optional | Disabled | Unlimited |
**Budget enforcement:**
- Graph DSSE: Submit the digest to Rekor in the Regulated and Standard tiers, record an offline checkpoint in the Air-gapped tier, and skip publishing in Dev/Test (Rekor disabled)
- Edge-bundle DSSE: Submit to Rekor only when `bundle_reason` is `disputed`, `runtime-hit`, or `security-critical`
- The per-graph cap is enforced by the `reachability.edgeBundles.maxRekorPublishes` setting, whose per-tier defaults are listed in the table above
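One possible reading of these rules as a decision function is sketched below. The tier keys, the `rekor_mode` labels, and the choice to treat "Graph only" and "Disabled" as "never publish edge bundles" are interpretations of the table, not confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional

# Reasons that qualify an edge bundle for Rekor, per the rule above.
PUBLISHABLE_REASONS = frozenset({"disputed", "runtime-hit", "security-critical"})


@dataclass(frozen=True)
class TierPolicy:
    rekor_mode: str             # required | graph-only | offline-checkpoint | disabled
    max_bundles: Optional[int]  # None means unlimited


# Defaults transcribed from the tier table; the keys are hypothetical labels.
TIERS = {
    "regulated": TierPolicy("required", 10),
    "standard": TierPolicy("graph-only", 5),
    "air-gapped": TierPolicy("offline-checkpoint", 5),
    "dev-test": TierPolicy("disabled", None),
}


def should_publish_bundle(tier: str, bundle_reason: str,
                          already_published: int) -> bool:
    """Decide whether one more edge bundle may be published for a graph."""
    policy = TIERS[tier]
    if policy.rekor_mode in ("graph-only", "disabled"):
        return False
    if bundle_reason not in PUBLISHABLE_REASONS:
        return False
    return policy.max_bundles is None or already_published < policy.max_bundles
```

For the air-gapped tier, a `True` result here stands for recording the bundle against the offline checkpoint rather than a live Rekor submission.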
### 8.2 Signing Layout and CAS Paths
```
cas://reachability/
graphs/
{blake3}/ # richgraph-v1 body (JSON)
{blake3}.dsse # Graph DSSE envelope
{blake3}.rekor # Rekor inclusion proof (optional)
edges/
{graph_hash}/
{bundle_id}.json # Edge bundle body
{bundle_id}.dsse # Edge bundle DSSE envelope
{bundle_id}.rekor # Rekor inclusion proof (if published)
revisions/
{revision_id}/ # Revision manifest + lineage
```
**Signing workflow:**
1. Canonicalize richgraph-v1 JSON (sorted keys, arrays by deterministic key)
2. Compute BLAKE3-256 hash -> `graph_hash`
3. Create DSSE envelope with `stella.ops/graph@v1` predicate
4. Submit digest to Rekor (online) or cache checkpoint (offline)
5. Store graph body + envelope + proof in CAS
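Step 3 can be sketched with the DSSE pre-authentication encoding (PAE) and envelope layout from the DSSE specification. Wrapping the predicate in an in-toto Statement is an assumption about how `stella.ops/graph@v1` is carried, and the `sign` callable and `keyid` are placeholders for whatever signer the Attestor module provides.

```python
import base64
import json
from typing import Callable

IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE PAE: 'DSSEv1 <len(type)> <type> <len(body)> <body>'."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload,
    )


def graph_statement(graph_hash: str, predicate: dict) -> bytes:
    """Assumed in-toto Statement carrying the stella.ops/graph@v1 predicate."""
    algorithm, _, hex_digest = graph_hash.partition(":")
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": graph_hash, "digest": {algorithm: hex_digest}}],
        "predicateType": "stella.ops/graph@v1",
        "predicate": predicate,
    }
    return json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")


def dsse_envelope(payload: bytes, payload_type: str,
                  sign: Callable[[bytes], bytes], keyid: str) -> dict:
    """Sign the PAE bytes (not the raw payload) and build the envelope."""
    signature = sign(pae(payload_type, payload))
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": payload_type,
        "signatures": [
            {"keyid": keyid, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }
```

Signing the PAE output rather than the payload binds the signature to the payload type, which is what lets a verifier reject an envelope whose `payloadType` was altered.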
### 8.3 CLI UX for Selective Bundle Verification
```bash
# Verify graph DSSE only (default)
stella graph verify --hash blake3:a1b2c3d4...
# Verify graph + all edge bundles
stella graph verify --hash blake3:a1b2c3d4... --include-bundles
# Verify specific edge bundle
stella graph verify --hash blake3:a1b2c3d4... --bundle bundle:001
# Offline verification with local CAS
stella graph verify --hash blake3:a1b2c3d4... --cas-root ./offline-cas/
# Verify Rekor inclusion
stella graph verify --hash blake3:a1b2c3d4... --rekor-proof
# Output formats
stella graph verify --hash blake3:a1b2c3d4... --format json|table|summary
```
### 8.4 Golden Fixture Plan
**Fixture location:** `tests/Reachability/Hybrid/`
**Required fixtures:**
| Fixture | Description | Expected Verification Time |
|---------|-------------|---------------------------|
| `graph-only.golden.json` | Minimal richgraph-v1 with DSSE | < 100ms |
| `graph-with-runtime.golden.json` | Graph + 1 runtime edge bundle | < 200ms |
| `graph-with-contested.golden.json` | Graph + 1 contested/revoked edge bundle | < 200ms |
| `large-graph.golden.json` | 10K nodes, 50K edges, 5 bundles | < 2s |
| `offline-bundle.golden.tgz` | Complete offline replay pack | < 5s |
**CI integration:**
- `.gitea/workflows/hybrid-attestation.yml` runs verification fixtures
- Size gate: Graph body < 10MB, individual bundle < 1MB
- Time gate: Full verification < 5s for standard tier
### 8.5 Implementation Status
| Component | Status | Notes |
|-----------|--------|-------|
| Graph DSSE predicate | Done | `stella.ops/graph@v1` in PredicateTypes.cs |
| Edge-bundle DSSE predicate | Planned | `stella.ops/edgeBundle@v1` |
| CAS layout | Done | Per section 8.2 |
| CLI verify command | Planned | Per section 8.3 |
| Golden fixtures | Planned | Per section 8.4 |
| Rekor integration | Done | Via Attestor module |