# Binary Reachability Schema
_Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild._

This document defines the binary reachability schema addressing gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.

---
## 1. Overview
Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:
- **Stripped binaries:** Symbol recovery using `code_id` + `code_block_hash`
- **Build variants:** Handling multiple builds from the same source
- **Large graphs:** Chunking and size limits for DSSE/Rekor
- **Offline verification:** Air-gapped attestation workflows
---
## 2. Gap Resolutions
### BR1: Canonical DSSE/Predicate Schemas
**Binary graph predicate:**
```
stella.ops/binaryGraph@v1
```
**Predicate schema:**
```json
{
"_type": "https://stellaops.dev/predicates/binaryGraph/v1",
"subject": [
{
"name": "graph",
"digest": {"blake3": "a1b2c3d4e5f6..."}
}
],
"predicate": {
"analyzer": {
"name": "scanner.native",
"version": "1.2.0",
"toolchain": "ghidra-11.2"
},
"binary": {
"format": "ELF",
"arch": "x86_64",
"file_hash": "sha256:...",
"build_id": "gnu-build-id:5f0c7c3c..."
},
"graph_stats": {
"node_count": 1247,
"edge_count": 3891,
"root_count": 5
},
"evidence": {
"symbols_source": "DWARF",
"stripped_symbols": 58,
"heuristic_symbols": 12
},
"created_at": "2025-12-13T10:00:00Z"
}
}
```
**Edge bundle predicate:**
```
stella.ops/binaryEdgeBundle@v1
```
```json
{
"_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
"subject": [
{
"name": "edges",
"digest": {"sha256": "..."}
}
],
"predicate": {
"graph_hash": "blake3:a1b2c3d4...",
"bundle_id": "bundle:001",
"bundle_reason": "init_array",
"edge_count": 128,
"edges": [
{
"from": "sym:binary:...",
"to": "sym:binary:...",
"reason": "init-array",
"confidence": 0.95
}
]
}
}
```
### BR2: Edge Hash Recipe
**Binary edge hash computation:**
```
edge_id = "edge:" + sha256(
canonical_json({
"from": edge.from,
"to": edge.to,
"kind": edge.kind,
"reason": edge.reason,
"binary_hash": binary.file_hash // Binary context included
})
)
```
**Hash includes binary context:**
Unlike managed code edges, binary edges include `binary_hash` in the hash computation to distinguish edges from different binaries with identical symbol names.
**Canonicalization:**
1. Keys: `binary_hash`, `from`, `kind`, `reason`, `to` (alphabetical)
2. No whitespace, UTF-8 encoding
3. Lowercase hex for all hashes
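
The recipe and canonicalization rules above can be sketched in Python as follows. This is a minimal illustration, not the Scanner's implementation. It assumes the digest is emitted as lowercase hex, and that `json.dumps` with sorted keys and compact separators is an acceptable canonical form for these flat, string-valued objects. The symbol IDs and the `kind` value in the example are hypothetical.

```python
import hashlib
import json


def canonical_json(obj: dict) -> bytes:
    """Serialize with alphabetically sorted keys, no whitespace, UTF-8 output."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def edge_id(edge: dict, binary_file_hash: str) -> str:
    """Compute the edge identifier; the binary hash is lowercased before hashing."""
    payload = {
        "from": edge["from"],
        "to": edge["to"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        "binary_hash": binary_file_hash.lower(),  # binary context included
    }
    return "edge:" + hashlib.sha256(canonical_json(payload)).hexdigest()


# Hypothetical edge; real symbol IDs follow the richgraph contract.
example = {
    "from": "sym:binary:a",
    "to": "sym:binary:b",
    "kind": "call",
    "reason": "init-array",
}
print(edge_id(example, "sha256:ABC123"))
```

Because `binary_hash` is part of the hashed payload, the same `from`/`to` pair found in two different binaries yields two different edge IDs.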
### BR3: Required Binary Evidence with CAS Refs
**Required evidence per node:**
| Evidence Type | Required | CAS Storage |
|---------------|----------|-------------|
| File hash | Yes | N/A (inline) |
| Build ID | Conditional | N/A (inline) |
| Symbol source | Yes | N/A (inline) |
| Code block hash | For stripped | `cas://binary/blocks/{sha256}` |
| Disassembly | Optional | `cas://binary/disasm/{sha256}` |
| CFG | Optional | `cas://binary/cfg/{sha256}` |
**Evidence schema:**
```json
{
"binary_evidence": {
"file_hash": "sha256:...",
"build_id": "gnu-build-id:5f0c7c3c...",
"symbol_source": "DWARF",
"symbol_confidence": 0.95,
"code_block_hash": "sha256:deadbeef...",
"code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
"disassembly_uri": "cas://binary/disasm/sha256:...",
"cfg_uri": "cas://binary/cfg/sha256:..."
}
}
```
**CAS layout:**
```
cas://binary/
  blocks/{sha256}/    # Code block bytes
  disasm/{sha256}/    # Disassembly JSON
  cfg/{sha256}/       # Control flow graph
  symbols/{sha256}/   # Symbol table extract
```
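As a rough sketch of how the evidence table and the CAS layout fit together, the helpers below hash a code block, derive its CAS URI in the form used by the evidence schema example, and report which required fields are missing. The `stripped` flag and both function names are hypothetical. The "Conditional" build-ID rule is left out because its condition is not spelled out here.

```python
import hashlib

# Evidence fields the table above marks as always required.
ALWAYS_REQUIRED = ("file_hash", "symbol_source")


def code_block_ref(block: bytes) -> dict:
    """Hash raw code-block bytes and derive the matching CAS URI."""
    digest = "sha256:" + hashlib.sha256(block).hexdigest()
    return {
        "code_block_hash": digest,
        "code_block_uri": f"cas://binary/blocks/{digest}",
    }


def missing_evidence(evidence: dict, stripped: bool) -> list:
    """Return required field names that are absent or empty.

    `stripped` is a hypothetical caller-supplied flag for stripped binaries.
    """
    required = list(ALWAYS_REQUIRED)
    if stripped:
        required.append("code_block_hash")  # "For stripped" row in the table
    return [name for name in required if not evidence.get(name)]
```

A node from a stripped binary that carries only `file_hash` and `symbol_source` would be reported as missing `code_block_hash`.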
### BR4: Build-ID/Variant Rules
**Build-ID sources:**
| Format | Build-ID Source | Example |
|--------|-----------------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:5f0c7c3c...` |
| PE | Debug GUID | `pe-guid:12345678-1234-...` |
| Mach-O | `LC_UUID` | `macho-uuid:12345678...` |
**Fallback when build-ID absent:**
```json
{
"build_id": null,
"build_id_fallback": {
"method": "file_hash",
"value": "sha256:...",
"confidence": 0.7
}
}
```
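The fallback rule can be illustrated with a small helper that emits either a prefixed build ID or the fallback object shown above. This is a sketch under assumptions: the function name and the raw-ID input are hypothetical, the prefixes come from the build-ID table, and the `0.7` confidence is copied from the example rather than derived.

```python
from typing import Optional

# Prefixes taken from the build-ID sources table.
BUILD_ID_PREFIX = {"ELF": "gnu-build-id", "PE": "pe-guid", "Mach-O": "macho-uuid"}


def build_id_fields(binary_format: str, raw_build_id: Optional[str], file_hash: str) -> dict:
    """Return build-ID fields, falling back to the file hash when no ID exists."""
    if raw_build_id:
        return {"build_id": f"{BUILD_ID_PREFIX[binary_format]}:{raw_build_id}"}
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            "confidence": 0.7,  # value copied from the example above
        },
    }
```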
**Variant handling:**
Multiple binaries built from the same source (debug/release, different architectures):
```json
{
"variant_group": "sha256:source_hash...",
"variants": [
{"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
{"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
{"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
]
}
```
### BR5: Policy Hash Governance
**Policy version binding:**
Binary reachability graphs are bound to a policy version:
```json
{
"policy_binding": {
"policy_digest": "sha256:...",
"policy_version": "P-7:v4",
"bound_at": "2025-12-13T10:00:00Z",
"binding_mode": "strict"
}
}
```
**Binding modes:**
| Mode | Behavior |
|------|----------|
| `strict` | Graph invalid if policy changes |
| `forward` | Graph valid with newer policy versions |
| `any` | Graph valid with any policy version |
**Governance rules:**
1. Production graphs use `strict` binding
2. Test graphs may use `forward`
3. Policy hash computed from canonical DSL
4. Binding stored in graph metadata
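
One way to read the three binding modes is as a validity check run against the currently active policy. The sketch below is illustrative only. It assumes that `policy_version` has the shape `<policy-id>:v<revision>` (inferred from the `P-7:v4` example), that "newer" means a higher revision of the same policy, and that `strict` compares digests. The function names are hypothetical.

```python
def split_version(policy_version: str) -> tuple:
    """Split "P-7:v4" into ("P-7", 4); the format is assumed from the example."""
    policy_id, revision = policy_version.rsplit(":v", 1)
    return policy_id, int(revision)


def binding_holds(binding: dict, current_digest: str, current_version: str) -> bool:
    """Decide whether a graph bound with `binding` is valid under the current policy."""
    mode = binding["binding_mode"]
    if mode == "any":
        return True
    if mode == "strict":
        # Any change to the policy changes its digest and invalidates the graph.
        return binding["policy_digest"] == current_digest
    if mode == "forward":
        bound_id, bound_rev = split_version(binding["policy_version"])
        current_id, current_rev = split_version(current_version)
        return bound_id == current_id and current_rev >= bound_rev
    raise ValueError(f"unknown binding mode: {mode}")
```

Under these assumptions, a graph bound to `P-7:v4` in `forward` mode stays valid for `P-7:v5` but not for `P-7:v3`.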
### BR6: Sigstore Bundle/Log Routing
**Sigstore integration:**
```json
{
"sigstore": {
"bundle_type": "hashedrekord",
"log_index": 12345678,
"log_id": "rekor.sigstore.dev",
"inclusion_proof": {
"log_index": 12345678,
"root_hash": "sha256:...",
"tree_size": 98765432,
"hashes": ["sha256:...", "sha256:..."]
},
"signed_entry_timestamp": "base64:..."
}
}
```
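To show how the `inclusion_proof` fields are used, here is a sketch of Merkle audit-path verification following the inclusion-proof procedure in RFC 9162 with RFC 6962 hashing, the scheme Rekor's transparency log is built on. It works on raw digest bytes; decoding the `sha256:...` strings from the JSON above is left out, and the function names are hypothetical. A production verifier would also check the signed checkpoint, which this sketch does not.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the entry bytes."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256 over 0x01 plus both children."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, log_index: int, tree_size: int,
                     hashes: list, root_hash: bytes) -> bool:
    """Recompute the root from a leaf hash and its audit path, then compare."""
    if not 0 <= log_index < tree_size:
        return False
    fn, sn, r = log_index, tree_size - 1, leaf
    for p in hashes:
        if sn == 0:
            return False  # proof is longer than the tree is deep
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # An even index equal to sn means the node was promoted without a
            # sibling through one or more levels; shift past those levels.
            while fn and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash


# Two-entry tree: the proof for entry 0 is the leaf hash of entry 1.
a, b = leaf_hash(b"entry-0"), leaf_hash(b"entry-1")
print(verify_inclusion(a, 0, 2, [b], node_hash(a, b)))
```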
**Log routing:**
| Evidence Type | Log | Notes |
|---------------|-----|-------|
| Graph DSSE | Rekor (public) | Always |
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
| Code block | No log | CAS only |
| CFG/Disasm | No log | CAS only |
**Offline mode:**
When Rekor is unavailable:
```json
{
"sigstore": {
"mode": "offline",
"checkpoint": {
"origin": "rekor.sigstore.dev",
"checkpoint_data": "base64:...",
"captured_at": "2025-12-13T10:00:00Z"
},
"deferred_submission": true
}
}
```
### BR7: Idempotent Submission Keys
**Submission key format:**
```
submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}
```
**Idempotency rules:**
1. Same key returns existing entry (no duplicate)
2. Key includes hour-granularity timestamp for rate limiting
3. Different graphs from same binary produce different keys
4. A retry within the same clock hour reuses the key; a retry after the hour rolls over produces a new key
**Implementation:**
```json
{
"submission": {
"key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
"status": "accepted",
"existing_entry": false,
"log_index": 12345678
}
}
```
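The key format and idempotency rules can be sketched as below. This is an illustration rather than the Attestor's code. It assumes the hour bucket is computed in UTC and formatted as `YYYYMMDDHH` (matching the `2025121310` example), and `SubmissionLog` is a hypothetical in-memory stand-in for whatever service stores the keys.

```python
from datetime import datetime, timezone


def submission_key(tenant: str, binary_hash: str, graph_hash: str, when: datetime) -> str:
    """Build the submission key; pass a timezone-aware `when` (UTC bucketing is assumed)."""
    hour = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour}"


class SubmissionLog:
    """In-memory stand-in for the service that deduplicates submissions by key."""

    def __init__(self) -> None:
        self._index_by_key = {}

    def submit(self, key: str) -> dict:
        existing = key in self._index_by_key
        if not existing:
            # Hypothetical index allocation: next free slot in this toy log.
            self._index_by_key[key] = len(self._index_by_key)
        return {
            "key": key,
            "status": "accepted",
            "existing_entry": existing,
            "log_index": self._index_by_key[key],
        }
```

Submitting the same key twice returns the same `log_index`, with `existing_entry` set to `true` on the second call. Two submissions of the same graph at 10:59 and 11:00 fall into different hour buckets and therefore get different keys.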
### BR8: Size/Chunking Limits
**Size limits:**
| Element | Limit | Action on Exceed |
|---------|-------|------------------|
| Graph JSON | 10 MB | Chunk nodes/edges |
| Edge bundle | 512 edges | Split bundles |
| DSSE payload | 1 MB | Compress/chunk |
| Rekor entry | 100 KB | Reference CAS |
**Chunking strategy:**
For large graphs (>10 MB):
```json
{
"chunked_graph": {
"chunk_count": 5,
"chunks": [
{"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
{"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
],
"assembly_order": ["chunk:001", "chunk:002", ...],
"assembled_hash": "blake3:..."
}
}
```
**Compression:**
- Graph JSON: gzip before DSSE
- CAS storage: Raw JSON (indexed)
- Rekor payload: DSSE references CAS
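
The splitting and chunking rules above might look like this in practice. The sketch is hedged in several ways: SHA-256 stands in for BLAKE3 because BLAKE3 is not in the Python standard library, chunk bytes are kept inline under a hypothetical `data` field instead of being written to CAS, and all function names are illustrative.

```python
import hashlib

MAX_EDGES_PER_BUNDLE = 512  # edge-bundle limit from the size table


def _digest(data: bytes) -> str:
    # SHA-256 stands in for BLAKE3, which is not in the standard library.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def split_edges(edges: list, limit: int = MAX_EDGES_PER_BUNDLE) -> list:
    """Split an edge list into consecutive bundles of at most `limit` edges."""
    return [edges[i:i + limit] for i in range(0, len(edges), limit)]


def chunk_graph(graph_bytes: bytes, chunk_size: int) -> dict:
    """Build a manifest shaped like `chunked_graph`, with chunk bytes kept inline."""
    pieces = [graph_bytes[i:i + chunk_size] for i in range(0, len(graph_bytes), chunk_size)]
    chunks = [
        {"chunk_id": f"chunk:{n:03d}", "hash": _digest(piece), "data": piece}
        for n, piece in enumerate(pieces, start=1)
    ]
    return {
        "chunk_count": len(chunks),
        "chunks": chunks,
        "assembly_order": [c["chunk_id"] for c in chunks],
        "assembled_hash": _digest(graph_bytes),
    }


def assemble(manifest: dict) -> bytes:
    """Reassemble chunks in `assembly_order`, checking each hash and the final hash."""
    by_id = {c["chunk_id"]: c for c in manifest["chunks"]}
    parts = []
    for chunk_id in manifest["assembly_order"]:
        chunk = by_id[chunk_id]
        if _digest(chunk["data"]) != chunk["hash"]:
            raise ValueError(f"hash mismatch in {chunk_id}")
        parts.append(chunk["data"])
    assembled = b"".join(parts)
    if _digest(assembled) != manifest["assembled_hash"]:
        raise ValueError("assembled graph hash mismatch")
    return assembled
```

Checking both the per-chunk hashes and `assembled_hash` lets a verifier name the corrupted chunk instead of only reporting that the whole graph failed.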
### BR9: API/CLI/UI Surfacing
**API endpoints:**
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/binary/graphs` | Submit binary graph |
| `GET` | `/api/binary/graphs/{hash}` | Get graph details |
| `GET` | `/api/binary/graphs/{hash}/edges` | List edges |
| `GET` | `/api/binary/symbols/{symbolId}` | Get symbol details |
| `POST` | `/api/binary/verify` | Verify graph attestation |
**CLI commands:**
```bash
# Submit binary graph
stella binary submit --graph ./richgraph.json --binary ./app
# Get graph info
stella binary info --hash blake3:a1b2c3d4...
# List symbols
stella binary symbols --hash blake3:... --stripped-only
# Verify attestation
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse
```
**UI components:**
- Binary graph visualization with zoom/pan
- Symbol table with search/filter
- Edge explorer with confidence highlighting
- Attestation status badges
- Build variant selector
### BR10: Binary Fixtures
**Fixture location:**
```
tests/Binary/
  fixtures/
    elf-x86_64-with-debug/
      binary.elf
      graph.json
      expected-hashes.txt
    elf-stripped/
      binary.elf
      graph.json
      expected-hashes.txt
    pe-x64-with-pdb/
      binary.exe
      graph.json
      expected-hashes.txt
  golden/
    elf-x86_64.golden.json
    pe-x64.golden.json
datasets/binary/
  schema/
    binary-graph.schema.json
    binary-edge.schema.json
  samples/
    openssl-1.1.1/
      libssl.so
      graph.json
      edges.ndjson
```
**Fixture requirements:**
1. Each binary format has at least one fixture
2. Stripped and debug variants for each format
3. Expected hashes verified by CI
4. Golden outputs include DSSE envelopes
5. Fixtures reproducible from source (where legal)
**Test categories:**
1. **Hash stability:** Same binary produces same graph hash
2. **Build-ID extraction:** Correct build-ID parsing per format
3. **Symbol recovery:** DWARF/PDB parsing accuracy
4. **Stripped handling:** Code block hash computation
5. **Chunking:** Large graph splitting and reassembly
6. **DSSE signing:** Envelope creation and verification
7. **Rekor integration:** Submission and verification
---
## 3. Implementation Status
| Component | Location | Status |
|-----------|----------|--------|
| ELF parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| PE parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |
| CAS storage | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability` | Partial |
| Rekor integration | `src/Attestor/StellaOps.Attestor` | Implemented |
| CLI commands | `src/Cli/StellaOps.Cli` | Planned |
| UI components | `src/UI/StellaOps.UI` | Planned |
---
## 4. Related Documentation
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Edge Explainability](./edge-explainability-schema.md) - Edge reason codes
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
- [Native Analyzer Tests](../../src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests/Reachability/) - Test fixtures
---
_Last updated: 2025-12-13. See Sprint 0401 BINARY-GAPS-401-066 for change history._