Binary Reachability Schema
Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild.
This document defines the binary reachability schema addressing gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.
1. Overview
Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:
- Stripped binaries: Symbol recovery using code_id + code_block_hash
- Build variants: Handling multiple builds from same source
- Large graphs: Chunking and size limits for DSSE/Rekor
- Offline verification: Air-gapped attestation workflows
2. Gap Resolutions
BR1: Canonical DSSE/Predicate Schemas
Binary graph predicate:
stella.ops/binaryGraph@v1
Predicate schema:
{
"_type": "https://stellaops.dev/predicates/binaryGraph/v1",
"subject": [
{
"name": "graph",
"digest": {"blake3": "a1b2c3d4e5f6..."}
}
],
"predicate": {
"analyzer": {
"name": "scanner.native",
"version": "1.2.0",
"toolchain": "ghidra-11.2"
},
"binary": {
"format": "ELF",
"arch": "x86_64",
"file_hash": "sha256:...",
"build_id": "gnu-build-id:5f0c7c3c..."
},
"graph_stats": {
"node_count": 1247,
"edge_count": 3891,
"root_count": 5
},
"evidence": {
"symbols_source": "DWARF",
"stripped_symbols": 58,
"heuristic_symbols": 12
},
"created_at": "2025-12-13T10:00:00Z"
}
}
Edge bundle predicate:
stella.ops/binaryEdgeBundle@v1
{
"_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
"subject": [
{
"name": "edges",
"digest": {"sha256": "..."}
}
],
"predicate": {
"graph_hash": "blake3:a1b2c3d4...",
"bundle_id": "bundle:001",
"bundle_reason": "init_array",
"edge_count": 128,
"edges": [
{
"from": "sym:binary:...",
"to": "sym:binary:...",
"reason": "init-array",
"confidence": 0.95
}
]
}
}
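Both predicates travel inside DSSE envelopes, so the signature covers the DSSE pre-authentication encoding (PAE) of the payload rather than the raw JSON. The following is a minimal sketch of PAE as described in the DSSE specification. The payload type `application/vnd.in-toto+json` and the empty statement body are assumptions for illustration; this document does not state which payload type the Signer emits.

```python
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """Return the DSSE pre-authentication encoding of (payload_type, payload).

    Layout: "DSSEv1" SP len(type) SP type SP len(payload) SP payload,
    where lengths are ASCII decimal byte counts.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


# Illustrative statement only; subject and predicate are left empty here.
statement = {
    "_type": "https://stellaops.dev/predicates/binaryGraph/v1",
    "subject": [],
    "predicate": {},
}
payload = json.dumps(statement, separators=(",", ":")).encode("utf-8")
to_sign = pae("application/vnd.in-toto+json", payload)  # assumed payload type
```

The signer signs `to_sign`; the envelope itself carries the base64 payload, the payload type, and the signatures.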
BR2: Edge Hash Recipe
Binary edge hash computation:
edge_id = "edge:" + sha256(
canonical_json({
"from": edge.from,
"to": edge.to,
"kind": edge.kind,
"reason": edge.reason,
"binary_hash": binary.file_hash // Binary context included
})
)
Hash includes binary context:
Unlike managed code edges, binary edges include binary_hash in the hash computation to distinguish edges from different binaries with identical symbol names.
Canonicalization:
- Keys: binary_hash, from, kind, reason, to (alphabetical)
- No whitespace, UTF-8 encoding
- Lowercase hex for all hashes
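The recipe and canonicalization rules above can be sketched in Python as follows. This assumes that `json.dumps` with sorted keys and compact separators is an acceptable stand-in for the project's `canonical_json`; the real canonicalizer may differ in details such as Unicode escaping. The helper names are hypothetical.

```python
import hashlib
import json


def canonical_json(obj: dict) -> bytes:
    """Sorted keys, no whitespace, UTF-8 (assumed equivalent of canonical_json)."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def edge_id(edge: dict, binary_file_hash: str) -> str:
    """Compute the edge ID, binding the edge to the binary it was found in."""
    record = {
        "from": edge["from"],
        "to": edge["to"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        # Lowercased so that hex digests compare equal regardless of input case.
        "binary_hash": binary_file_hash.lower(),
    }
    return "edge:" + hashlib.sha256(canonical_json(record)).hexdigest()
```

Because the keys are sorted during serialization, the order in which fields appear in the input does not affect the resulting ID, while the same symbol pair in two different binaries yields two different IDs.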
BR3: Required Binary Evidence with CAS Refs
Required evidence per node:
| Evidence Type | Required | CAS Storage |
|---|---|---|
| File hash | Yes | N/A (inline) |
| Build ID | Conditional | N/A (inline) |
| Symbol source | Yes | N/A (inline) |
| Code block hash | For stripped | cas://binary/blocks/{sha256} |
| Disassembly | Optional | cas://binary/disasm/{sha256} |
| CFG | Optional | cas://binary/cfg/{sha256} |
Evidence schema:
{
"binary_evidence": {
"file_hash": "sha256:...",
"build_id": "gnu-build-id:5f0c7c3c...",
"symbol_source": "DWARF",
"symbol_confidence": 0.95,
"code_block_hash": "sha256:deadbeef...",
"code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
"disassembly_uri": "cas://binary/disasm/sha256:...",
"cfg_uri": "cas://binary/cfg/sha256:..."
}
}
CAS layout:
cas://binary/
blocks/{sha256}/ # Code block bytes
disasm/{sha256}/ # Disassembly JSON
cfg/{sha256}/ # Control flow graph
symbols/{sha256}/ # Symbol table extract
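As an illustration of how a code block hash maps onto the CAS layout, the sketch below derives the two code-block fields of the evidence schema from raw bytes. It follows the URI form shown in the evidence example (digest with a `sha256:` prefix); the function name is hypothetical and the exact byte range that constitutes a "code block" is not specified here.

```python
import hashlib


def code_block_evidence(block: bytes) -> dict:
    """Hash a code block and build its CAS reference (illustrative helper)."""
    digest = "sha256:" + hashlib.sha256(block).hexdigest()
    return {
        "code_block_hash": digest,
        "code_block_uri": f"cas://binary/blocks/{digest}",
    }
```

Since the URI is derived from the content hash, identical code blocks from different binaries resolve to the same CAS object.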
BR4: Build-ID/Variant Rules
Build-ID sources:
| Format | Build-ID Source | Example |
|---|---|---|
| ELF | .note.gnu.build-id | gnu-build-id:5f0c7c3c... |
| PE | Debug GUID | pe-guid:12345678-1234-... |
| Mach-O | LC_UUID | macho-uuid:12345678... |
Fallback when build-ID absent:
{
"build_id": null,
"build_id_fallback": {
"method": "file_hash",
"value": "sha256:...",
"confidence": 0.7
}
}
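A minimal sketch of the fallback rule, assuming the analyzer passes `None` (or an empty string) when no build-ID note is present. The function name is hypothetical, and the confidence value 0.7 is taken from the example above rather than from a documented constant.

```python
from typing import Optional


def build_identity(build_id: Optional[str], file_hash: str) -> dict:
    """Return build-ID fields, falling back to the file hash when absent."""
    if build_id:
        return {"build_id": build_id}
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            "confidence": 0.7,  # value from the example; assumed fixed
        },
    }
```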
Variant handling:
Multiple binaries from same source (debug/release, different arch):
{
"variant_group": "sha256:source_hash...",
"variants": [
{"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
{"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
{"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
]
}
BR5: Policy Hash Governance
Policy version binding:
Binary reachability graphs are bound to a policy version:
{
"policy_binding": {
"policy_digest": "sha256:...",
"policy_version": "P-7:v4",
"bound_at": "2025-12-13T10:00:00Z",
"binding_mode": "strict"
}
}
Binding modes:
| Mode | Behavior |
|---|---|
| strict | Graph invalid if policy changes |
| forward | Graph valid with newer policy versions |
| any | Graph valid with any policy version |
Governance rules:
- Production graphs use strict binding
- Test graphs may use forward
forward - Policy hash computed from canonical DSL
- Binding stored in graph metadata
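The binding modes can be read as a validity check performed at evaluation time. The sketch below is one possible interpretation, not the actual evaluator: it assumes `policy_version` has the shape `<policy-id>:v<revision>` (as in `P-7:v4`), that "newer" means a revision number greater than or equal to the bound one for the same policy ID, and that `strict` compares policy digests. All function names are hypothetical.

```python
from typing import Tuple


def parse_policy_version(version: str) -> Tuple[str, int]:
    """Split 'P-7:v4' into ('P-7', 4). The format is an assumption."""
    policy_id, _, revision = version.partition(":v")
    return policy_id, int(revision)


def binding_is_valid(binding: dict, current_digest: str, current_version: str) -> bool:
    """Check a graph's policy binding against the currently active policy."""
    mode = binding["binding_mode"]
    if mode == "any":
        return True
    if mode == "strict":
        # Any change to the policy changes its digest and invalidates the graph.
        return binding["policy_digest"] == current_digest
    if mode == "forward":
        bound_id, bound_rev = parse_policy_version(binding["policy_version"])
        cur_id, cur_rev = parse_policy_version(current_version)
        return bound_id == cur_id and cur_rev >= bound_rev
    raise ValueError(f"unknown binding mode: {mode}")
```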
BR6: Sigstore Bundle/Log Routing
Sigstore integration:
{
"sigstore": {
"bundle_type": "hashedrekord",
"log_index": 12345678,
"log_id": "rekor.sigstore.dev",
"inclusion_proof": {
"log_index": 12345678,
"root_hash": "sha256:...",
"tree_size": 98765432,
"hashes": ["sha256:...", "sha256:..."]
},
"signed_entry_timestamp": "base64:..."
}
}
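The `inclusion_proof` fields are what a verifier needs to recompute the log root offline. Below is a sketch of the RFC 6962 / RFC 9162 style Merkle audit path check, assuming the log uses SHA-256 with the standard 0x00 leaf and 0x01 node prefixes and that the hashes have already been decoded from their `sha256:` hex form into raw bytes. It verifies only the Merkle path; checking the signed entry timestamp or checkpoint signature is out of scope here.

```python
import hashlib
from typing import Sequence


def leaf_hash(entry: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256 over 0x00 || entry."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256 over 0x01 || left || right."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(
    leaf: bytes,
    log_index: int,
    tree_size: int,
    hashes: Sequence[bytes],
    root_hash: bytes,
) -> bool:
    """Recompute the root from a leaf hash and its audit path."""
    if log_index < 0 or log_index >= tree_size:
        return False
    fn, sn = log_index, tree_size - 1
    r = leaf
    for p in hashes:
        if sn == 0:
            return False  # path is longer than the tree is deep
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)
            if not (fn & 1):
                # Skip levels where this node has no right sibling.
                while fn != 0 and not (fn & 1):
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```

Here `log_index` is the index inside the proof object, which for a sharded log may differ from the global entry index.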
Log routing:
| Evidence Type | Log | Notes |
|---|---|---|
| Graph DSSE | Rekor (public) | Always |
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
| Code block | No log | CAS only |
| CFG/Disasm | No log | CAS only |
Offline mode:
When Rekor is unavailable:
{
"sigstore": {
"mode": "offline",
"checkpoint": {
"origin": "rekor.sigstore.dev",
"checkpoint_data": "base64:...",
"captured_at": "2025-12-13T10:00:00Z"
},
"deferred_submission": true
}
}
BR7: Idempotent Submission Keys
Submission key format:
submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}
Idempotency rules:
- Same key returns existing entry (no duplicate)
- Key includes hour-granularity timestamp for rate limiting
- Different graphs from same binary produce different keys
- A retry within the same clock hour reuses the same key; a retry that crosses the hour boundary produces a new key
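A sketch of key construction under the format above. It assumes the hour bucket is taken in UTC and formatted as `YYYYMMDDHH`, which is consistent with the example key and the `Z` timestamps used elsewhere in this document but is not stated explicitly. The function name is hypothetical.

```python
from datetime import datetime, timezone


def submission_key(tenant: str, binary_hash: str, graph_hash: str, when: datetime) -> str:
    """Build an idempotent submission key with an hour-granularity bucket.

    `when` must be timezone-aware; it is normalized to UTC (assumption).
    """
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    hour_bucket = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour_bucket}"
```

Two submissions of the same graph at 10:05 and 10:55 UTC map to one key, while a submission at 11:00 UTC maps to a different one.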
Implementation:
{
"submission": {
"key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
"status": "accepted",
"existing_entry": false,
"log_index": 12345678
}
}
BR8: Size/Chunking Limits
Size limits:
| Element | Limit | Action on Exceed |
|---|---|---|
| Graph JSON | 10 MB | Chunk nodes/edges |
| Edge bundle | 512 edges | Split bundles |
| DSSE payload | 1 MB | Compress/chunk |
| Rekor entry | 100 KB | Reference CAS |
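The 512-edge limit implies a simple splitting step before bundles are signed. The sketch below shows one way to do it; the sequential `bundle:NNN` numbering mirrors the `bundle:001` example in BR1 but is otherwise an assumption, as is the function name. Fields such as `graph_hash` and `bundle_reason` would be added by the caller.

```python
from typing import Iterator, Sequence

MAX_EDGES_PER_BUNDLE = 512  # limit from the table above


def split_into_bundles(
    edges: Sequence[dict], limit: int = MAX_EDGES_PER_BUNDLE
) -> Iterator[dict]:
    """Yield edge bundles of at most `limit` edges, preserving input order."""
    for number, start in enumerate(range(0, len(edges), limit), start=1):
        chunk = list(edges[start:start + limit])
        yield {
            "bundle_id": f"bundle:{number:03d}",
            "edge_count": len(chunk),
            "edges": chunk,
        }
```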
Chunking strategy:
For large graphs (>10MB):
{
"chunked_graph": {
"chunk_count": 5,
"chunks": [
{"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
{"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
],
"assembly_order": ["chunk:001", "chunk:002", ...],
"assembled_hash": "blake3:..."
}
}
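Reassembly on the consumer side can be sketched as below. Two assumptions are made loudly here: chunks are joined by plain byte concatenation in `assembly_order`, and SHA-256 from the standard library stands in for BLAKE3, which the schema actually specifies but which requires a third-party package. A real implementation would pass a BLAKE3 hasher instead.

```python
import hashlib
from typing import Callable, Mapping, Sequence


def assemble_chunks(
    assembly_order: Sequence[str],
    chunks: Mapping[str, bytes],
    assembled_hex: str,
    hasher: Callable = hashlib.sha256,  # stand-in for BLAKE3 (assumption)
) -> bytes:
    """Concatenate chunks in order and verify the assembled digest."""
    missing = [cid for cid in assembly_order if cid not in chunks]
    if missing:
        raise KeyError(f"missing chunks: {missing}")
    data = b"".join(chunks[cid] for cid in assembly_order)
    if hasher(data).hexdigest() != assembled_hex:
        raise ValueError("assembled graph hash mismatch")
    return data
```

Verifying the assembled hash after concatenation catches both corrupted chunks and a wrong assembly order.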
Compression:
- Graph JSON: gzip before DSSE
- CAS storage: Raw JSON (indexed)
- Rekor payload: DSSE references CAS
BR9: API/CLI/UI Surfacing
API endpoints:
| Method | Path | Description |
|---|---|---|
| POST | /api/binary/graphs | Submit binary graph |
| GET | /api/binary/graphs/{hash} | Get graph details |
| GET | /api/binary/graphs/{hash}/edges | List edges |
| GET | /api/binary/symbols/{symbolId} | Get symbol details |
| POST | /api/binary/verify | Verify graph attestation |
CLI commands:
# Submit binary graph
stella binary submit --graph ./richgraph.json --binary ./app
# Get graph info
stella binary info --hash blake3:a1b2c3d4...
# List symbols
stella binary symbols --hash blake3:... --stripped-only
# Verify attestation
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse
UI components:
- Binary graph visualization with zoom/pan
- Symbol table with search/filter
- Edge explorer with confidence highlighting
- Attestation status badges
- Build variant selector
BR10: Binary Fixtures
Fixture location:
tests/Binary/
fixtures/
elf-x86_64-with-debug/
binary.elf
graph.json
expected-hashes.txt
elf-stripped/
binary.elf
graph.json
expected-hashes.txt
pe-x64-with-pdb/
binary.exe
graph.json
expected-hashes.txt
golden/
elf-x86_64.golden.json
pe-x64.golden.json
datasets/binary/
schema/
binary-graph.schema.json
binary-edge.schema.json
samples/
openssl-1.1.1/
libssl.so
graph.json
edges.ndjson
Fixture requirements:
- Each binary format has at least one fixture
- Stripped and debug variants for each format
- Expected hashes verified by CI
- Golden outputs include DSSE envelopes
- Fixtures reproducible from source (where legal)
Test categories:
- Hash stability: Same binary produces same graph hash
- Build-ID extraction: Correct build-ID parsing per format
- Symbol recovery: DWARF/PDB parsing accuracy
- Stripped handling: Code block hash computation
- Chunking: Large graph assembly/disassembly
- DSSE signing: Envelope creation and verification
- Rekor integration: Submission and verification
3. Implementation Status
| Component | Location | Status |
|---|---|---|
| ELF parser | src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native | Implemented |
| PE parser | src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native | Implemented |
| DSSE predicates | src/Signer/StellaOps.Signer/PredicateTypes.cs | Implemented |
| CAS storage | src/Scanner/__Libraries/StellaOps.Scanner.Reachability | Partial |
| Rekor integration | src/Attestor/StellaOps.Attestor | Implemented |
| CLI commands | src/Cli/StellaOps.Cli | Planned |
| UI components | src/Web/StellaOps.Web | Implemented |
4. Related Documentation
- richgraph-v1 Contract - Graph schema specification
- Function-Level Evidence - Evidence chain guide
- Edge Explainability - Edge reason codes
- Hybrid Attestation - Graph and edge-bundle DSSE
- Native Analyzer Tests - Test fixtures
Last updated: 2025-12-13. See Sprint 0401 BINARY-GAPS-401-066 for change history.