Binary Reachability Schema

Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild.

This document defines the binary reachability schema and resolves gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.


1. Overview

Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:

  • Stripped binaries: Symbol recovery using code_id + code_block_hash
  • Build variants: Handling multiple builds from the same source
  • Large graphs: Chunking and size limits for DSSE/Rekor
  • Offline verification: Air-gapped attestation workflows

2. Gap Resolutions

BR1: Canonical DSSE/Predicate Schemas

Binary graph predicate:

stella.ops/binaryGraph@v1

Predicate schema:

{
  "_type": "https://stellaops.dev/predicates/binaryGraph/v1",
  "subject": [
    {
      "name": "graph",
      "digest": {"blake3": "a1b2c3d4e5f6..."}
    }
  ],
  "predicate": {
    "analyzer": {
      "name": "scanner.native",
      "version": "1.2.0",
      "toolchain": "ghidra-11.2"
    },
    "binary": {
      "format": "ELF",
      "arch": "x86_64",
      "file_hash": "sha256:...",
      "build_id": "gnu-build-id:5f0c7c3c..."
    },
    "graph_stats": {
      "node_count": 1247,
      "edge_count": 3891,
      "root_count": 5
    },
    "evidence": {
      "symbols_source": "DWARF",
      "stripped_symbols": 58,
      "heuristic_symbols": 12
    },
    "created_at": "2025-12-13T10:00:00Z"
  }
}

Edge bundle predicate:

stella.ops/binaryEdgeBundle@v1

Predicate schema:

{
  "_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
  "subject": [
    {
      "name": "edges",
      "digest": {"sha256": "..."}
    }
  ],
  "predicate": {
    "graph_hash": "blake3:a1b2c3d4...",
    "bundle_id": "bundle:001",
    "bundle_reason": "init_array",
    "edge_count": 128,
    "edges": [
      {
        "from": "sym:binary:...",
        "to": "sym:binary:...",
        "reason": "init-array",
        "confidence": 0.95
      }
    ]
  }
}

BR2: Edge Hash Recipe

Binary edge hash computation:

edge_id = "edge:" + sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason,
    "binary_hash": binary.file_hash  // Binary context included
  })
)

Hash includes binary context:

Unlike managed code edges, binary edges include binary_hash in the hash computation to distinguish edges from different binaries with identical symbol names.

Canonicalization:

  1. Keys: binary_hash, from, kind, reason, to (alphabetical)
  2. No whitespace, UTF-8 encoding
  3. Lowercase hex for all hashes
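
The recipe and canonicalization rules above can be sketched in Python as follows. This is a minimal illustration, not the Scanner implementation: the helper names `canonical_json` and `edge_id` are hypothetical, and the sketch assumes the digest is emitted as bare lowercase hex after the `edge:` prefix.

```python
import hashlib
import json


def canonical_json(obj: dict) -> bytes:
    """Serialize with sorted keys, no whitespace, UTF-8 (rules 1 and 2)."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def edge_id(edge: dict, binary_file_hash: str) -> str:
    """Compute "edge:" + sha256 hex over the five canonical fields."""
    payload = {
        "from": edge["from"],
        "to": edge["to"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        # Rule 3: hashes are lowercase hex, so normalize before hashing.
        "binary_hash": binary_file_hash.lower(),
    }
    return "edge:" + hashlib.sha256(canonical_json(payload)).hexdigest()
```

Because `binary_hash` is part of the payload, two edges with identical `from`/`to` symbols in different binaries produce different IDs.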

BR3: Required Binary Evidence with CAS Refs

Required evidence per node:

| Evidence Type | Required | CAS Storage |
|---|---|---|
| File hash | Yes | N/A (inline) |
| Build ID | Conditional | N/A (inline) |
| Symbol source | Yes | N/A (inline) |
| Code block hash | For stripped | cas://binary/blocks/{sha256} |
| Disassembly | Optional | cas://binary/disasm/{sha256} |
| CFG | Optional | cas://binary/cfg/{sha256} |

Evidence schema:

{
  "binary_evidence": {
    "file_hash": "sha256:...",
    "build_id": "gnu-build-id:5f0c7c3c...",
    "symbol_source": "DWARF",
    "symbol_confidence": 0.95,
    "code_block_hash": "sha256:deadbeef...",
    "code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
    "disassembly_uri": "cas://binary/disasm/sha256:...",
    "cfg_uri": "cas://binary/cfg/sha256:..."
  }
}

CAS layout:

cas://binary/
  blocks/{sha256}/         # Code block bytes
  disasm/{sha256}/         # Disassembly JSON
  cfg/{sha256}/            # Control flow graph
  symbols/{sha256}/        # Symbol table extract
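
A hedged sketch of how a producer might derive the code block CAS reference and check the required fields from the table above. The function names are hypothetical, the `stripped` flag is an assumed input (the schema does not say how strippedness is signalled), and hashing the raw block bytes with SHA-256 is an assumption based on the `{sha256}` placeholder.

```python
import hashlib


def code_block_ref(block: bytes) -> tuple[str, str]:
    """Return (code_block_hash, code_block_uri) for raw code block bytes."""
    digest = "sha256:" + hashlib.sha256(block).hexdigest()
    return digest, f"cas://binary/blocks/{digest}"


def missing_evidence(evidence: dict, stripped: bool) -> list[str]:
    """List required fields that are absent from a binary_evidence object."""
    required = ["file_hash", "symbol_source"]
    if stripped:
        # Code block hash is required only for stripped symbols.
        required.append("code_block_hash")
    return [name for name in required if not evidence.get(name)]
```

Build ID is left out of the required list because the table marks it as conditional; BR4 covers the fallback.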

BR4: Build-ID/Variant Rules

Build-ID sources:

| Format | Build-ID Source | Example |
|---|---|---|
| ELF | .note.gnu.build-id | gnu-build-id:5f0c7c3c... |
| PE | Debug GUID | pe-guid:12345678-1234-... |
| Mach-O | LC_UUID | macho-uuid:12345678... |

Fallback when build-ID absent:

{
  "build_id": null,
  "build_id_fallback": {
    "method": "file_hash",
    "value": "sha256:...",
    "confidence": 0.7
  }
}
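
The prefix table and the fallback object can be combined as in the sketch below. The function name `build_id_fields` is hypothetical, and the fixed confidence of 0.7 is copied from the example above rather than from a stated rule.

```python
from typing import Optional

# Prefixes taken from the build-ID sources table.
BUILD_ID_PREFIX = {"ELF": "gnu-build-id", "PE": "pe-guid", "Mach-O": "macho-uuid"}


def build_id_fields(fmt: str, raw_id: Optional[str], file_hash: str) -> dict:
    """Return the build_id fields for a binary, falling back to the file hash."""
    if raw_id:
        return {"build_id": f"{BUILD_ID_PREFIX[fmt]}:{raw_id}"}
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            "confidence": 0.7,  # Value from the example; assumed constant here.
        },
    }
```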

Variant handling:

Multiple binaries built from the same source (debug/release, different architectures) share a variant group:

{
  "variant_group": "sha256:source_hash...",
  "variants": [
    {"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
    {"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
    {"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
  ]
}

BR5: Policy Hash Governance

Policy version binding:

Binary reachability graphs are bound to a policy version:

{
  "policy_binding": {
    "policy_digest": "sha256:...",
    "policy_version": "P-7:v4",
    "bound_at": "2025-12-13T10:00:00Z",
    "binding_mode": "strict"
  }
}

Binding modes:

| Mode | Behavior |
|---|---|
| strict | Graph invalid if policy changes |
| forward | Graph valid with newer policy versions |
| any | Graph valid with any policy version |

Governance rules:

  1. Production graphs use strict binding
  2. Test graphs may use forward
  3. Policy hash computed from canonical DSL
  4. Binding stored in graph metadata
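
One possible reading of the three binding modes, as a sketch. The names are hypothetical, and two points are assumptions not stated in this document: that "policy changes" means the policy digest differs, and that a version such as `P-7:v4` splits into a policy ID and an integer revision that can be compared.

```python
def parse_policy_version(version: str) -> tuple[str, int]:
    """Split "P-7:v4" into ("P-7", 4). The format is inferred from the example."""
    policy_id, _, revision = version.rpartition(":v")
    return policy_id, int(revision)


def binding_is_valid(binding: dict, current_digest: str, current_version: str) -> bool:
    """Check a policy_binding object against the currently active policy."""
    mode = binding["binding_mode"]
    if mode == "any":
        return True
    if mode == "strict":
        # Any change to the policy digest invalidates the graph.
        return binding["policy_digest"] == current_digest
    if mode == "forward":
        bound_id, bound_rev = parse_policy_version(binding["policy_version"])
        cur_id, cur_rev = parse_policy_version(current_version)
        return bound_id == cur_id and cur_rev >= bound_rev
    raise ValueError(f"unknown binding mode: {mode}")
```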

BR6: Sigstore Bundle/Log Routing

Sigstore integration:

{
  "sigstore": {
    "bundle_type": "hashedrekord",
    "log_index": 12345678,
    "log_id": "rekor.sigstore.dev",
    "inclusion_proof": {
      "log_index": 12345678,
      "root_hash": "sha256:...",
      "tree_size": 98765432,
      "hashes": ["sha256:...", "sha256:..."]
    },
    "signed_entry_timestamp": "base64:..."
  }
}

Log routing:

| Evidence Type | Log | Notes |
|---|---|---|
| Graph DSSE | Rekor (public) | Always |
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
| Code block | No log | CAS only |
| CFG/Disasm | No log | CAS only |

Offline mode:

When Rekor is unavailable:

{
  "sigstore": {
    "mode": "offline",
    "checkpoint": {
      "origin": "rekor.sigstore.dev",
      "checkpoint_data": "base64:...",
      "captured_at": "2025-12-13T10:00:00Z"
    },
    "deferred_submission": true
  }
}

BR7: Idempotent Submission Keys

Submission key format:

submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}

Idempotency rules:

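  1. Resubmitting the same key returns the existing entry; no duplicate is created
  2. Key includes an hour-granularity timestamp (YYYYMMDDHH) for rate limiting
  3. Different graphs from the same binary produce different keys
  4. A retry within the same clock hour reuses the key; a retry that crosses an hour boundary produces a new key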

Implementation:

{
  "submission": {
    "key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
    "status": "accepted",
    "existing_entry": false,
    "log_index": 12345678
  }
}
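
The key format and idempotency rules can be illustrated with the sketch below. `submission_key` and the in-memory `SubmissionLog` are hypothetical stand-ins for the real submission service, and bucketing the hour in UTC is an assumption; the document does not name a time zone.

```python
from datetime import datetime, timezone


def submission_key(tenant: str, binary_hash: str, graph_hash: str, when: datetime) -> str:
    """Build the key; `when` should be timezone-aware (UTC bucketing is assumed)."""
    hour_bucket = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour_bucket}"


class SubmissionLog:
    """In-memory stand-in for the log: one entry per submission key."""

    def __init__(self) -> None:
        self._entries: dict[str, int] = {}
        self._next_index = 0

    def submit(self, key: str) -> dict:
        existing = key in self._entries
        if not existing:
            self._entries[key] = self._next_index
            self._next_index += 1
        return {
            "key": key,
            "status": "accepted",
            "existing_entry": existing,
            "log_index": self._entries[key],
        }
```

A second `submit` with the same key returns the original `log_index` with `existing_entry` set to true, which is the behavior rule 1 describes.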

BR8: Size/Chunking Limits

Size limits:

| Element | Limit | Action on Exceed |
|---|---|---|
| Graph JSON | 10 MB | Chunk nodes/edges |
| Edge bundle | 512 edges | Split bundles |
| DSSE payload | 1 MB | Compress/chunk |
| Rekor entry | 100 KB | Reference CAS |
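
As an illustration of the "Split bundles" action, the sketch below partitions an edge list into bundles of at most 512 edges, reusing the field names of the edge bundle predicate from BR1. The function name and the sequential `bundle:NNN` numbering are assumptions based on the `bundle:001` example.

```python
MAX_EDGES_PER_BUNDLE = 512  # From the size-limits table.


def split_into_bundles(edges: list, graph_hash: str, reason: str) -> list[dict]:
    """Partition edges into bundle predicates that respect the edge limit."""
    bundles = []
    for start in range(0, len(edges), MAX_EDGES_PER_BUNDLE):
        chunk = edges[start:start + MAX_EDGES_PER_BUNDLE]
        bundles.append({
            "graph_hash": graph_hash,
            "bundle_id": f"bundle:{len(bundles) + 1:03d}",
            "bundle_reason": reason,
            "edge_count": len(chunk),
            "edges": chunk,
        })
    return bundles
```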

Chunking strategy:

For large graphs (>10 MB):

{
  "chunked_graph": {
    "chunk_count": 5,
    "chunks": [
      {"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
      {"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
    ],
    "assembly_order": ["chunk:001", "chunk:002", ...],
    "assembled_hash": "blake3:..."
  }
}
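
A sketch of producing and verifying a `chunked_graph` manifest. Several simplifications are assumed: chunks are cut at byte offsets rather than at node/edge boundaries, a plain dict stands in for CAS, and SHA-256 from the standard library replaces blake3 (which the schema specifies but Python does not ship). All function names are hypothetical.

```python
import hashlib


def _digest(data: bytes) -> str:
    # Stand-in digest: the schema uses blake3, which is not in the standard library.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def split_graph(graph_bytes: bytes, chunk_size: int) -> tuple[dict, dict]:
    """Return (chunked_graph manifest, {chunk_id: bytes}) for a serialized graph."""
    store = {}
    chunks = []
    for offset in range(0, len(graph_bytes), chunk_size):
        chunk_id = f"chunk:{len(chunks) + 1:03d}"
        data = graph_bytes[offset:offset + chunk_size]
        store[chunk_id] = data
        chunks.append({"chunk_id": chunk_id, "hash": _digest(data)})
    manifest = {
        "chunk_count": len(chunks),
        "chunks": chunks,
        "assembly_order": [c["chunk_id"] for c in chunks],
        "assembled_hash": _digest(graph_bytes),
    }
    return manifest, store


def assemble_graph(manifest: dict, store: dict) -> bytes:
    """Reassemble chunks in assembly_order, verifying each hash and the total."""
    expected = {c["chunk_id"]: c["hash"] for c in manifest["chunks"]}
    out = bytearray()
    for chunk_id in manifest["assembly_order"]:
        data = store[chunk_id]
        if _digest(data) != expected[chunk_id]:
            raise ValueError(f"hash mismatch for {chunk_id}")
        out += data
    if _digest(bytes(out)) != manifest["assembled_hash"]:
        raise ValueError("assembled_hash mismatch")
    return bytes(out)
```

Checking both the per-chunk hashes and `assembled_hash` catches a corrupted chunk as well as a reordered or truncated `assembly_order`.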

Compression:

  • Graph JSON: gzip before DSSE
  • CAS storage: Raw JSON (indexed)
  • Rekor payload: DSSE references CAS

BR9: API/CLI/UI Surfacing

API endpoints:

| Method | Path | Description |
|---|---|---|
| POST | /api/binary/graphs | Submit binary graph |
| GET | /api/binary/graphs/{hash} | Get graph details |
| GET | /api/binary/graphs/{hash}/edges | List edges |
| GET | /api/binary/symbols/{symbolId} | Get symbol details |
| POST | /api/binary/verify | Verify graph attestation |

CLI commands:

# Submit binary graph
stella binary submit --graph ./richgraph.json --binary ./app

# Get graph info
stella binary info --hash blake3:a1b2c3d4...

# List symbols
stella binary symbols --hash blake3:... --stripped-only

# Verify attestation
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse

UI components:

  • Binary graph visualization with zoom/pan
  • Symbol table with search/filter
  • Edge explorer with confidence highlighting
  • Attestation status badges
  • Build variant selector

BR10: Binary Fixtures

Fixture location:

tests/Binary/
  fixtures/
    elf-x86_64-with-debug/
      binary.elf
      graph.json
      expected-hashes.txt
    elf-stripped/
      binary.elf
      graph.json
      expected-hashes.txt
    pe-x64-with-pdb/
      binary.exe
      graph.json
      expected-hashes.txt
  golden/
    elf-x86_64.golden.json
    pe-x64.golden.json

datasets/binary/
  schema/
    binary-graph.schema.json
    binary-edge.schema.json
  samples/
    openssl-1.1.1/
      libssl.so
      graph.json
      edges.ndjson

Fixture requirements:

  1. Each binary format has at least one fixture
  2. Stripped and debug variants for each format
  3. Expected hashes verified by CI
  4. Golden outputs include DSSE envelopes
  5. Fixtures reproducible from source (where legal)

Test categories:

  1. Hash stability: Same binary produces same graph hash
  2. Build-ID extraction: Correct build-ID parsing per format
  3. Symbol recovery: DWARF/PDB parsing accuracy
  4. Stripped handling: Code block hash computation
  5. Chunking: Splitting large graphs into chunks and reassembling them
  6. DSSE signing: Envelope creation and verification
  7. Rekor integration: Submission and verification

3. Implementation Status

| Component | Location | Status |
|---|---|---|
| ELF parser | src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native | Implemented |
| PE parser | src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native | Implemented |
| DSSE predicates | src/Signer/StellaOps.Signer/PredicateTypes.cs | Implemented |
| CAS storage | src/Scanner/__Libraries/StellaOps.Scanner.Reachability | Partial |
| Rekor integration | src/Attestor/StellaOps.Attestor | Implemented |
| CLI commands | src/Cli/StellaOps.Cli | Planned |
| UI components | src/Web/StellaOps.Web | Implemented |


See Sprint 0401 BINARY-GAPS-401-066 for change history.