# CONTRACT-BUILDID-PROPAGATION-401: Build-ID and Code-ID Propagation

> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-13
> **Owners:** Scanner Guild, Signals Guild, BE-Base Platform Guild
> **Unblocks:** SCANNER-BUILDID-401-035, SCANNER-INITROOT-401-036, and downstream tasks

## Overview

This contract defines how the GNU build-id (ELF), PE GUID, and Mach-O UUID propagate through the reachability pipeline from Scanner to SBOM, Signals, and runtime facts. It ensures that binaries are identified consistently across components, enabling deterministic symbol resolution and replay.

---

## 1. Build-ID Sources and Formats

### 1.1 Per-Format Extraction

| Binary Format | Build-ID Source | Prefix | Example |
|---------------|-----------------|--------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:` | `gnu-build-id:5f0c7c3cab2eb9bc...` |
| PE (Windows) | Debug GUID from the CodeView (`RSDS`) entry in the PE debug directory | `pe-guid:` | `pe-guid:12345678-1234-1234-1234-123456789abc` |
| Mach-O | `LC_UUID` load command | `macho-uuid:` | `macho-uuid:12345678123412341234123456789abc` |

### 1.2 Canonical Format

```
build_id = "{prefix}{hex_lowercase}"
```

- Hex encoding: lowercase, no separators (except PE GUID, which retains its dashes)
- Minimum length: 16 bytes (32 hex chars) for ELF/Mach-O
- PE GUID: standard GUID format with dashes

### 1.3 Fallback When Build-ID Absent

When no build-id is present (stripped binaries, older toolchains):

```json
{
  "build_id": null,
  "build_id_fallback": {
    "method": "file_hash",
    "value": "sha256:...",
    "confidence": 0.7
  }
}
```

**Fallback chain:**

1. `file_hash` - SHA-256 of the entire binary file (confidence: 0.7)
2. `code_section_hash` - SHA-256 of the `.text` section (confidence: 0.6)
3. `path_hash` - SHA-256 of the file path (confidence: 0.3, last resort)

---

## 2. Code-ID for Name-less Symbols

### 2.1 Purpose

`code_id` provides stable identification for symbols in stripped binaries, where the symbol name is unavailable.
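A minimal Python sketch of how a `code_id` might be derived for a stripped function, combining the `code_id` format and canonical tuple of §2.2 with the code block hash of §2.3. The function names `code_block_hash` and `make_code_id` are hypothetical, and several encoding details the contract leaves open are assumptions here: `base64url_sha256` is taken to be the unpadded base64url SHA-256 of the UTF-8 encoded, NUL-joined tuple; `addr` is rendered as `0x`-prefixed lowercase hex and `size` as decimal; and `addr` is treated as an offset into the supplied bytes rather than a virtual address.

```python
import base64
import hashlib


def code_block_hash(code_bytes: bytes, addr: int, size: int) -> str:
    """Return "sha256:<hex>" over code_bytes[addr:addr+size] (see §2.3).

    Assumption: `addr` is an offset into `code_bytes` (for example, the raw
    contents of the section holding the function), not a virtual address.
    """
    block = code_bytes[addr:addr + size]
    if addr < 0 or size <= 0 or len(block) != size:
        raise ValueError("code block falls outside code_bytes")
    return "sha256:" + hashlib.sha256(block).hexdigest()


def make_code_id(fmt: str, build_id_or_file_hash: str, section: str,
                 addr: int, size: int, block_hash: str,
                 lang: str = "binary") -> str:
    """Return "code:{lang}:{base64url_sha256}" for a name-less binary symbol.

    Assumptions (not fixed by this contract): the digest is SHA-256 over the
    UTF-8 encoding of the NUL-joined canonical tuple from §2.2, `addr` is
    rendered as 0x-prefixed lowercase hex, `size` as decimal, and base64url
    padding is stripped because the §8 regex does not allow "=".
    """
    fields = [fmt, build_id_or_file_hash, section, hex(addr), str(size), block_hash]
    digest = hashlib.sha256("\0".join(fields).encode("utf-8")).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"code:{lang}:{encoded}"


# Usage with dummy data: 16 bytes of NOP padding followed by a 5-byte function.
text_section = b"\x90" * 16 + bytes.fromhex("554889e5c3")
block_hash = code_block_hash(text_section, 16, 5)
cid = make_code_id("elf", "gnu-build-id:" + "00" * 20, ".text", 16, 5, block_hash)
print(block_hash)
print(cid)
```

Because every tuple field feeds the digest, moving the same function body to another address or another binary changes its `code_id` but leaves its `code_block_hash` unchanged.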
### 2.2 Format

```
code_id = "code:{lang}:{base64url_sha256}"
```

**Canonical tuple for binary symbols:**

```
{format}\0{build_id_or_file_hash}\0{section}\0{addr}\0{size}\0{code_block_hash}
```

### 2.3 Code Block Hash

For stripped functions, compute a hash of the function's code bytes:

```
code_block_hash = "sha256:" + hex(SHA256(code_bytes[addr:addr+size]))
```

---

## 3. Cross-RID (Runtime Identifier) Mapping

### 3.1 Problem Statement

Different platform builds (linux-x64, win-x64, osx-arm64) of the same source code produce different binaries with different build-ids. Runtime facts from one platform must map to the correct binary variant.

### 3.2 Variant Group

Binaries built from the same source are grouped by source digest:

```json
{
  "variant_group": {
    "source_digest": "sha256:...",
    "variants": [
      { "rid": "linux-x64", "build_id": "gnu-build-id:aaa...", "file_hash": "sha256:..." },
      { "rid": "win-x64", "build_id": "pe-guid:bbb...", "file_hash": "sha256:..." },
      { "rid": "osx-arm64", "build_id": "macho-uuid:ccc...", "file_hash": "sha256:..." }
    ]
  }
}
```

### 3.3 Runtime Fact Correlation

When Signals ingests runtime facts:

1. Extract `build_id` from the runtime event.
2. Look up the variant group containing this `build_id`.
3. Correlate with richgraph nodes that have a matching `build_id`.
4. If there is no match, fall back to `code_id` + `code_block_hash` matching.

---

## 4. SBOM Integration

### 4.1 CycloneDX 1.6 Properties

Build-ID propagates to the SBOM via component properties:

```json
{
  "type": "library",
  "name": "libssl.so.3",
  "version": "3.0.11",
  "properties": [
    {"name": "stellaops:build-id", "value": "gnu-build-id:5f0c7c3c..."},
    {"name": "stellaops:code-id", "value": "code:binary:abc123..."},
    {"name": "stellaops:file-hash", "value": "sha256:..."}
  ]
}
```

### 4.2 SPDX 3.0 Integration

Build-ID maps to SPDX external references:

```json
{
  "spdxId": "SPDXRef-libssl",
  "externalRef": {
    "referenceCategory": "PERSISTENT-ID",
    "referenceType": "gnu-build-id",
    "referenceLocator": "gnu-build-id:5f0c7c3c..."
  }
}
```

---

## 5. Signals Runtime Facts Schema

### 5.1 Runtime Event with Build-ID

```json
{
  "event_type": "function_hit",
  "timestamp": "2025-12-13T10:00:00Z",
  "binary": {
    "path": "/usr/lib/x86_64-linux-gnu/libssl.so.3",
    "build_id": "gnu-build-id:5f0c7c3c...",
    "file_hash": "sha256:..."
  },
  "symbol": {
    "name": "SSL_read",
    "address": "0x12345678",
    "symbol_id": "sym:binary:..."
  },
  "context": {
    "pid": 12345,
    "container_id": "abc123..."
  }
}
```

### 5.2 Ingestion Endpoint

```
POST /signals/runtime-facts
Content-Type: application/x-ndjson
Content-Encoding: gzip

{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
```

---

## 6. RichGraph Integration

### 6.1 Node with Build-ID

```json
{
  "id": "sym:binary:...",
  "symbol_id": "sym:binary:...",
  "lang": "binary",
  "kind": "function",
  "display": "SSL_read",
  "build_id": "gnu-build-id:5f0c7c3c...",
  "code_id": "code:binary:...",
  "code_block_hash": "sha256:...",
  "purl": "pkg:deb/debian/libssl3@3.0.11"
}
```

### 6.2 CAS Evidence Storage

```
cas://binary/
  by-build-id/{build_id}/    # Index by build-id
    graph.json               # Associated graph
    symbols.json             # Symbol table
  by-code-id/{code_id}/      # Index by code-id
    block.bin                # Code block bytes
    disasm.json              # Disassembly
```

---

## 7. Implementation Requirements

### 7.1 Scanner Changes

| Component | Change | Priority |
|-----------|--------|----------|
| ELF parser | Extract `.note.gnu.build-id` | P0 |
| PE parser | Extract Debug GUID | P0 |
| Mach-O parser | Extract `LC_UUID` | P0 |
| RichGraphBuilder | Populate `build_id` field on nodes | P0 |
| SBOM emitters | Add `stellaops:build-id` property | P1 |

### 7.2 Signals Changes

| Component | Change | Priority |
|-----------|--------|----------|
| Runtime facts ingestion | Parse and index `build_id` | P0 |
| Scoring service | Correlate by `build_id`, then by `code_id` | P0 |
| Store repository | Add `build_id` index | P1 |

### 7.3 CLI/UI Changes

| Component | Change | Priority |
|-----------|--------|----------|
| `stella graph explain` | Show `build_id` in output | P1 |
| UI symbol drawer | Display `build_id` with copy button | P1 |

---

## 8. Validation Rules

1. `build_id` must match the regex `^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$`.
2. `code_id` must match the regex `^code:[a-z]+:[A-Za-z0-9_-]+$`.
3. When `build_id` is null, `build_id_fallback` must be present.
4. `code_block_hash` is required when `build_id` is null and the symbol is stripped.
5. The variant group `source_digest` must be consistent across all variants.

---

## 9. Test Fixtures

Location: `tests/Binary/fixtures/build-id/`

| Fixture | Description |
|---------|-------------|
| `elf-with-buildid/` | ELF binary with GNU build-id |
| `elf-stripped/` | Stripped ELF binary, falling back to code-id |
| `pe-with-guid/` | PE binary with Debug GUID |
| `macho-with-uuid/` | Mach-O binary with `LC_UUID` |
| `variant-group/` | Same source, multiple RIDs |

---

## 10. Related Contracts

- [richgraph-v1](./richgraph-v1.md) - Graph schema with `build_id` field
- [Binary Reachability](../reachability/binary-reachability-schema.md) - Binary evidence schema
- [Symbol Manifest](../specs/SYMBOL_MANIFEST_v1.md) - Symbol identification

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial contract for build-id propagation |