CONTRACT-BUILDID-PROPAGATION-401: Build-ID and Code-ID Propagation
Status: Published
Version: 1.0.0
Published: 2025-12-13
Owners: Scanner Guild, Signals Guild, BE-Base Platform Guild
Unblocks: SCANNER-BUILDID-401-035, SCANNER-INITROOT-401-036, and downstream tasks
Overview
This contract defines how GNU build-id (ELF), PE GUID, and Mach-O UUID propagate through the reachability pipeline from Scanner to SBOM, Signals, and runtime facts. It ensures consistent identification of binaries across components for deterministic symbol resolution and replay.
1. Build-ID Sources and Formats
1.1 Per-Format Extraction
| Binary Format | Build-ID Source | Prefix | Example |
|---|---|---|---|
| ELF | .note.gnu.build-id | gnu-build-id: | gnu-build-id:5f0c7c3cab2eb9bc... |
| PE (Windows) | Debug GUID from PE header | pe-guid: | pe-guid:12345678-1234-1234-1234-123456789abc |
| Mach-O | LC_UUID load command | macho-uuid: | macho-uuid:12345678123412341234123456789abc |
1.2 Canonical Format
build_id = "{prefix}{hex_lowercase}"
- Hex encoding: lowercase, no separators (except PE GUID retains dashes)
- Minimum length: 16 bytes (32 hex chars) for ELF/Mach-O
- PE GUID: Standard GUID format with dashes
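The formatting rules above can be sketched as a small helper. This is a minimal illustration, not the Scanner implementation: the function name `canonical_build_id`, the format keys (`elf`, `pe`, `macho`), and the assumption that the PE GUID arrives as 16 raw bytes in on-disk (little-endian field) order are all hypothetical.

```python
import uuid

# Prefixes taken from the per-format table in section 1.1.
PREFIXES = {"elf": "gnu-build-id:", "pe": "pe-guid:", "macho": "macho-uuid:"}


def canonical_build_id(binary_format: str, raw: bytes) -> str:
    """Render raw identifier bytes in the canonical prefixed form."""
    prefix = PREFIXES[binary_format]
    if binary_format == "pe":
        if len(raw) != 16:
            raise ValueError("PE GUID must be exactly 16 bytes")
        # Assumption: raw holds the GUID as stored on disk (little-endian fields).
        # str(UUID) yields lowercase hex with dashes.
        return prefix + str(uuid.UUID(bytes_le=raw))
    if len(raw) < 16:
        raise ValueError("ELF/Mach-O identifiers need at least 16 bytes")
    # bytes.hex() is lowercase with no separators.
    return prefix + raw.hex()


macho = canonical_build_id("macho", bytes.fromhex("12345678123412341234123456789abc"))
# macho == "macho-uuid:12345678123412341234123456789abc"
```

Rejecting short identifiers at this point keeps the 16-byte minimum from leaking into downstream consumers.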
1.3 Fallback When Build-ID Absent
When build-id is not present (stripped binaries, older toolchains):
{
"build_id": null,
"build_id_fallback": {
"method": "file_hash",
"value": "sha256:...",
"confidence": 0.7
}
}
Fallback chain:
- file_hash - SHA-256 of entire binary file (confidence: 0.7)
- code_section_hash - SHA-256 of .text section (confidence: 0.6)
- path_hash - SHA-256 of file path (confidence: 0.3, last resort)
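One way to read the fallback chain is as an ordered list of candidates where the first available input wins. The sketch below assumes that reading. The function name and its parameters are hypothetical, and how the caller obtains the file bytes or the .text section is left out.

```python
import hashlib
from typing import Optional


def build_id_fallback(file_bytes: Optional[bytes] = None,
                      text_section: Optional[bytes] = None,
                      path: Optional[str] = None) -> dict:
    """Return the fallback record for a binary that has no build-id."""
    # Ordered as in the fallback chain: highest confidence first.
    candidates = [
        ("file_hash", file_bytes, 0.7),
        ("code_section_hash", text_section, 0.6),
        ("path_hash", path.encode("utf-8") if path is not None else None, 0.3),
    ]
    for method, data, confidence in candidates:
        if data is not None:
            return {
                "build_id": None,
                "build_id_fallback": {
                    "method": method,
                    "value": "sha256:" + hashlib.sha256(data).hexdigest(),
                    "confidence": confidence,
                },
            }
    raise ValueError("no input available for any fallback method")
```

Encoding the path as UTF-8 before hashing is an assumption of this sketch; the contract does not state an encoding.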
2. Code-ID for Name-less Symbols
2.1 Purpose
code_id provides stable identification for symbols in stripped binaries where the symbol name is unavailable.
2.2 Format
code_id = "code:{lang}:{base64url_sha256}"
Canonical tuple for binary symbols:
{format}\0{build_id_or_file_hash}\0{section}\0{addr}\0{size}\0{code_block_hash}
2.3 Code Block Hash
For stripped functions, compute hash of the code bytes:
code_block_hash = "sha256:" + hex(SHA256(code_bytes[addr:addr+size]))
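The two formulas in sections 2.2 and 2.3 could be combined roughly as follows. Several details here are assumptions, not part of the contract: `addr` is treated as an offset into the supplied byte buffer, the address is rendered as lowercase hex and the size as decimal inside the tuple, and base64url padding is stripped so the result fits the `code_id` character set.

```python
import base64
import hashlib


def code_block_hash(code_bytes: bytes, addr: int, size: int) -> str:
    """Hash the code bytes of one function (addr is an offset into code_bytes)."""
    block = code_bytes[addr:addr + size]
    return "sha256:" + hashlib.sha256(block).hexdigest()


def code_id(lang: str, fmt: str, build_id_or_file_hash: str, section: str,
            addr: int, size: int, block_hash: str) -> str:
    """Derive code_id from the NUL-separated canonical tuple."""
    # Assumption: addr as 0x-prefixed lowercase hex, size as decimal.
    fields = [fmt, build_id_or_file_hash, section, f"0x{addr:x}", str(size), block_hash]
    canonical = "\0".join(fields).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # base64url without '=' padding keeps the token within [A-Za-z0-9_-].
    token = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"code:{lang}:{token}"
```

Because every input is part of the hashed tuple, the same function at the same address in the same binary always yields the same `code_id`, which is what makes it usable for replay.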
3. Cross-RID (Runtime Identifier) Mapping
3.1 Problem Statement
Different platform builds (linux-x64, win-x64, osx-arm64) of the same source code produce different binaries with different build-ids. Runtime facts from one platform must map to the correct binary variant.
3.2 Variant Group
Binaries from the same source are grouped by source digest:
{
"variant_group": {
"source_digest": "sha256:...",
"variants": [
{
"rid": "linux-x64",
"build_id": "gnu-build-id:aaa...",
"file_hash": "sha256:..."
},
{
"rid": "win-x64",
"build_id": "pe-guid:bbb...",
"file_hash": "sha256:..."
},
{
"rid": "osx-arm64",
"build_id": "macho-uuid:ccc...",
"file_hash": "sha256:..."
}
]
}
}
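To look up a variant from a build-id, a consumer might flatten the variant groups into a reverse index. The sketch below assumes the JSON shape shown above. The function name `index_variants` and the choice to skip variants without a build_id are illustrative only.

```python
from typing import Dict, List, Tuple


def index_variants(variant_groups: List[dict]) -> Dict[str, Tuple[str, str]]:
    """Map each build_id to (source_digest, rid) of the variant that owns it."""
    index: Dict[str, Tuple[str, str]] = {}
    for entry in variant_groups:
        group = entry["variant_group"]
        for variant in group["variants"]:
            build_id = variant.get("build_id")
            # Variants without a build-id cannot be found by this index.
            if build_id is not None:
                index[build_id] = (group["source_digest"], variant["rid"])
    return index
```

With such an index, a runtime fact that carries `gnu-build-id:aaa...` resolves to the linux-x64 variant and, through the shared source digest, to its sibling builds.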
3.3 Runtime Fact Correlation
When Signals ingests runtime facts:
- Extract build_id from runtime event
- Look up variant group containing this build_id
- Correlate with richgraph nodes having matching build_id
- If no match, fall back to code_id + code_block_hash matching
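The correlation steps can be sketched as a single lookup function. This is an illustration under assumptions rather than the Signals implementation: events and nodes are plain dicts, `variant_index` is a hypothetical mapping from build_id to a variant record, and the event is assumed to carry `code_id` and `code_block_hash` under `symbol` when the fallback is needed.

```python
from typing import Dict, List


def correlate(event: dict, nodes: List[dict], variant_index: Dict[str, object]) -> dict:
    """Match one runtime event to richgraph nodes, build_id first, then code_id."""
    build_id = event.get("binary", {}).get("build_id")
    if build_id is not None:
        matches = [n for n in nodes if n.get("build_id") == build_id]
        if matches:
            return {"matched_by": "build_id",
                    "variant": variant_index.get(build_id),
                    "nodes": matches}

    # Fallback: both code_id and code_block_hash must agree.
    symbol = event.get("symbol", {})
    cid, block = symbol.get("code_id"), symbol.get("code_block_hash")
    if cid is not None and block is not None:
        matches = [n for n in nodes
                   if n.get("code_id") == cid and n.get("code_block_hash") == block]
        if matches:
            return {"matched_by": "code_id", "variant": None, "nodes": matches}

    return {"matched_by": None, "variant": None, "nodes": []}
```

Returning which key produced the match lets a scoring step weight build_id matches above fallback matches if it chooses to.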
4. SBOM Integration
4.1 CycloneDX 1.6 Properties
Build-ID propagates to SBOM via component properties:
{
"type": "library",
"name": "libssl.so.3",
"version": "3.0.11",
"properties": [
{"name": "stellaops:build-id", "value": "gnu-build-id:5f0c7c3c..."},
{"name": "stellaops:code-id", "value": "code:binary:abc123..."},
{"name": "stellaops:file-hash", "value": "sha256:..."}
]
}
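An emitter could assemble the three properties with a helper like the one below. The function name is hypothetical, and omitting a property whose value is missing (rather than writing a null) is an assumption of this sketch.

```python
from typing import List, Optional


def stellaops_properties(build_id: Optional[str],
                         code_id: Optional[str],
                         file_hash: Optional[str]) -> List[dict]:
    """Build the CycloneDX component properties carrying binary identity."""
    pairs = [
        ("stellaops:build-id", build_id),
        ("stellaops:code-id", code_id),
        ("stellaops:file-hash", file_hash),
    ]
    # Assumption: absent values are left out instead of emitted as null.
    return [{"name": name, "value": value} for name, value in pairs if value is not None]
```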
4.2 SPDX 3.0 Integration
Build-ID maps to SPDX external references:
{
"spdxId": "SPDXRef-libssl",
"externalRef": {
"referenceCategory": "PERSISTENT-ID",
"referenceType": "gnu-build-id",
"referenceLocator": "gnu-build-id:5f0c7c3c..."
}
}
5. Signals Runtime Facts Schema
5.1 Runtime Event with Build-ID
{
"event_type": "function_hit",
"timestamp": "2025-12-13T10:00:00Z",
"binary": {
"path": "/usr/lib/x86_64-linux-gnu/libssl.so.3",
"build_id": "gnu-build-id:5f0c7c3c...",
"file_hash": "sha256:..."
},
"symbol": {
"name": "SSL_read",
"address": "0x12345678",
"symbol_id": "sym:binary:..."
},
"context": {
"pid": 12345,
"container_id": "abc123..."
}
}
5.2 Ingestion Endpoint
POST /signals/runtime-facts
Content-Type: application/x-ndjson
Content-Encoding: gzip
{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
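A client would need to serialize events as one JSON object per line and gzip the result before posting. The following sketch prepares such a body using only the standard library. The function name is hypothetical, and sending the request (host, authentication) is deliberately left out because the contract does not specify it.

```python
import gzip
import json
from typing import Dict, List, Tuple


def encode_runtime_facts(events: List[dict]) -> Tuple[Dict[str, str], bytes]:
    """Return headers and a gzip-compressed NDJSON body for the ingestion endpoint."""
    # One compact JSON document per line, newline-terminated.
    lines = [json.dumps(event, separators=(",", ":")) for event in events]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    headers = {
        "Content-Type": "application/x-ndjson",
        "Content-Encoding": "gzip",
    }
    return headers, gzip.compress(body)
```

The returned headers and body can be handed to any HTTP client as the payload of `POST /signals/runtime-facts`.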
6. RichGraph Integration
6.1 Node with Build-ID
{
"id": "sym:binary:...",
"symbol_id": "sym:binary:...",
"lang": "binary",
"kind": "function",
"display": "SSL_read",
"build_id": "gnu-build-id:5f0c7c3c...",
"code_id": "code:binary:...",
"code_block_hash": "sha256:...",
"purl": "pkg:deb/debian/libssl3@3.0.11"
}
6.2 CAS Evidence Storage
cas://binary/
  by-build-id/{build_id}/   # Index by build-id
    graph.json              # Associated graph
    symbols.json            # Symbol table
  by-code-id/{code_id}/     # Index by code-id
    block.bin               # Code block bytes
    disasm.json             # Disassembly
7. Implementation Requirements
7.1 Scanner Changes
| Component | Change | Priority |
|---|---|---|
| ELF parser | Extract .note.gnu.build-id | P0 |
| PE parser | Extract Debug GUID | P0 |
| Mach-O parser | Extract LC_UUID | P0 |
| RichGraphBuilder | Populate build_id field on nodes | P0 |
| SBOM emitters | Add stellaops:build-id property | P1 |
7.2 Signals Changes
| Component | Change | Priority |
|---|---|---|
| Runtime facts ingestion | Parse and index build_id | P0 |
| Scoring service | Correlate by build_id then code_id | P0 |
| Store repository | Add build_id index | P1 |
7.3 CLI/UI Changes
| Component | Change | Priority |
|---|---|---|
| stella graph explain | Show build_id in output | P1 |
| UI symbol drawer | Display build_id with copy button | P1 |
8. Validation Rules
- build_id must match regex: ^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$
- code_id must match regex: ^code:[a-z]+:[A-Za-z0-9_-]+$
- When build_id is null, build_id_fallback must be present
- code_block_hash required when build_id is null and symbol is stripped
- Variant group source_digest must be consistent across all variants
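The per-record rules can be expressed as a small validator. This is a sketch under assumptions: records are plain dicts, the `stripped` flag is supplied by the caller, and the function name is hypothetical. The variant-group rule is not covered here because it spans multiple records.

```python
import re
from typing import List

# Patterns copied from the validation rules.
BUILD_ID_RE = re.compile(r"^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$")
CODE_ID_RE = re.compile(r"^code:[a-z]+:[A-Za-z0-9_-]+$")


def validate_record(record: dict, stripped: bool = False) -> List[str]:
    """Return a list of rule violations; an empty list means the record is valid."""
    errors: List[str] = []
    build_id = record.get("build_id")
    if build_id is None:
        if not record.get("build_id_fallback"):
            errors.append("build_id_fallback must be present when build_id is null")
        if stripped and not record.get("code_block_hash"):
            errors.append("code_block_hash required for stripped symbol without build_id")
    elif not BUILD_ID_RE.fullmatch(build_id):
        errors.append("build_id does not match the required pattern")

    code_id = record.get("code_id")
    if code_id is not None and not CODE_ID_RE.fullmatch(code_id):
        errors.append("code_id does not match the required pattern")
    return errors
```

Using `fullmatch` rather than `match` also rejects values with a trailing newline, which `$` alone would let through.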
9. Test Fixtures
Location: tests/Binary/fixtures/build-id/
| Fixture | Description |
|---|---|
| elf-with-buildid/ | ELF binary with GNU build-id |
| elf-stripped/ | ELF stripped, fallback to code-id |
| pe-with-guid/ | PE binary with Debug GUID |
| macho-with-uuid/ | Mach-O binary with LC_UUID |
| variant-group/ | Same source, multiple RIDs |
10. Related Contracts
- richgraph-v1 - Graph schema with build_id field
- Binary Reachability - Binary evidence schema
- Symbol Manifest - Symbol identification
Changelog
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial contract for build-id propagation |