# CONTRACT-BUILDID-PROPAGATION-401: Build-ID and Code-ID Propagation

> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-13
> **Owners:** Scanner Guild, Signals Guild, BE-Base Platform Guild
> **Unblocks:** SCANNER-BUILDID-401-035, SCANNER-INITROOT-401-036, and downstream tasks

## Overview

This contract defines how GNU build-id (ELF), PE GUID, and Mach-O UUID propagate through the reachability pipeline from Scanner to SBOM, Signals, and runtime facts. It ensures consistent identification of binaries across components for deterministic symbol resolution and replay.

---

## 1. Build-ID Sources and Formats

### 1.1 Per-Format Extraction

| Binary Format | Build-ID Source | Prefix | Example |
|---------------|-----------------|--------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:` | `gnu-build-id:5f0c7c3cab2eb9bc...` |
| PE (Windows) | Debug GUID from PE header | `pe-guid:` | `pe-guid:12345678-1234-1234-1234-123456789abc` |
| Mach-O | `LC_UUID` load command | `macho-uuid:` | `macho-uuid:12345678123412341234123456789abc` |

### 1.2 Canonical Format

```
build_id = "{prefix}{hex_lowercase}"
```

- Hex encoding: lowercase, no separators (except PE GUID retains dashes)
- Minimum length: 16 bytes (32 hex chars) for ELF/Mach-O
- PE GUID: Standard GUID format with dashes

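The canonical format above can be sketched as a small helper. This is an illustrative sketch, not the Scanner implementation: the function name `canonical_build_id`, the `PREFIXES` mapping keys, and the assumption that the PE GUID arrives as 16 raw bytes in the little-endian field layout are all hypothetical.

```python
import uuid

# Prefixes are taken from the table in section 1.1; the dict keys are hypothetical.
PREFIXES = {"elf": "gnu-build-id:", "pe": "pe-guid:", "macho": "macho-uuid:"}


def canonical_build_id(binary_format: str, raw: bytes) -> str:
    """Render raw identifier bytes as a canonical build_id string."""
    prefix = PREFIXES[binary_format]
    if binary_format == "pe":
        # PE keeps the dashed GUID form. Assumption: `raw` holds the 16-byte
        # GUID with little-endian Data1/Data2/Data3 fields.
        if len(raw) != 16:
            raise ValueError("PE GUID must be exactly 16 bytes")
        return prefix + str(uuid.UUID(bytes_le=raw))
    # ELF and Mach-O: lowercase hex, no separators, at least 16 bytes.
    if len(raw) < 16:
        raise ValueError("build-id shorter than 16 bytes")
    return prefix + raw.hex()
```

Rejecting short identifiers at this point keeps malformed values out of downstream indexes, although a real extractor might prefer to switch to the fallback chain instead of raising.
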
### 1.3 Fallback When Build-ID Absent

When build-id is not present (stripped binaries, older toolchains):

```json
{
  "build_id": null,
  "build_id_fallback": {
    "method": "file_hash",
    "value": "sha256:...",
    "confidence": 0.7
  }
}
```

**Fallback chain:**
1. `file_hash` - SHA-256 of entire binary file (confidence: 0.7)
2. `code_section_hash` - SHA-256 of .text section (confidence: 0.6)
3. `path_hash` - SHA-256 of file path (confidence: 0.3, last resort)

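The fallback chain could be sketched as follows. The contract does not say when one step yields to the next, so this sketch assumes a step is skipped only when its input is unavailable (passed as `None`). The function name `build_id_fallback` and the UTF-8 encoding of the path are also assumptions.

```python
import hashlib
from typing import Optional


def build_id_fallback(file_bytes: Optional[bytes],
                      text_section: Optional[bytes],
                      path: str) -> dict:
    """Pick the first usable fallback and return the record shape shown above."""
    if file_bytes is not None:
        method, data, confidence = "file_hash", file_bytes, 0.7
    elif text_section is not None:
        method, data, confidence = "code_section_hash", text_section, 0.6
    else:
        # Last resort: hash the path itself (assumed to be UTF-8 encoded).
        method, data, confidence = "path_hash", path.encode("utf-8"), 0.3
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": method,
            "value": "sha256:" + hashlib.sha256(data).hexdigest(),
            "confidence": confidence,
        },
    }
```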

---

## 2. Code-ID for Name-less Symbols

### 2.1 Purpose

`code_id` provides stable identification for symbols in stripped binaries where the symbol name is unavailable.

### 2.2 Format

```
code_id = "code:{lang}:{base64url_sha256}"
```

**Canonical tuple for binary symbols:**
```
{format}\0{build_id_or_file_hash}\0{section}\0{addr}\0{size}\0{code_block_hash}
```

### 2.3 Code Block Hash

For stripped functions, compute hash of the code bytes:

```
code_block_hash = "sha256:" + hex(SHA256(code_bytes[addr:addr+size]))
```

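A minimal sketch of both computations follows. Several details are assumptions the contract does not fix: that the tuple is UTF-8 encoded before hashing, that `addr` is rendered as `0x`-prefixed lowercase hex and `size` as decimal, and that the base64url digest is unpadded (which matches the `code_id` regex in section 8). The function names are hypothetical.

```python
import base64
import hashlib


def code_block_hash(code_bytes: bytes, addr: int, size: int) -> str:
    """SHA-256 over the function's bytes, per the formula in section 2.3."""
    return "sha256:" + hashlib.sha256(code_bytes[addr:addr + size]).hexdigest()


def code_id(lang: str, fmt: str, build_id_or_file_hash: str, section: str,
            addr: int, size: int, block_hash: str) -> str:
    """Hash the NUL-joined canonical tuple and wrap it as code:{lang}:{token}."""
    # Assumption: addr as 0x-prefixed hex, size as decimal text.
    fields = [fmt, build_id_or_file_hash, section, hex(addr), str(size), block_hash]
    tuple_bytes = "\0".join(fields).encode("utf-8")
    digest = hashlib.sha256(tuple_bytes).digest()
    # Assumption: unpadded base64url, so the token contains no '=' characters.
    token = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"code:{lang}:{token}"
```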

---

## 3. Cross-RID (Runtime Identifier) Mapping

### 3.1 Problem Statement

Different platform builds (linux-x64, win-x64, osx-arm64) of the same source code produce different binaries with different build-ids. Runtime facts from one platform must map to the correct binary variant.

### 3.2 Variant Group

Binaries from the same source are grouped by source digest:

```json
{
  "variant_group": {
    "source_digest": "sha256:...",
    "variants": [
      {
        "rid": "linux-x64",
        "build_id": "gnu-build-id:aaa...",
        "file_hash": "sha256:..."
      },
      {
        "rid": "win-x64",
        "build_id": "pe-guid:bbb...",
        "file_hash": "sha256:..."
      },
      {
        "rid": "osx-arm64",
        "build_id": "macho-uuid:ccc...",
        "file_hash": "sha256:..."
      }
    ]
  }
}
```

### 3.3 Runtime Fact Correlation

When Signals ingests runtime facts:

1. Extract `build_id` from runtime event
2. Look up variant group containing this build_id
3. Correlate with richgraph nodes having matching `build_id`
4. If no match, fall back to `code_id` + `code_block_hash` matching

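The four steps above might look like this over in-memory data. This is a sketch only: the function name `correlate`, the returned dict shape, and the presence of `code_id` and `code_block_hash` under the event's `symbol` object are assumptions (the event example in section 5.1 does not show them). `variant_groups` is taken to be a list of the inner `variant_group` objects.

```python
from typing import Optional


def correlate(event: dict, variant_groups: list, nodes: list) -> dict:
    """Match a runtime event to richgraph nodes by build_id, then by code_id."""
    # Step 1: extract build_id from the runtime event.
    build_id: Optional[str] = event.get("binary", {}).get("build_id")

    # Step 2: find the variant group that contains this build_id.
    group = None
    if build_id is not None:
        group = next(
            (g for g in variant_groups
             if any(v.get("build_id") == build_id for v in g["variants"])),
            None,
        )

    # Step 3: richgraph nodes carrying the same build_id.
    matches = [n for n in nodes
               if build_id is not None and n.get("build_id") == build_id]
    matched_by = "build_id" if matches else None

    # Step 4: fall back to code_id + code_block_hash (hypothetical event fields).
    if not matches:
        symbol = event.get("symbol", {})
        cid, cbh = symbol.get("code_id"), symbol.get("code_block_hash")
        matches = [n for n in nodes
                   if cid is not None
                   and n.get("code_id") == cid
                   and n.get("code_block_hash") == cbh]
        matched_by = "code_id" if matches else None

    return {"variant_group": group, "matched_by": matched_by, "nodes": matches}
```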

---

## 4. SBOM Integration

### 4.1 CycloneDX 1.6 Properties

Build-ID propagates to SBOM via component properties:

```json
{
  "type": "library",
  "name": "libssl.so.3",
  "version": "3.0.11",
  "properties": [
    {"name": "stellaops:build-id", "value": "gnu-build-id:5f0c7c3c..."},
    {"name": "stellaops:code-id", "value": "code:binary:abc123..."},
    {"name": "stellaops:file-hash", "value": "sha256:..."}
  ]
}
```

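An emitter could assemble the `properties` array roughly as below. The property names come from the example above. The helper name `stellaops_properties` and the choice to omit a property whose value is missing (instead of emitting a null) are assumptions.

```python
from typing import Optional


def stellaops_properties(build_id: Optional[str],
                         code_id: Optional[str],
                         file_hash: Optional[str]) -> list:
    """Build the CycloneDX component properties list for binary identifiers."""
    candidates = [
        ("stellaops:build-id", build_id),
        ("stellaops:code-id", code_id),
        ("stellaops:file-hash", file_hash),
    ]
    # Assumption: absent identifiers are omitted, since property values are strings.
    return [{"name": name, "value": value}
            for name, value in candidates if value is not None]
```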

### 4.2 SPDX 3.0 Integration

Build-ID maps to SPDX external references:

```json
{
  "spdxId": "SPDXRef-libssl",
  "externalRef": {
    "referenceCategory": "PERSISTENT-ID",
    "referenceType": "gnu-build-id",
    "referenceLocator": "gnu-build-id:5f0c7c3c..."
  }
}
```

---

## 5. Signals Runtime Facts Schema

### 5.1 Runtime Event with Build-ID

```json
{
  "event_type": "function_hit",
  "timestamp": "2025-12-13T10:00:00Z",
  "binary": {
    "path": "/usr/lib/x86_64-linux-gnu/libssl.so.3",
    "build_id": "gnu-build-id:5f0c7c3c...",
    "file_hash": "sha256:..."
  },
  "symbol": {
    "name": "SSL_read",
    "address": "0x12345678",
    "symbol_id": "sym:binary:..."
  },
  "context": {
    "pid": 12345,
    "container_id": "abc123..."
  }
}
```

### 5.2 Ingestion Endpoint

```
POST /signals/runtime-facts
Content-Type: application/x-ndjson
Content-Encoding: gzip

{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
```

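A client might prepare the request body for this endpoint as sketched below, using only the Python standard library. The helper name `encode_runtime_facts` and the `HEADERS` constant are hypothetical, and the sketch stops short of sending the request because the host and authentication scheme are not specified here.

```python
import gzip
import json

# Headers mirror the request shown above.
HEADERS = {
    "Content-Type": "application/x-ndjson",
    "Content-Encoding": "gzip",
}


def encode_runtime_facts(events: list) -> bytes:
    """Serialize events as NDJSON (one compact JSON object per line), then gzip."""
    ndjson = "".join(
        json.dumps(event, separators=(",", ":")) + "\n" for event in events
    )
    return gzip.compress(ndjson.encode("utf-8"))


def decode_runtime_facts(body: bytes) -> list:
    """Inverse of encode_runtime_facts, as a receiver might implement it."""
    text = gzip.decompress(body).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```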

---

## 6. RichGraph Integration

### 6.1 Node with Build-ID

```json
{
  "id": "sym:binary:...",
  "symbol_id": "sym:binary:...",
  "lang": "binary",
  "kind": "function",
  "display": "SSL_read",
  "build_id": "gnu-build-id:5f0c7c3c...",
  "code_id": "code:binary:...",
  "code_block_hash": "sha256:...",
  "purl": "pkg:deb/debian/libssl3@3.0.11"
}
```

### 6.2 CAS Evidence Storage

```
cas://binary/
  by-build-id/{build_id}/     # Index by build-id
    graph.json                # Associated graph
    symbols.json              # Symbol table
  by-code-id/{code_id}/       # Index by code-id
    block.bin                 # Code block bytes
    disasm.json               # Disassembly
```

---

## 7. Implementation Requirements

### 7.1 Scanner Changes

| Component | Change | Priority |
|-----------|--------|----------|
| ELF parser | Extract `.note.gnu.build-id` | P0 |
| PE parser | Extract Debug GUID | P0 |
| Mach-O parser | Extract `LC_UUID` | P0 |
| RichGraphBuilder | Populate `build_id` field on nodes | P0 |
| SBOM emitters | Add `stellaops:build-id` property | P1 |

### 7.2 Signals Changes

| Component | Change | Priority |
|-----------|--------|----------|
| Runtime facts ingestion | Parse and index `build_id` | P0 |
| Scoring service | Correlate by `build_id` then `code_id` | P0 |
| Store repository | Add `build_id` index | P1 |

### 7.3 CLI/UI Changes

| Component | Change | Priority |
|-----------|--------|----------|
| `stella graph explain` | Show build_id in output | P1 |
| UI symbol drawer | Display build_id with copy button | P1 |

---

## 8. Validation Rules

1. `build_id` must match regex: `^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$`
2. `code_id` must match regex: `^code:[a-z]+:[A-Za-z0-9_-]+$`
3. When `build_id` is null, `build_id_fallback` must be present
4. `code_block_hash` required when `build_id` is null and symbol is stripped
5. Variant group `source_digest` must be consistent across all variants

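Rules 1 through 4 can be checked per record with a short validator. The sketch below is illustrative: the function name `validate_record`, the flat record shape, and the boolean `stripped` flag are assumptions. Rule 5 is left out because it applies to a whole variant group, not to a single record.

```python
import re

# Patterns copied verbatim from rules 1 and 2.
BUILD_ID_RE = re.compile(r"^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$")
CODE_ID_RE = re.compile(r"^code:[a-z]+:[A-Za-z0-9_-]+$")


def validate_record(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    build_id = record.get("build_id")
    if build_id is None:
        # Rule 3: a null build_id needs a fallback.
        if not record.get("build_id_fallback"):
            errors.append("rule 3: build_id_fallback missing")
        # Rule 4: `stripped` is a hypothetical flag marking stripped symbols.
        if record.get("stripped") and not record.get("code_block_hash"):
            errors.append("rule 4: code_block_hash missing")
    elif not BUILD_ID_RE.fullmatch(build_id):
        errors.append("rule 1: build_id has invalid format")
    code_id = record.get("code_id")
    if code_id is not None and not CODE_ID_RE.fullmatch(code_id):
        errors.append("rule 2: code_id has invalid format")
    return errors
```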

---

## 9. Test Fixtures

Location: `tests/Binary/fixtures/build-id/`

| Fixture | Description |
|---------|-------------|
| `elf-with-buildid/` | ELF binary with GNU build-id |
| `elf-stripped/` | ELF stripped, fallback to code-id |
| `pe-with-guid/` | PE binary with Debug GUID |
| `macho-with-uuid/` | Mach-O binary with LC_UUID |
| `variant-group/` | Same source, multiple RIDs |

---

## 10. Related Contracts

- [richgraph-v1](./richgraph-v1.md) - Graph schema with build_id field
- [Binary Reachability](../reachability/binary-reachability-schema.md) - Binary evidence schema
- [Symbol Manifest](../specs/SYMBOL_MANIFEST_v1.md) - Symbol identification

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial contract for build-id propagation |