# CONTRACT-BUILDID-PROPAGATION-401: Build-ID and Code-ID Propagation
> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-13
> **Owners:** Scanner Guild, Signals Guild, BE-Base Platform Guild
> **Unblocks:** SCANNER-BUILDID-401-035, SCANNER-INITROOT-401-036, and downstream tasks
## Overview
This contract defines how GNU build-id (ELF), PE GUID, and Mach-O UUID propagate through the reachability pipeline from Scanner to SBOM, Signals, and runtime facts. It ensures consistent identification of binaries across components for deterministic symbol resolution and replay.

---
## 1. Build-ID Sources and Formats
### 1.1 Per-Format Extraction
| Binary Format | Build-ID Source | Prefix | Example |
|---------------|-----------------|--------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:` | `gnu-build-id:5f0c7c3cab2eb9bc...` |
| PE (Windows) | Debug GUID from PE header | `pe-guid:` | `pe-guid:12345678-1234-1234-1234-123456789abc` |
| Mach-O | `LC_UUID` load command | `macho-uuid:` | `macho-uuid:12345678123412341234123456789abc` |
### 1.2 Canonical Format
```
build_id = "{prefix}{hex_lowercase}"
```
- Hex encoding: lowercase, no separators (except PE GUID retains dashes)
- Minimum length: 16 bytes (32 hex chars) for ELF/Mach-O
- PE GUID: Standard GUID format with dashes
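
The formatting rules above can be sketched as a small helper. This is a minimal illustration, not the Scanner implementation: the function name `canonical_build_id`, the format keys (`elf`, `pe`, `macho`), and the assumption that the PE GUID arrives as the raw 16-byte on-disk structure are all hypothetical.

```python
import uuid

# Prefixes from the table in section 1.1.
PREFIXES = {"elf": "gnu-build-id:", "pe": "pe-guid:", "macho": "macho-uuid:"}


def canonical_build_id(binary_format: str, raw: bytes) -> str:
    """Render raw identifier bytes as a canonical build_id string."""
    prefix = PREFIXES[binary_format]  # KeyError for unknown formats
    if binary_format == "pe":
        if len(raw) != 16:
            raise ValueError("a PE GUID is exactly 16 bytes")
        # Assumption: raw is the on-disk GUID structure, whose first three
        # fields are little-endian; str() yields lowercase hex with dashes.
        return prefix + str(uuid.UUID(bytes_le=raw))
    if len(raw) < 16:
        raise ValueError("ELF/Mach-O identifiers need at least 16 bytes")
    return prefix + raw.hex()  # bytes.hex() is lowercase with no separators
```

If the PE parser already yields a formatted GUID string, lowercasing that string would be enough and the byte-order handling above would not apply.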
### 1.3 Fallback When Build-ID Absent
When a binary carries no build-id (for example, stripped binaries or builds from older toolchains), record a fallback identifier:
```json
{
"build_id": null,
"build_id_fallback": {
"method": "file_hash",
"value": "sha256:...",
"confidence": 0.7
}
}
```
**Fallback chain:**
1. `file_hash` - SHA-256 of entire binary file (confidence: 0.7)
2. `code_section_hash` - SHA-256 of .text section (confidence: 0.6)
3. `path_hash` - SHA-256 of file path (confidence: 0.3, last resort)
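
One way to walk the fallback chain is sketched below. The function name `build_id_fallback` and its parameters are hypothetical, and the sketch assumes a step is skipped only when its input is unavailable (passed as `None`); the contract does not spell out the skip condition.

```python
import hashlib
from typing import Any, Dict, Optional


def build_id_fallback(file_bytes: Optional[bytes],
                      text_section: Optional[bytes],
                      path: str) -> Dict[str, Any]:
    """Pick the first available fallback method, in chain order."""
    if file_bytes is not None:
        method, data, confidence = "file_hash", file_bytes, 0.7
    elif text_section is not None:
        method, data, confidence = "code_section_hash", text_section, 0.6
    else:
        # Last resort: hash the UTF-8 encoded path (encoding is an assumption).
        method, data, confidence = "path_hash", path.encode("utf-8"), 0.3
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": method,
            "value": "sha256:" + hashlib.sha256(data).hexdigest(),
            "confidence": confidence,
        },
    }
```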
---
## 2. Code-ID for Name-less Symbols
### 2.1 Purpose
`code_id` provides stable identification for symbols in stripped binaries where the symbol name is unavailable.
### 2.2 Format
```
code_id = "code:{lang}:{base64url_sha256}"
```
**Canonical tuple for binary symbols:**
```
{format}\0{build_id_or_file_hash}\0{section}\0{addr}\0{size}\0{code_block_hash}
```
### 2.3 Code Block Hash
For stripped functions, compute the hash of the code bytes:
```
code_block_hash = "sha256:" + hex(SHA256(code_bytes[addr:addr+size]))
```
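
A minimal sketch of how the pieces in this section might fit together follows. Several details are assumptions that the contract does not state: that `code_id` is the SHA-256 of the NUL-joined canonical tuple, that `addr` is rendered as hex and `size` as decimal, that `addr` is an offset into the supplied bytes, and that base64url padding is stripped. The function names are hypothetical.

```python
import base64
import hashlib


def code_block_hash(code_bytes: bytes, addr: int, size: int) -> str:
    """Hash the code bytes of one function, as in section 2.3."""
    return "sha256:" + hashlib.sha256(code_bytes[addr:addr + size]).hexdigest()


def code_id(lang: str, fmt: str, build_id_or_file_hash: str, section: str,
            addr: int, size: int, block_hash: str) -> str:
    """Derive a code_id from the canonical tuple (assumed encoding)."""
    fields = [fmt, build_id_or_file_hash, section, hex(addr), str(size), block_hash]
    canonical = "\0".join(fields).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # Strip '=' padding so the token uses only the base64url alphabet.
    token = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"code:{lang}:{token}"
```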
---
## 3. Cross-RID (Runtime Identifier) Mapping
### 3.1 Problem Statement
Different platform builds (linux-x64, win-x64, osx-arm64) of the same source code produce different binaries with different build-ids. Runtime facts from one platform must map to the correct binary variant.
### 3.2 Variant Group
Binaries from the same source are grouped by source digest:
```json
{
"variant_group": {
"source_digest": "sha256:...",
"variants": [
{
"rid": "linux-x64",
"build_id": "gnu-build-id:aaa...",
"file_hash": "sha256:..."
},
{
"rid": "win-x64",
"build_id": "pe-guid:bbb...",
"file_hash": "sha256:..."
},
{
"rid": "osx-arm64",
"build_id": "macho-uuid:ccc...",
"file_hash": "sha256:..."
}
]
}
}
```
### 3.3 Runtime Fact Correlation
When Signals ingests runtime facts:
1. Extract `build_id` from the runtime event
2. Look up the variant group containing this `build_id`
3. Correlate with richgraph nodes that have a matching `build_id`
4. If there is no match, fall back to `code_id` + `code_block_hash` matching
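
The four steps can be sketched over in-memory lists. This is an illustration only: the function name `correlate`, the shape of the return value, and the presence of `code_id` and `code_block_hash` on the event's `symbol` object are assumptions, since the runtime event schema does not show those two fields.

```python
from typing import Any, Dict, List


def correlate(event: Dict[str, Any],
              variant_groups: List[Dict[str, Any]],
              nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Match one runtime event to richgraph nodes."""
    # Step 1: extract build_id from the runtime event.
    build_id = event.get("binary", {}).get("build_id")
    group = None
    if build_id is not None:
        # Step 2: find the variant group that lists this build_id.
        group = next((g for g in variant_groups
                      if any(v.get("build_id") == build_id for v in g["variants"])),
                     None)
        # Step 3: correlate with nodes carrying the same build_id.
        matches = [n for n in nodes if n.get("build_id") == build_id]
        if matches:
            return {"method": "build_id", "variant_group": group, "nodes": matches}
    # Step 4: fall back to code_id + code_block_hash matching.
    symbol = event.get("symbol", {})
    code_id, block_hash = symbol.get("code_id"), symbol.get("code_block_hash")
    matches = []
    if code_id is not None and block_hash is not None:
        matches = [n for n in nodes
                   if n.get("code_id") == code_id
                   and n.get("code_block_hash") == block_hash]
    return {"method": "code_id" if matches else None,
            "variant_group": group, "nodes": matches}
```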
---
## 4. SBOM Integration
### 4.1 CycloneDX 1.6 Properties
Build-ID propagates to SBOM via component properties:
```json
{
"type": "library",
"name": "libssl.so.3",
"version": "3.0.11",
"properties": [
{"name": "stellaops:build-id", "value": "gnu-build-id:5f0c7c3c..."},
{"name": "stellaops:code-id", "value": "code:binary:abc123..."},
{"name": "stellaops:file-hash", "value": "sha256:..."}
]
}
```
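
An SBOM emitter could build the `properties` array with a helper along these lines. The function name `stellaops_properties` is hypothetical, and omitting properties whose value is `None` is an assumption rather than a rule stated in this contract.

```python
from typing import Dict, List, Optional


def stellaops_properties(build_id: Optional[str],
                         code_id: Optional[str],
                         file_hash: Optional[str]) -> List[Dict[str, str]]:
    """Build the CycloneDX component properties shown above."""
    pairs = [
        ("stellaops:build-id", build_id),
        ("stellaops:code-id", code_id),
        ("stellaops:file-hash", file_hash),
    ]
    # Assumption: absent identifiers are left out instead of emitted as null.
    return [{"name": name, "value": value} for name, value in pairs if value is not None]
```

The returned list can be assigned directly to the component's `properties` key before serialization.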
### 4.2 SPDX 3.0 Integration
Build-ID maps to SPDX external references:
```json
{
"spdxId": "SPDXRef-libssl",
"externalRef": {
"referenceCategory": "PERSISTENT-ID",
"referenceType": "gnu-build-id",
"referenceLocator": "gnu-build-id:5f0c7c3c..."
}
}
```
---
## 5. Signals Runtime Facts Schema
### 5.1 Runtime Event with Build-ID
```json
{
"event_type": "function_hit",
"timestamp": "2025-12-13T10:00:00Z",
"binary": {
"path": "/usr/lib/x86_64-linux-gnu/libssl.so.3",
"build_id": "gnu-build-id:5f0c7c3c...",
"file_hash": "sha256:..."
},
"symbol": {
"name": "SSL_read",
"address": "0x12345678",
"symbol_id": "sym:binary:..."
},
"context": {
"pid": 12345,
"container_id": "abc123..."
}
}
```
### 5.2 Ingestion Endpoint
```
POST /signals/runtime-facts
Content-Type: application/x-ndjson
Content-Encoding: gzip

{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
```
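
A client might prepare the request body as sketched here: one compact JSON object per line, then gzip over the whole payload. The names `encode_runtime_facts` and `HEADERS` are hypothetical; only the path, content type, and content encoding come from the contract. Sending the request is left out because the host and authentication scheme are not specified here.

```python
import gzip
import json
from typing import Any, Dict, Iterable

# Headers required by the ingestion endpoint above.
HEADERS = {"Content-Type": "application/x-ndjson", "Content-Encoding": "gzip"}


def encode_runtime_facts(events: Iterable[Dict[str, Any]]) -> bytes:
    """Serialize events as NDJSON and gzip the result."""
    # Compact separators keep each event on a single line without extra spaces.
    ndjson = "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in events)
    return gzip.compress(ndjson.encode("utf-8"))
```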
---
## 6. RichGraph Integration
### 6.1 Node with Build-ID
```json
{
"id": "sym:binary:...",
"symbol_id": "sym:binary:...",
"lang": "binary",
"kind": "function",
"display": "SSL_read",
"build_id": "gnu-build-id:5f0c7c3c...",
"code_id": "code:binary:...",
"code_block_hash": "sha256:...",
"purl": "pkg:deb/debian/libssl3@3.0.11"
}
```
### 6.2 CAS Evidence Storage
```
cas://binary/
by-build-id/{build_id}/ # Index by build-id
graph.json # Associated graph
symbols.json # Symbol table
by-code-id/{code_id}/ # Index by code-id
block.bin # Code block bytes
disasm.json # Disassembly
```
---
## 7. Implementation Requirements
### 7.1 Scanner Changes
| Component | Change | Priority |
|-----------|--------|----------|
| ELF parser | Extract `.note.gnu.build-id` | P0 |
| PE parser | Extract Debug GUID | P0 |
| Mach-O parser | Extract `LC_UUID` | P0 |
| RichGraphBuilder | Populate `build_id` field on nodes | P0 |
| SBOM emitters | Add `stellaops:build-id` property | P1 |
### 7.2 Signals Changes
| Component | Change | Priority |
|-----------|--------|----------|
| Runtime facts ingestion | Parse and index `build_id` | P0 |
| Scoring service | Correlate by `build_id` then `code_id` | P0 |
| Store repository | Add `build_id` index | P1 |
### 7.3 CLI/UI Changes
| Component | Change | Priority |
|-----------|--------|----------|
| `stella graph explain` | Show `build_id` in output | P1 |
| UI symbol drawer | Display `build_id` with copy button | P1 |
---
## 8. Validation Rules
1. `build_id` must match regex: `^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$`
2. `code_id` must match regex: `^code:[a-z]+:[A-Za-z0-9_-]+$`
3. When `build_id` is null, `build_id_fallback` must be present
4. `code_block_hash` is required when `build_id` is null and the symbol is stripped
5. All variants in a variant group must derive from the source identified by the group's `source_digest`
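
Rules 1 to 4 can be checked per record with a sketch like the following. The function name `validate_record`, the flat record shape, and the boolean `stripped` field are hypothetical; rule 5 is not covered because it concerns a whole variant group, not a single record.

```python
import re
from typing import Any, Dict, List

# Patterns copied from rules 1 and 2.
BUILD_ID_RE = re.compile(r"^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$")
CODE_ID_RE = re.compile(r"^code:[a-z]+:[A-Za-z0-9_-]+$")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    build_id = record.get("build_id")
    code_id = record.get("code_id")
    # fullmatch also rejects a trailing newline, which '$' alone would accept.
    if build_id is not None and not BUILD_ID_RE.fullmatch(build_id):
        errors.append("rule 1: malformed build_id")
    if code_id is not None and not CODE_ID_RE.fullmatch(code_id):
        errors.append("rule 2: malformed code_id")
    if build_id is None and not record.get("build_id_fallback"):
        errors.append("rule 3: build_id_fallback missing")
    # 'stripped' is a hypothetical flag marking a symbol without a name.
    if build_id is None and record.get("stripped") and not record.get("code_block_hash"):
        errors.append("rule 4: code_block_hash missing")
    return errors
```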
---
## 9. Test Fixtures
Location: `tests/Binary/fixtures/build-id/`
| Fixture | Description |
|---------|-------------|
| `elf-with-buildid/` | ELF binary with GNU build-id |
| `elf-stripped/` | ELF stripped, fallback to code-id |
| `pe-with-guid/` | PE binary with Debug GUID |
| `macho-with-uuid/` | Mach-O binary with LC_UUID |
| `variant-group/` | Same source, multiple RIDs |
---
## 10. Related Contracts
- [richgraph-v1](./richgraph-v1.md) - Graph schema with build_id field
- [Binary Reachability](../reachability/binary-reachability-schema.md) - Binary evidence schema
- [Symbol Manifest](../specs/SYMBOL_MANIFEST_v1.md) - Symbol identification
---
## Changelog
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial contract for build-id propagation |