CONTRACT-BUILDID-PROPAGATION-401: Build-ID and Code-ID Propagation

Status: Published
Version: 1.0.0
Published: 2025-12-13
Owners: Scanner Guild, Signals Guild, BE-Base Platform Guild
Unblocks: SCANNER-BUILDID-401-035, SCANNER-INITROOT-401-036, and downstream tasks

Overview

This contract defines how GNU build-id (ELF), PE GUID, and Mach-O UUID propagate through the reachability pipeline from Scanner to SBOM, Signals, and runtime facts. It ensures consistent identification of binaries across components for deterministic symbol resolution and replay.


1. Build-ID Sources and Formats

1.1 Per-Format Extraction

| Binary Format | Build-ID Source | Prefix | Example |
|---|---|---|---|
| ELF | .note.gnu.build-id | gnu-build-id: | gnu-build-id:5f0c7c3cab2eb9bc... |
| PE (Windows) | Debug GUID from PE header | pe-guid: | pe-guid:12345678-1234-1234-1234-123456789abc |
| Mach-O | LC_UUID load command | macho-uuid: | macho-uuid:12345678123412341234123456789abc |

1.2 Canonical Format

build_id = "{prefix}{hex_lowercase}"
  • Hex encoding: lowercase, no separators (except PE GUID retains dashes)
  • Minimum length: 16 bytes (32 hex chars) for ELF/Mach-O
  • PE GUID: Standard GUID format with dashes
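
The canonical format above can be sketched in a few lines of Python. This is a minimal illustration, not the Scanner implementation: the function name `canonical_build_id` and the short format keys (`elf`, `pe`, `macho`) are hypothetical, and the sketch assumes the PE Debug GUID is passed as the 16 raw bytes of a little-endian GUID structure.

```python
import uuid

# Minimum identifier length for ELF/Mach-O, per section 1.2 (16 bytes = 32 hex chars).
MIN_ID_BYTES = 16


def canonical_build_id(fmt: str, raw: bytes) -> str:
    """Render raw identifier bytes as "{prefix}{hex_lowercase}".

    `fmt` is a hypothetical short key: "elf", "pe", or "macho".
    """
    if fmt == "elf":
        if len(raw) < MIN_ID_BYTES:
            raise ValueError("ELF build-id shorter than 16 bytes")
        return "gnu-build-id:" + raw.hex()  # bytes.hex() is lowercase, no separators
    if fmt == "macho":
        if len(raw) < MIN_ID_BYTES:
            raise ValueError("Mach-O UUID shorter than 16 bytes")
        return "macho-uuid:" + raw.hex()
    if fmt == "pe":
        # Assumption: `raw` is a little-endian GUID struct (exactly 16 bytes).
        # str(UUID) yields the lowercase, dashed GUID form.
        return "pe-guid:" + str(uuid.UUID(bytes_le=raw))
    raise ValueError(f"unknown binary format: {fmt}")
```

For example, `canonical_build_id("macho", bytes.fromhex("12345678123412341234123456789abc"))` returns `macho-uuid:12345678123412341234123456789abc`, matching the table in 1.1.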

1.3 Fallback When Build-ID Absent

When build-id is not present (stripped binaries, older toolchains):

{
  "build_id": null,
  "build_id_fallback": {
    "method": "file_hash",
    "value": "sha256:...",
    "confidence": 0.7
  }
}

Fallback chain (tried in order; confidence decreases with each step):

  1. file_hash - SHA-256 of entire binary file (confidence: 0.7)
  2. code_section_hash - SHA-256 of .text section (confidence: 0.6)
  3. path_hash - SHA-256 of file path (confidence: 0.3, last resort)
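
The fallback chain can be sketched as follows. This is an illustration under stated assumptions: the function name `build_id_fallback` is hypothetical, "unavailable" is modelled as `None`, and the path is hashed as UTF-8 bytes (the contract does not specify the encoding).

```python
import hashlib
from typing import Optional


def build_id_fallback(
    file_bytes: Optional[bytes],
    text_section: Optional[bytes],
    path: str,
) -> dict:
    """Pick the first available fallback and emit the record from section 1.3."""
    if file_bytes is not None:
        method, data, confidence = "file_hash", file_bytes, 0.7
    elif text_section is not None:
        method, data, confidence = "code_section_hash", text_section, 0.6
    else:
        # Last resort. Assumption: the path is hashed as UTF-8.
        method, data, confidence = "path_hash", path.encode("utf-8"), 0.3
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": method,
            "value": "sha256:" + hashlib.sha256(data).hexdigest(),
            "confidence": confidence,
        },
    }
```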

2. Code-ID for Name-less Symbols

2.1 Purpose

code_id provides stable identification for symbols in stripped binaries where the symbol name is unavailable.

2.2 Format

code_id = "code:{lang}:{base64url_sha256}"

Canonical tuple for binary symbols:

{format}\0{build_id_or_file_hash}\0{section}\0{addr}\0{size}\0{code_block_hash}

2.3 Code Block Hash

For stripped functions, compute hash of the code bytes:

code_block_hash = "sha256:" + hex(SHA256(code_bytes[addr:addr+size]))
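
A minimal sketch of how the code block hash and `code_id` might be derived. Several details here are assumptions, since the contract does not spell them out: that `code_id` is the SHA-256 of the NUL-joined canonical tuple, that `addr` is rendered as lowercase hex and `size` as decimal inside the tuple, that `addr` indexes directly into the supplied byte buffer, and that base64url padding is stripped (the `code_id` regex in section 8 does not admit `=`). The function names are hypothetical.

```python
import base64
import hashlib


def code_block_hash(code_bytes: bytes, addr: int, size: int) -> str:
    """Hash the function's bytes; `addr` is treated as an offset into `code_bytes`."""
    return "sha256:" + hashlib.sha256(code_bytes[addr:addr + size]).hexdigest()


def code_id(fmt: str, build_id_or_file_hash: str, section: str,
            addr: int, size: int, block_hash: str, lang: str = "binary") -> str:
    """Build code:{lang}:{base64url_sha256} from the canonical tuple."""
    # Assumption: addr as hex, size as decimal; fields joined by NUL bytes.
    fields = [fmt, build_id_or_file_hash, section, hex(addr), str(size), block_hash]
    digest = hashlib.sha256("\0".join(fields).encode("utf-8")).digest()
    # Assumption: padding is stripped so the value matches the section 8 regex.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"code:{lang}:{encoded}"
```

Because every input is derived from the binary itself, the same stripped function yields the same `code_id` on every scan, which is what makes it usable as a stable key.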

3. Cross-RID (Runtime Identifier) Mapping

3.1 Problem Statement

Different platform builds (linux-x64, win-x64, osx-arm64) of the same source code produce different binaries with different build-ids. Runtime facts from one platform must map to the correct binary variant.

3.2 Variant Group

Binaries from the same source are grouped by source digest:

{
  "variant_group": {
    "source_digest": "sha256:...",
    "variants": [
      {
        "rid": "linux-x64",
        "build_id": "gnu-build-id:aaa...",
        "file_hash": "sha256:..."
      },
      {
        "rid": "win-x64",
        "build_id": "pe-guid:bbb...",
        "file_hash": "sha256:..."
      },
      {
        "rid": "osx-arm64",
        "build_id": "macho-uuid:ccc...",
        "file_hash": "sha256:..."
      }
    ]
  }
}

3.3 Runtime Fact Correlation

When Signals ingests runtime facts:

  1. Extract build_id from runtime event
  2. Look up variant group containing this build_id
  3. Correlate with richgraph nodes having matching build_id
  4. If no match, fall back to code_id + code_block_hash matching
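
The four correlation steps can be sketched as below. This is an in-memory illustration only, not the Signals implementation. The names `find_variant` and `correlate` are hypothetical, variant groups are passed as the inner `variant_group` objects from 3.2, and the fallback assumes the runtime event carries `code_id` and `code_block_hash` under `symbol`, which the contract does not show.

```python
from typing import Any, Dict, List, Optional


def find_variant(build_id: str, variant_groups: List[Dict[str, Any]]) -> Optional[Dict[str, str]]:
    """Step 2: locate the variant group (and RID) that contains this build_id."""
    for group in variant_groups:
        for variant in group["variants"]:
            if variant["build_id"] == build_id:
                return {"source_digest": group["source_digest"], "rid": variant["rid"]}
    return None


def correlate(event: Dict[str, Any],
              variant_groups: List[Dict[str, Any]],
              nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    build_id = event.get("binary", {}).get("build_id")  # step 1
    variant = find_variant(build_id, variant_groups) if build_id else None
    if build_id:
        # Step 3: richgraph nodes with a matching build_id.
        matches = [n for n in nodes if n.get("build_id") == build_id]
        if matches:
            return {"matched_by": "build_id", "variant": variant, "nodes": matches}
    # Step 4: fall back to code_id + code_block_hash.
    # Assumption: the event exposes both fields under "symbol".
    symbol = event.get("symbol", {})
    cid, block = symbol.get("code_id"), symbol.get("code_block_hash")
    if cid and block:
        matches = [n for n in nodes
                   if n.get("code_id") == cid and n.get("code_block_hash") == block]
        if matches:
            return {"matched_by": "code_id", "variant": variant, "nodes": matches}
    return {"matched_by": None, "variant": variant, "nodes": []}
```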

4. SBOM Integration

4.1 CycloneDX 1.6 Properties

Build-ID propagates to SBOM via component properties:

{
  "type": "library",
  "name": "libssl.so.3",
  "version": "3.0.11",
  "properties": [
    {"name": "stellaops:build-id", "value": "gnu-build-id:5f0c7c3c..."},
    {"name": "stellaops:code-id", "value": "code:binary:abc123..."},
    {"name": "stellaops:file-hash", "value": "sha256:..."}
  ]
}

4.2 SPDX 3.0 Integration

Build-ID maps to SPDX external references:

{
  "spdxId": "SPDXRef-libssl",
  "externalRef": {
    "referenceCategory": "PERSISTENT-ID",
    "referenceType": "gnu-build-id",
    "referenceLocator": "gnu-build-id:5f0c7c3c..."
  }
}

5. Signals Runtime Facts Schema

5.1 Runtime Event with Build-ID

{
  "event_type": "function_hit",
  "timestamp": "2025-12-13T10:00:00Z",
  "binary": {
    "path": "/usr/lib/x86_64-linux-gnu/libssl.so.3",
    "build_id": "gnu-build-id:5f0c7c3c...",
    "file_hash": "sha256:..."
  },
  "symbol": {
    "name": "SSL_read",
    "address": "0x12345678",
    "symbol_id": "sym:binary:..."
  },
  "context": {
    "pid": 12345,
    "container_id": "abc123..."
  }
}

5.2 Ingestion Endpoint

POST /signals/runtime-facts
Content-Type: application/x-ndjson
Content-Encoding: gzip

{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
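
A client could prepare the request body along these lines. The sketch only builds the headers and the gzip-compressed NDJSON payload using the standard library; the function name `encode_runtime_facts` is hypothetical, and sending the request (host, authentication, HTTP client) is left out because the contract does not specify it.

```python
import gzip
import json
from typing import Any, Dict, Iterable, Tuple


def encode_runtime_facts(events: Iterable[Dict[str, Any]]) -> Tuple[Dict[str, str], bytes]:
    """Serialize events as one compact JSON object per line, then gzip the result."""
    ndjson = "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in events)
    body = gzip.compress(ndjson.encode("utf-8"))
    headers = {
        "Content-Type": "application/x-ndjson",
        "Content-Encoding": "gzip",
    }
    return headers, body
```

The returned `headers` and `body` would then be passed to whatever HTTP client issues the `POST /signals/runtime-facts` request.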

6. RichGraph Integration

6.1 Node with Build-ID

{
  "id": "sym:binary:...",
  "symbol_id": "sym:binary:...",
  "lang": "binary",
  "kind": "function",
  "display": "SSL_read",
  "build_id": "gnu-build-id:5f0c7c3c...",
  "code_id": "code:binary:...",
  "code_block_hash": "sha256:...",
  "purl": "pkg:deb/debian/libssl3@3.0.11"
}

6.2 CAS Evidence Storage

cas://binary/
  by-build-id/{build_id}/       # Index by build-id
    graph.json                   # Associated graph
    symbols.json                 # Symbol table
  by-code-id/{code_id}/          # Index by code-id
    block.bin                    # Code block bytes
    disasm.json                  # Disassembly

7. Implementation Requirements

7.1 Scanner Changes

| Component | Change | Priority |
|---|---|---|
| ELF parser | Extract .note.gnu.build-id | P0 |
| PE parser | Extract Debug GUID | P0 |
| Mach-O parser | Extract LC_UUID | P0 |
| RichGraphBuilder | Populate build_id field on nodes | P0 |
| SBOM emitters | Add stellaops:build-id property | P1 |

7.2 Signals Changes

| Component | Change | Priority |
|---|---|---|
| Runtime facts ingestion | Parse and index build_id | P0 |
| Scoring service | Correlate by build_id then code_id | P0 |
| Store repository | Add build_id index | P1 |

7.3 CLI/UI Changes

| Component | Change | Priority |
|---|---|---|
| stella graph explain | Show build_id in output | P1 |
| UI symbol drawer | Display build_id with copy button | P1 |

8. Validation Rules

  1. build_id must match regex: ^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$
  2. code_id must match regex: ^code:[a-z]+:[A-Za-z0-9_-]+$
  3. When build_id is null, build_id_fallback must be present
  4. code_block_hash required when build_id is null and symbol is stripped
  5. Variant group source_digest must be consistent across all variants
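
Rules 1 to 4 can be checked with a small validator such as the sketch below. The function name `validate_node` and the `stripped` flag are hypothetical, since the contract does not say how a stripped symbol is marked. Rule 5 is omitted because it needs a per-variant source digest, which the variant group schema in 3.2 does not show.

```python
import re
from typing import Any, Dict, List

# Patterns copied verbatim from rules 1 and 2.
BUILD_ID_RE = re.compile(r"^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$")
CODE_ID_RE = re.compile(r"^code:[a-z]+:[A-Za-z0-9_-]+$")


def validate_node(node: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the node passes."""
    errors: List[str] = []
    build_id = node.get("build_id")
    if build_id is None:
        if not node.get("build_id_fallback"):
            errors.append("rule 3: build_id_fallback required when build_id is null")
        # Assumption: a boolean "stripped" flag marks name-less symbols.
        if node.get("stripped") and not node.get("code_block_hash"):
            errors.append("rule 4: code_block_hash required for stripped symbol")
    elif not BUILD_ID_RE.fullmatch(build_id):
        errors.append("rule 1: build_id does not match the required pattern")
    code_id = node.get("code_id")
    if code_id is not None and not CODE_ID_RE.fullmatch(code_id):
        errors.append("rule 2: code_id does not match the required pattern")
    return errors
```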

9. Test Fixtures

Location: tests/Binary/fixtures/build-id/

| Fixture | Description |
|---|---|
| elf-with-buildid/ | ELF binary with GNU build-id |
| elf-stripped/ | ELF stripped, fallback to code-id |
| pe-with-guid/ | PE binary with Debug GUID |
| macho-with-uuid/ | Mach-O binary with LC_UUID |
| variant-group/ | Same source, multiple RIDs |


Changelog

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial contract for build-id propagation |