sprints work.

This commit is contained in:
master
2026-01-20 00:45:38 +02:00
parent b34bde89fa
commit 4903395618
275 changed files with 52785 additions and 79 deletions


@@ -0,0 +1,164 @@
# DeltaSig v2 Predicate Schema
> **Sprint**: SPRINT_20260119_004_BinaryIndex_deltasig_extensions
> **Status**: Implemented
## Overview
DeltaSig v2 extends the function-level binary diff predicate with:
- **Symbol Provenance**: Links function matches to ground-truth corpus sources (debuginfod, ddeb, buildinfo, secdb)
- **IR Diff References**: CAS-stored intermediate representation diffs for detailed analysis
- **Explicit Verdicts**: Clear vulnerability status with confidence scores
- **Function Match States**: Per-function vulnerable/patched/modified/unchanged classification
## Schema
**Predicate Type URI**: `https://stella-ops.org/predicates/deltasig/v2`
### Key Fields
| Field | Type | Description |
|-------|------|-------------|
| `schemaVersion` | string | Always `"2.0.0"` |
| `subject` | object | Single subject (PURL, digest, arch) |
| `functionMatches` | array | Function-level matches with evidence |
| `verdict` | string | `vulnerable`, `patched`, `partial`, `unknown`, `partially_patched`, `inconclusive` |
| `confidence` | number | 0.0-1.0 confidence score |
| `summary` | object | Aggregate statistics |
### Function Match
```json
{
  "functionId": "sha256:abc123...",
  "name": "ssl_handshake",
  "address": 4194304,
  "size": 256,
  "matchScore": 0.95,
  "matchMethod": "semantic_ksg",
  "matchState": "patched",
  "symbolProvenance": {
    "sourceId": "fedora-debuginfod",
    "observationId": "obs:gt:12345",
    "confidence": 0.98,
    "resolvedAt": "2026-01-19T12:00:00Z"
  },
  "irDiff": {
    "casDigest": "sha256:def456...",
    "statementsAdded": 5,
    "statementsRemoved": 3,
    "changedInstructions": 8
  }
}
```
### Summary
```json
{
  "totalFunctions": 150,
  "vulnerableFunctions": 0,
  "patchedFunctions": 12,
  "unknownFunctions": 138,
  "functionsWithProvenance": 45,
  "functionsWithIrDiff": 12,
  "avgMatchScore": 0.85,
  "minMatchScore": 0.42,
  "maxMatchScore": 0.99,
  "totalIrDiffSize": 1234
}
```
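The summary counters can be derived from the `functionMatches` array. The following is a minimal Python sketch of that aggregation, not the `DeltaSigServiceV2` implementation. It assumes that any match state other than `vulnerable` or `patched` is counted as unknown, and it omits `totalIrDiffSize` because the unit of that field is not defined here. The function name `summarize` is hypothetical.

```python
from statistics import mean


def summarize(function_matches):
    """Aggregate per-function matches into summary counters (illustrative only)."""
    scores = [m["matchScore"] for m in function_matches]
    states = [m.get("matchState") for m in function_matches]
    vulnerable = states.count("vulnerable")
    patched = states.count("patched")
    return {
        "totalFunctions": len(function_matches),
        "vulnerableFunctions": vulnerable,
        "patchedFunctions": patched,
        # Assumption: everything that is neither vulnerable nor patched is "unknown".
        "unknownFunctions": len(function_matches) - vulnerable - patched,
        "functionsWithProvenance": sum(1 for m in function_matches if m.get("symbolProvenance")),
        "functionsWithIrDiff": sum(1 for m in function_matches if m.get("irDiff")),
        "avgMatchScore": round(mean(scores), 4) if scores else 0.0,
        "minMatchScore": min(scores, default=0.0),
        "maxMatchScore": max(scores, default=0.0),
    }
```

Under that assumption the sample summary is self-consistent: 0 vulnerable + 12 patched + 138 unknown = 150 total functions.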
## Version Negotiation
Clients can request specific predicate versions:
```json
{
  "preferredVersion": "2",
  "requiredFeatures": ["provenance", "ir-diff"]
}
```
Response:
```json
{
  "version": "2.0.0",
  "predicateType": "https://stella-ops.org/predicates/deltasig/v2",
  "features": ["provenance", "ir-diff"]
}
```
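One way to read the request/response pair above is as a feature-subset check. The Python sketch below illustrates that reading only; the actual negotiation rules are not specified in this document. The `negotiate` function, the fallback to a non-preferred version, and the `None` result for unsatisfiable requests are all assumptions.

```python
V2 = {
    "version": "2.0.0",
    "predicateType": "https://stella-ops.org/predicates/deltasig/v2",
    "features": ["provenance", "ir-diff"],
}


def negotiate(request, offered=(V2,)):
    """Return the first offered version satisfying the request, else None."""
    required = set(request.get("requiredFeatures", []))
    preferred = request.get("preferredVersion")
    # Assumption: try the preferred major version first, then any other offer.
    ranked = sorted(offered, key=lambda o: o["version"].split(".")[0] != preferred)
    for option in ranked:
        if required <= set(option["features"]):
            return option
    return None
```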
## VEX Integration
DeltaSig v2 predicates can be converted to VEX observations via `IDeltaSigVexBridge`:
| DeltaSig Verdict | VEX Status |
|------------------|------------|
| `patched` | `fixed` |
| `vulnerable` | `affected` |
| `partially_patched` | `under_investigation` |
| `inconclusive` | `under_investigation` |
| `unknown` | `not_affected` (conservative) |
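The mapping table can be expressed as a plain lookup. This is a Python sketch of the table only, not the `DeltaSigVexBridge` code. Raising on verdicts that have no row (for example `partial`, which the schema lists but the table does not map) is an assumption about how unmapped values should be handled.

```python
VERDICT_TO_VEX = {
    "patched": "fixed",
    "vulnerable": "affected",
    "partially_patched": "under_investigation",
    "inconclusive": "under_investigation",
    "unknown": "not_affected",
}


def to_vex_status(verdict):
    """Translate a DeltaSig verdict into a VEX status per the table above."""
    try:
        return VERDICT_TO_VEX[verdict]
    except KeyError:
        # Assumption: unmapped verdicts are an error rather than a silent default.
        raise ValueError(f"no VEX mapping for verdict {verdict!r}") from None
```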
### Evidence Blocks
VEX observations include evidence blocks:
1. **deltasig-summary**: Aggregate statistics
2. **deltasig-function-matches**: High-confidence matches with provenance
3. **deltasig-predicate-ref**: Reference to full predicate
## Implementation
### Core Services
| Interface | Implementation | Description |
|-----------|----------------|-------------|
| `IDeltaSigServiceV2` | `DeltaSigServiceV2` | V2 predicate generation |
| `ISymbolProvenanceResolver` | `GroundTruthProvenanceResolver` | Ground-truth lookup |
| `IIrDiffGenerator` | `IrDiffGenerator` | IR diff generation with CAS |
| `IDeltaSigVexBridge` | `DeltaSigVexBridge` | VEX observation generation |
### DI Registration
```csharp
services.AddDeltaSigV2();
```
Or with options:
```csharp
services.AddDeltaSigV2(
    configureProvenance: opts => opts.IncludeStale = false,
    configureIrDiff: opts => opts.MaxParallelism = 4
);
```
## Migration from v1
Use `DeltaSigPredicateConverter`:
```csharp
// v1 → v2
var v2 = DeltaSigPredicateConverter.ToV2(v1Predicate);
// v2 → v1
var v1 = DeltaSigPredicateConverter.ToV1(v2Predicate);
```
Notes:
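- v1 → v2: The provenance and IR diff fields are left empty; populate them afterwards via the resolver (`ISymbolProvenanceResolver`) and generator (`IIrDiffGenerator`).
- v2 → v1: The conversion is lossy. Provenance and IR diff are discarded, and verdict/confidence are lost.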
## JSON Schema
Full schema: [`docs/schemas/predicates/deltasig-v2.schema.json`](../../../schemas/predicates/deltasig-v2.schema.json)
## Related Documentation
- [Ground-Truth Corpus](./ground-truth-corpus.md)
- [Semantic Diffing](./semantic-diffing.md)
- [Architecture](./architecture.md)


@@ -0,0 +1,764 @@
# Ground-Truth Corpus Architecture
> **Ownership:** BinaryIndex Guild
> **Status:** DRAFT
> **Version:** 1.0.0
> **Related:** [BinaryIndex Architecture](architecture.md), [Corpus Management](corpus-management.md), [Concelier AOC](../concelier/guides/aggregation-only-contract.md)
---
## 1. Overview
The **Ground-Truth Corpus** system provides a validated function-matching oracle for binary diff accuracy measurement. It uses the same plugin-based ingestion pattern as Concelier (advisories) and Excititor (VEX), applying **Aggregation-Only Contract (AOC)** principles to ensure immutable, deterministic, and replayable data.
### 1.1 Problem Statement
Function matching and binary diffing require ground-truth data to measure accuracy:
1. **No oracle for validation** - How do we know a function match is correct?
2. **Symbols stripped in production** - Debug info unavailable at scan time
3. **Compiler/optimization variance** - Same source produces different binaries
4. **Backport detection gaps** - Need pre/post pairs to validate patch detection
### 1.2 Solution: Distro Symbol Corpus
Leverage mainstream Linux distro artifacts as ground-truth:
| Source | What It Provides | Use Case |
|--------|------------------|----------|
| **Debian `.buildinfo`** | Exact build env records, often clearsigned | Reproducible oracle, build env metadata |
| **Fedora Koji + debuginfod** | Machine-queryable debuginfo with IMA verification | Symbol recovery for stripped binaries |
| **Ubuntu ddebs** | Debug symbol packages | Symbol-grounded truth for function names |
| **Alpine SecDB** | Precise CVE-to-backport mappings | Pre/post pair curation |
### 1.3 Module Scope
**In Scope:**
- Symbol recovery connectors (debuginfod, ddebs, .buildinfo)
- Ground-truth observations (immutable, append-only)
- Pre/post security pair curation
- Validation harness for function-matching accuracy
- Deterministic manifests for replayability
**Out of Scope:**
- Function matching algorithms (see [semantic-diffing.md](semantic-diffing.md))
- Fingerprint generation (see [corpus-management.md](corpus-management.md))
- Policy decisions (provided by Policy Engine)
---
## 2. Architecture
### 2.1 System Context
```
┌──────────────────────────────────────────────────────────────────────────┐
│                         External Symbol Sources                          │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐         │
│  │ Fedora          │   │ Ubuntu          │   │ Debian          │         │
│  │ debuginfod      │   │ ddebs           │   │ .buildinfo      │         │
│  └────────┬────────┘   └────────┬────────┘   └────────┬────────┘         │
│           │                     │                     │                  │
│  ┌────────┴────────┐   ┌────────┴────────┐   ┌────────┴────────┐         │
│  │ Alpine SecDB    │   │ reproduce.      │   │ Upstream        │         │
│  │                 │   │ debian.net      │   │ tarballs        │         │
│  └────────┬────────┘   └────────┬────────┘   └────────┬────────┘         │
└───────────│─────────────────────│─────────────────────│──────────────────┘
            │                     │                     │
            v                     v                     v
┌──────────────────────────────────────────────────────────────────────────┐
│                        Ground-Truth Corpus Module                        │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │                      Symbol Source Connectors                       │ │
│  │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐             │ │
│  │  │ Debuginfod   │   │ Ddeb         │   │ Buildinfo    │             │ │
│  │  │ Connector    │   │ Connector    │   │ Connector    │             │ │
│  │  └──────────────┘   └──────────────┘   └──────────────┘             │ │
│  │  ┌──────────────┐   ┌──────────────┐                                │ │
│  │  │ SecDB        │   │ Upstream     │                                │ │
│  │  │ Connector    │   │ Connector    │                                │ │
│  │  └──────────────┘   └──────────────┘                                │ │
│  └─────────────────────────────────────────────────────────────────────┘ │
│                                    │                                     │
│                                    v                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │                        AOC Write Guard Layer                        │ │
│  │  ┌──────────────────────────────────────────────────────────────┐   │ │
│  │  │ • No derived scores at ingest                                │   │ │
│  │  │ • Immutable observations + supersedes chain                  │   │ │
│  │  │ • Mandatory provenance (source URL, hash, signature)         │   │ │
│  │  │ • Idempotent upserts (keyed by content hash)                 │   │ │
│  │  │ • Deterministic canonical JSON                               │   │ │
│  │  └──────────────────────────────────────────────────────────────┘   │ │
│  └─────────────────────────────────────────────────────────────────────┘ │
│                                    │                                     │
│                                    v                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │                     Storage Layer (PostgreSQL)                      │ │
│  │                                                                     │ │
│  │  groundtruth.symbol_sources      - Registered symbol providers      │ │
│  │  groundtruth.raw_documents       - Immutable raw payloads           │ │
│  │  groundtruth.symbol_observations - Normalized symbol records        │ │
│  │  groundtruth.security_pairs      - Pre/post CVE binary pairs        │ │
│  │  groundtruth.validation_runs     - Benchmark execution records      │ │
│  │  groundtruth.match_results       - Function match outcomes          │ │
│  │  groundtruth.source_state        - Cursor/sync state per source     │ │
│  └─────────────────────────────────────────────────────────────────────┘ │
│                                    │                                     │
│                                    v                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │                         Validation Harness                          │ │
│  │  ┌──────────────────────────────────────────────────────────────┐   │ │
│  │  │ IValidationHarness                                           │   │ │
│  │  │   - RunValidationAsync(pairs, matcherConfig)                 │   │ │
│  │  │   - GetMetricsAsync(runId) -> MatchRate, FP/FN, Unmatched    │   │ │
│  │  │   - ExportReportAsync(runId, format) -> Markdown/HTML        │   │ │
│  │  └──────────────────────────────────────────────────────────────┘   │ │
│  └─────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
```
### 2.2 Component Breakdown
#### 2.2.1 Symbol Source Connectors
Plugin-based connectors following the Concelier `IFeedConnector` pattern:
```csharp
public interface ISymbolSourceConnector
{
    string SourceId { get; }
    string[] SupportedDistros { get; }

    // Three-phase pipeline (matches Concelier pattern)
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);  // Download raw docs
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);  // Normalize to DTOs
    Task MapAsync(IServiceProvider sp, CancellationToken ct);    // Build observations
}
```
**Implementations:**
| Connector | Source | Data Retrieved |
|-----------|--------|----------------|
| `DebuginfodConnector` | Fedora/RHEL debuginfod | ELF debuginfo, source files |
| `DdebConnector` | Ubuntu ddebs repos | .ddeb packages with DWARF |
| `BuildinfoConnector` | Debian .buildinfo | Build env, checksums, signatures |
| `SecDbConnector` | Alpine SecDB | CVE-to-fix mappings |
| `UpstreamConnector` | GitHub/tarballs | Upstream release sources |
#### 2.2.2 AOC Write Guard
Enforces aggregation-only invariants (mirrors `IAdvisoryObservationWriteGuard`):
```csharp
public interface ISymbolObservationWriteGuard
{
    WriteDisposition ValidateWrite(
        SymbolObservation candidate,
        string? existingContentHash);
}

public enum WriteDisposition
{
    Proceed,         // Insert new observation
    SkipIdentical,   // Idempotent re-insert, no-op
    RejectMutation   // Reject (append-only violation)
}
```
**Invariants Enforced:**
| Invariant | Enforcement |
|-----------|-----------------|
| No derived scores | Reject `confidence`, `accuracy`, `match_score` at ingest |
| Immutable observations | No in-place updates; new revisions use `supersedes` |
| Mandatory provenance | Require `source_url`, `fetched_at`, `content_hash`, `signature_state` |
| Idempotent upserts | Key by `(source_id, debug_id, content_hash)` |
| Deterministic canonical | Sorted JSON keys, UTC ISO-8601, stable hashes |
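The invariants above can be sketched as a small guard function. The following Python is illustrative only and is not the `ISymbolObservationWriteGuard` implementation: the dictionary-shaped observation, the `content_hash` helper, and the exact canonicalisation (sorted keys, compact separators, UTF-8) are assumptions chosen to match the table.

```python
import hashlib
import json
from enum import Enum


class WriteDisposition(Enum):
    PROCEED = "Proceed"
    SKIP_IDENTICAL = "SkipIdentical"
    REJECT_MUTATION = "RejectMutation"


DERIVED_FIELDS = {"confidence", "accuracy", "match_score"}
REQUIRED_PROVENANCE = {"source_url", "fetched_at", "content_hash", "signature_state"}


def content_hash(observation):
    """SHA-256 over canonical JSON; the hash field itself is excluded."""
    body = {k: v for k, v in observation.items() if k != "content_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_write(candidate, existing_content_hash):
    """Decide how to handle a write for one (source_id, debug_id) key."""
    forbidden = DERIVED_FIELDS & set(candidate)
    if forbidden:
        raise ValueError(f"derived fields not allowed at ingest: {sorted(forbidden)}")
    missing = REQUIRED_PROVENANCE - set(candidate)
    if missing:
        raise ValueError(f"missing provenance fields: {sorted(missing)}")
    if existing_content_hash is None:
        return WriteDisposition.PROCEED          # nothing stored yet
    if existing_content_hash == candidate["content_hash"]:
        return WriteDisposition.SKIP_IDENTICAL   # idempotent re-insert
    return WriteDisposition.REJECT_MUTATION      # changed content needs a new revision
```

Because the hash is taken over sorted keys, two observations that differ only in key order produce the same `content_hash`, which is what makes the idempotent path reliable.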
#### 2.2.3 Security Pair Curation
Manages pre/post CVE binary pairs for validation:
```csharp
public interface ISecurityPairService
{
    // Curate a pre/post pair for a CVE
    Task<SecurityPair> CreatePairAsync(
        string cveId,
        BinaryReference vulnerableBinary,
        BinaryReference patchedBinary,
        PairMetadata metadata,
        CancellationToken ct);

    // Get pairs for validation
    Task<ImmutableArray<SecurityPair>> GetPairsAsync(
        SecurityPairQuery query,
        CancellationToken ct);
}

public sealed record SecurityPair(
    string PairId,
    string CveId,
    BinaryReference VulnerableBinary,
    BinaryReference PatchedBinary,
    string[] AffectedFunctions,   // Symbol names of vulnerable functions
    string[] ChangedFunctions,    // Symbol names of patched functions
    DiffMetadata Diff,            // Upstream patch info
    ProvenanceInfo Provenance);
```
#### 2.2.4 Validation Harness
Runs function-matching validation with metrics:
```csharp
public interface IValidationHarness
{
    // Execute validation run
    Task<ValidationRun> RunAsync(
        ValidationConfig config,
        CancellationToken ct);

    // Get metrics for a run
    Task<ValidationMetrics> GetMetricsAsync(
        Guid runId,
        CancellationToken ct);

    // Export report
    Task<Stream> ExportReportAsync(
        Guid runId,
        ReportFormat format,
        CancellationToken ct);
}

public sealed record ValidationMetrics(
    int TotalFunctions,
    int CorrectMatches,
    int FalsePositives,
    int FalseNegatives,
    int Unmatched,
    decimal MatchRate,
    decimal Precision,
    decimal Recall,
    ImmutableArray<MismatchBucket> MismatchBuckets);

public sealed record MismatchBucket(
    string Cause,   // inlining, lto, optimization, pic_thunk
    int Count,
    ImmutableArray<FunctionRef> Examples);
```
---
## 3. Database Schema
### 3.1 Symbol Sources
```sql
CREATE TABLE groundtruth.symbol_sources (
    source_id       TEXT PRIMARY KEY,
    display_name    TEXT NOT NULL,
    connector_type  TEXT NOT NULL,           -- debuginfod, ddeb, buildinfo, secdb
    base_url        TEXT NOT NULL,
    enabled         BOOLEAN DEFAULT TRUE,
    config_json     JSONB,
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW()
);
```
### 3.2 Raw Documents (Immutable)
```sql
CREATE TABLE groundtruth.raw_documents (
    digest              TEXT PRIMARY KEY,    -- sha256:{hex}
    source_id           TEXT NOT NULL REFERENCES groundtruth.symbol_sources(source_id),
    document_uri        TEXT NOT NULL,
    fetched_at          TIMESTAMPTZ NOT NULL,
    recorded_at         TIMESTAMPTZ DEFAULT NOW(),
    content_type        TEXT NOT NULL,
    content_size_bytes  INT,
    etag                TEXT,
    signature_state     TEXT,                -- verified, unverified, failed
    payload_json        JSONB,
    UNIQUE (source_id, document_uri, etag)
);

CREATE INDEX idx_raw_documents_source_fetched
    ON groundtruth.raw_documents(source_id, fetched_at DESC);
```
### 3.3 Symbol Observations (Immutable)
```sql
CREATE TABLE groundtruth.symbol_observations (
    observation_id      TEXT PRIMARY KEY,    -- groundtruth:{source}:{debug_id}:{revision}
    source_id           TEXT NOT NULL,
    debug_id            TEXT NOT NULL,       -- ELF build-id, PE GUID, Mach-O UUID
    code_id             TEXT,                -- GNU build-id or PE checksum

    -- Binary metadata
    binary_name         TEXT NOT NULL,
    binary_path         TEXT,
    architecture        TEXT NOT NULL,       -- x86_64, aarch64, armv7

    -- Package provenance
    distro              TEXT,                -- debian, ubuntu, fedora, alpine
    distro_version      TEXT,
    package_name        TEXT,
    package_version     TEXT,

    -- Symbols
    symbols_json        JSONB NOT NULL,      -- Array of {name, address, size, type}
    symbol_count        INT NOT NULL,

    -- Build metadata (from .buildinfo or debuginfo)
    compiler            TEXT,
    compiler_version    TEXT,
    optimization_level  TEXT,
    build_flags_json    JSONB,

    -- Provenance
    document_digest     TEXT REFERENCES groundtruth.raw_documents(digest),
    content_hash        TEXT NOT NULL,
    supersedes_id       TEXT REFERENCES groundtruth.symbol_observations(observation_id),
    created_at          TIMESTAMPTZ DEFAULT NOW(),

    UNIQUE (source_id, debug_id, content_hash)
);

CREATE INDEX idx_symbol_observations_debug_id
    ON groundtruth.symbol_observations(debug_id);
CREATE INDEX idx_symbol_observations_package
    ON groundtruth.symbol_observations(distro, package_name, package_version);
```
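Because observations are never updated in place, the current revision for a binary is the row that no other row supersedes. The Python sketch below illustrates that lookup over in-memory rows; the function name `latest_revisions` is hypothetical, and a real deployment would more likely express this as a SQL query against `supersedes_id`.

```python
def latest_revisions(observations):
    """Return the observations that no other observation supersedes."""
    superseded = {o["supersedes_id"] for o in observations if o.get("supersedes_id")}
    return [o for o in observations if o["observation_id"] not in superseded]
```

For a chain of revision 1 followed by revision 2 (with `supersedes_id` pointing at revision 1), only revision 2 is returned, while revision 1 remains stored for replay.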
### 3.4 Security Pairs
```sql
CREATE TABLE groundtruth.security_pairs (
    pair_id                  TEXT PRIMARY KEY,
    cve_id                   TEXT NOT NULL,

    -- Vulnerable binary
    vuln_observation_id      TEXT NOT NULL
        REFERENCES groundtruth.symbol_observations(observation_id),
    vuln_debug_id            TEXT NOT NULL,

    -- Patched binary
    patch_observation_id     TEXT NOT NULL
        REFERENCES groundtruth.symbol_observations(observation_id),
    patch_debug_id           TEXT NOT NULL,

    -- Affected function mapping
    affected_functions_json  JSONB NOT NULL,  -- [{name, vuln_addr, patch_addr}]
    changed_functions_json   JSONB NOT NULL,

    -- Upstream diff reference
    upstream_commit          TEXT,
    upstream_patch_url       TEXT,

    -- Metadata
    distro                   TEXT NOT NULL,
    package_name             TEXT NOT NULL,
    created_at               TIMESTAMPTZ DEFAULT NOW(),
    created_by               TEXT
);

CREATE INDEX idx_security_pairs_cve
    ON groundtruth.security_pairs(cve_id);
CREATE INDEX idx_security_pairs_package
    ON groundtruth.security_pairs(distro, package_name);
```
### 3.5 Validation Runs
```sql
CREATE TABLE groundtruth.validation_runs (
    run_id              UUID PRIMARY KEY,
    config_json         JSONB NOT NULL,      -- Matcher config, thresholds
    started_at          TIMESTAMPTZ NOT NULL,
    completed_at        TIMESTAMPTZ,
    status              TEXT NOT NULL,       -- running, completed, failed

    -- Aggregate metrics
    total_functions     INT,
    correct_matches     INT,
    false_positives     INT,
    false_negatives     INT,
    unmatched           INT,
    match_rate          DECIMAL(5,4),
    precision           DECIMAL(5,4),
    recall              DECIMAL(5,4),

    -- Environment
    matcher_version     TEXT NOT NULL,
    corpus_snapshot_id  TEXT,
    created_by          TEXT
);

CREATE TABLE groundtruth.match_results (
    result_id         UUID PRIMARY KEY,
    run_id            UUID NOT NULL REFERENCES groundtruth.validation_runs(run_id),

    -- Ground truth
    pair_id           TEXT NOT NULL REFERENCES groundtruth.security_pairs(pair_id),
    function_name     TEXT NOT NULL,
    expected_match    BOOLEAN NOT NULL,

    -- Actual result
    actual_match      BOOLEAN,
    match_score       DECIMAL(5,4),
    matched_function  TEXT,

    -- Classification
    outcome           TEXT NOT NULL,         -- true_positive, false_positive, false_negative, unmatched
    mismatch_cause    TEXT,                  -- inlining, lto, optimization, pic_thunk, etc.

    -- Debug info
    debug_json        JSONB
);

CREATE INDEX idx_match_results_run
    ON groundtruth.match_results(run_id);
CREATE INDEX idx_match_results_outcome
    ON groundtruth.match_results(run_id, outcome);
```
### 3.6 Source State (Cursor Tracking)
```sql
CREATE TABLE groundtruth.source_state (
    source_id        TEXT PRIMARY KEY REFERENCES groundtruth.symbol_sources(source_id),
    enabled          BOOLEAN DEFAULT TRUE,
    cursor_json      JSONB,                  -- last_modified, last_id, pending_docs
    last_success_at  TIMESTAMPTZ,
    last_error       TEXT,
    backoff_until    TIMESTAMPTZ
);
```
---
## 4. Connector Specifications
### 4.1 Debuginfod Connector (Fedora/RHEL)
**Data Source:** `https://debuginfod.fedoraproject.org`
**Fetch Flow:**
1. Query debuginfod for build-id: `GET /buildid/{build_id}/debuginfo`
2. Retrieve DWARF sections (.debug_info, .debug_line)
3. Parse symbols using libdw
4. Store observation with IMA signature verification
**Configuration:**
```yaml
debuginfod:
  base_url: "https://debuginfod.fedoraproject.org"
  timeout_seconds: 30
  verify_ima: true
  cache_dir: "/var/cache/stellaops/debuginfod"
```
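Step 1 of the fetch flow maps directly onto the debuginfod HTTP endpoint `/buildid/{build_id}/debuginfo`. The Python sketch below shows only that request, using the standard library; it is not the `DebuginfodConnector` code. It assumes the build-id is a hex string and does not perform DWARF parsing or IMA verification. The function names are hypothetical.

```python
import urllib.request


def debuginfo_url(base_url, build_id):
    """Build the debuginfod URL for a hex build-id."""
    normalized = build_id.lower()
    if not normalized or any(c not in "0123456789abcdef" for c in normalized):
        raise ValueError(f"build-id must be a non-empty hex string: {build_id!r}")
    return f"{base_url.rstrip('/')}/buildid/{normalized}/debuginfo"


def fetch_debuginfo(base_url, build_id, timeout_seconds=30):
    """Download the raw debuginfo ELF bytes (performs a network request)."""
    url = debuginfo_url(base_url, build_id)
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()
```

The `timeout_seconds` default mirrors the configuration value above; the returned bytes would still need to be hashed and recorded as a raw document before any symbols are parsed.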
### 4.2 Ddeb Connector (Ubuntu)
**Data Source:** `http://ddebs.ubuntu.com`
**Fetch Flow:**
1. Query Packages index for `-dbgsym` packages
2. Download `.ddeb` archive
3. Extract DWARF from `/usr/lib/debug/.build-id/`
4. Parse symbols, map to corresponding binary package
**Configuration:**
```yaml
ddeb:
  mirror_url: "http://ddebs.ubuntu.com"
  distributions: ["focal", "jammy", "noble"]
  components: ["main", "universe"]
  cache_dir: "/var/cache/stellaops/ddebs"
```
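Files under `/usr/lib/debug/.build-id/` follow the usual build-id layout: the first two hex characters form a directory and the remainder, plus `.debug`, forms the file name. The Python sketch below converts between a build-id and that path, which is the lookup step 3 of the fetch flow relies on. The function names are hypothetical, and this is not the `DdebConnector` code.

```python
from pathlib import PurePosixPath

HEX = set("0123456789abcdef")


def build_id_debug_path(build_id, root="/usr/lib/debug/.build-id"):
    """Return the separate-debug file path for a hex build-id."""
    normalized = build_id.lower()
    if len(normalized) < 3 or not set(normalized) <= HEX:
        raise ValueError(f"invalid build-id: {build_id!r}")
    return str(PurePosixPath(root) / normalized[:2] / (normalized[2:] + ".debug"))


def build_id_from_path(path):
    """Recover the build-id from a path produced by the layout above."""
    p = PurePosixPath(path)
    if p.suffix != ".debug":
        raise ValueError(f"not a .debug file: {path!r}")
    return p.parent.name + p.stem
```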
### 4.3 Buildinfo Connector (Debian)
**Data Source:** `https://buildinfos.debian.net`
**Fetch Flow:**
1. Query buildinfo index for package
2. Download `.buildinfo` file (often clearsigned)
3. Parse build environment (compiler, flags, checksums)
4. Cross-reference with snapshot.debian.org for exact binary
**Configuration:**
```yaml
buildinfo:
  index_url: "https://buildinfos.debian.net"
  snapshot_url: "https://snapshot.debian.org"
  reproducible_url: "https://reproduce.debian.net"
  verify_signature: true
```
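A `.buildinfo` file is a deb822-style control file (`Field: value` lines with indented continuations), optionally wrapped in an OpenPGP clearsign envelope. The Python sketch below covers step 3 of the fetch flow in simplified form. It is not the `BuildinfoConnector` code, it strips the envelope without verifying the signature, and it ignores clearsign dash-escaping. The function names are hypothetical.

```python
BEGIN_SIGNED = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIGNATURE = "-----BEGIN PGP SIGNATURE-----"


def strip_clearsign(text):
    """Return the signed body without the armor; does NOT verify the signature."""
    if BEGIN_SIGNED not in text:
        return text
    body = text.split(BEGIN_SIGNED, 1)[1]
    body = body.split("\n\n", 1)[1]  # skip armor headers such as "Hash: SHA512"
    return body.split(BEGIN_SIGNATURE, 1)[0]


def parse_buildinfo(text):
    """Parse deb822-style fields into a dict; multi-line values keep newlines."""
    fields, key = {}, None
    for line in strip_clearsign(text).splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Continuation line: append to the value of the previous field.
            fields[key] = (fields[key] + "\n" + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields
```

Fields such as `Build-Architecture`, `Environment`, and `Checksums-Sha256` can then be read from the returned dictionary; signature verification (the `verify_signature` option) would have to happen before the envelope is stripped.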
### 4.4 SecDB Connector (Alpine)
**Data Source:** `https://github.com/alpinelinux/alpine-secdb`
**Fetch Flow:**
1. Clone/pull secdb repository
2. Parse YAML files per configured branch (e.g., v3.18, v3.19, edge)
3. Map CVE to fixed/unfixed package versions
4. Cross-reference with aports for patch info
**Configuration:**
```yaml
secdb:
  repo_url: "https://github.com/alpinelinux/alpine-secdb"
  branches: ["v3.18", "v3.19", "v3.20", "edge"]
  aports_url: "https://gitlab.alpinelinux.org/alpine/aports"
```
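Step 3 of the fetch flow amounts to inverting the per-package fix lists. The Python sketch below assumes the secdb document has already been parsed from YAML into a dictionary with a `packages` list whose entries carry `pkg.name` and a `pkg.secfixes` mapping of fixed version to CVE identifiers; treat that layout as an assumption to verify against the repository. The function name and sample data are hypothetical, and this is not the `SecDbConnector` code.

```python
def cve_fix_versions(secdb_doc):
    """Map (package, CVE id) -> list of versions that record a fix for it."""
    fixes = {}
    for entry in secdb_doc.get("packages", []):
        pkg = entry["pkg"]
        for version, cves in (pkg.get("secfixes") or {}).items():
            for cve in cves or []:
                # An entry may carry trailing annotations; keep only the identifier.
                cve_id = cve.split()[0]
                fixes.setdefault((pkg["name"], cve_id), []).append(version)
    return fixes
```

Versions are kept in document order rather than sorted, because comparing Alpine package versions correctly needs apk-specific rules that are out of scope for this sketch.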
---
## 5. Validation Pipeline
### 5.1 Harness Workflow
```
1. Assemble
   └─> Given package + CVE, fetch: binaries, debuginfo, .buildinfo, upstream tarball

2. Recover Symbols
   └─> Resolve build-id → symbols via debuginfod/ddebs
   └─> Fallback: Debian rebuild from .buildinfo

3. Lift Functions
   └─> Batch-lift .text functions → IR
   └─> Cache per build-id

4. Fingerprint
   └─> Emit deterministic + fuzzy signatures
   └─> Store as JSON lines

5. Match
   └─> Pre→post function matching
   └─> Write row per function with scores

6. Score
   └─> Compute metrics (match rate, FP/FN, precision, recall)
   └─> Bucket mismatches by cause

7. Report
   └─> Markdown/HTML with tables + diffs
   └─> Attach env hashes and artifact URLs
```
### 5.2 Metrics Tracked
| Metric | Description |
|--------|-------------|
| `match_rate` | Correct matches / total functions |
| `precision` | True positives / (true positives + false positives) |
| `recall` | True positives / (true positives + false negatives) |
| `unmatched_rate` | Unmatched / total functions |
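The four formulas can be computed from the counters stored on a validation run. The Python sketch below treats "correct matches" as the true-positive count, which is an assumption, although it is consistent with the sample attestation in section 6.2. The function name is hypothetical.

```python
def validation_metrics(total, correct, false_positives, false_negatives, unmatched):
    """Compute the tracked metrics from raw counters; empty denominators yield 0.0."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "match_rate": ratio(correct, total),
        "precision": ratio(correct, correct + false_positives),
        "recall": ratio(correct, correct + false_negatives),
        "unmatched_rate": ratio(unmatched, total),
    }
```

With the counters from section 6.2 (1500 total, 1380 correct, 15 false positives, 45 false negatives, 60 unmatched) this gives a match rate of 1380/1500 = 0.92, precision of 1380/1395 ≈ 0.989, recall of 1380/1425 ≈ 0.968, and an unmatched rate of 60/1500 = 0.04.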
### 5.3 Mismatch Buckets
| Cause | Description | Mitigation |
|-------|-------------|------------|
| `inlining` | Function inlined, no direct match | Inline expansion in fingerprint |
| `lto` | Link-time optimization changed structure | Cross-module fingerprints |
| `optimization` | Different -O level | Semantic fingerprints |
| `pic_thunk` | Position-independent code stubs | Filter PIC thunks |
| `versioned_symbol` | GLIBC symbol versioning | Version-aware matching |
| `renamed` | Symbol renamed (macro, alias) | Alias resolution |
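Bucketing is a group-and-count over the per-function results. The Python sketch below assumes rows shaped like `groundtruth.match_results` (with `outcome` and `mismatch_cause` keys) and uses a hypothetical `unclassified` label for mismatches that carry no cause.

```python
from collections import Counter


def bucket_mismatches(results):
    """Count non-true-positive results per mismatch cause, largest bucket first."""
    counts = Counter(
        r.get("mismatch_cause") or "unclassified"  # hypothetical fallback label
        for r in results
        if r["outcome"] != "true_positive"
    )
    return [{"cause": cause, "count": count} for cause, count in counts.most_common()]
```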
---
## 6. Evidence Objects
### 6.1 Ground-Truth Attestation Predicate
```json
{
  "predicateType": "https://stella-ops.org/predicates/groundtruth/v1",
  "predicate": {
    "observationId": "groundtruth:debuginfod:abc123def456:1",
    "debugId": "abc123def456789...",
    "binaryIdentity": {
      "name": "libssl.so.3",
      "sha256": "sha256:...",
      "architecture": "x86_64"
    },
    "symbolSource": {
      "sourceId": "debuginfod-fedora",
      "fetchedAt": "2026-01-19T10:00:00Z",
      "documentUri": "https://debuginfod.fedoraproject.org/buildid/abc123/debuginfo",
      "signatureState": "verified"
    },
    "symbols": [
      {"name": "SSL_CTX_new", "address": "0x1234", "size": 256},
      {"name": "SSL_read", "address": "0x5678", "size": 512}
    ],
    "buildMetadata": {
      "compiler": "gcc",
      "compilerVersion": "12.2.0",
      "optimizationLevel": "O2",
      "buildFlags": ["-fstack-protector-strong", "-D_FORTIFY_SOURCE=2"]
    }
  }
}
```
### 6.2 Validation Run Attestation
```json
{
  "predicateType": "https://stella-ops.org/predicates/validation-run/v1",
  "predicate": {
    "runId": "550e8400-e29b-41d4-a716-446655440000",
    "config": {
      "matcherVersion": "binaryindex-semantic-diffing:1.2.0",
      "thresholds": {
        "minSimilarity": 0.85,
        "semanticWeight": 0.35,
        "instructionWeight": 0.25
      }
    },
    "corpus": {
      "snapshotId": "corpus:2026-01-19",
      "functionCount": 30000,
      "libraryCount": 5
    },
    "metrics": {
      "totalFunctions": 1500,
      "correctMatches": 1380,
      "falsePositives": 15,
      "falseNegatives": 45,
      "unmatched": 60,
      "matchRate": 0.92,
      "precision": 0.989,
      "recall": 0.968
    },
    "mismatchBuckets": [
      {"cause": "inlining", "count": 25},
      {"cause": "lto", "count": 12},
      {"cause": "optimization", "count": 8}
    ],
    "executedAt": "2026-01-19T10:30:00Z"
  }
}
```
---
## 7. CLI Commands
```bash
# Symbol source management
stella groundtruth sources list
stella groundtruth sources enable debuginfod-fedora
stella groundtruth sources sync --source debuginfod-fedora

# Symbol observation queries
stella groundtruth symbols lookup --debug-id abc123
stella groundtruth symbols search --package openssl --distro debian

# Security pair management
stella groundtruth pairs create \
  --cve CVE-2024-1234 \
  --vuln-pkg openssl=3.0.10-1 \
  --patch-pkg openssl=3.0.11-1
stella groundtruth pairs list --cve CVE-2024-1234

# Validation harness
stella groundtruth validate run \
  --pairs "openssl:CVE-2024-*" \
  --matcher semantic-diffing \
  --output validation-report.md
stella groundtruth validate metrics --run-id abc123
stella groundtruth validate export --run-id abc123 --format html
```
---
## 8. Doctor Checks
The ground-truth corpus integrates with Doctor for availability checks:
```csharp
// stellaops.doctor.binaryanalysis plugin
public sealed class BinaryAnalysisDoctorPlugin : IDoctorPlugin
{
    public string Name => "stellaops.doctor.binaryanalysis";

    public IEnumerable<IDoctorCheck> GetChecks()
    {
        yield return new DebuginfodAvailabilityCheck();
        yield return new DdebRepoEnabledCheck();
        yield return new BuildinfoCacheCheck();
        yield return new SymbolRecoveryFallbackCheck();
    }
}
```
| Check | Description | Remediation |
|-------|-------------|-------------|
| `debuginfod_urls_configured` | Verify that the `DEBUGINFOD_URLS` environment variable is set | Set the environment variable |
| `ddeb_repos_enabled` | Check that Ubuntu ddeb sources are enabled | Enable the ddebs repo |
| `buildinfo_cache_accessible` | Validate that buildinfos.debian.net is reachable | Check network/firewall |
| `symbol_recovery_fallback` | Ensure fallback path works | Configure local cache |
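As an illustration of the first check, `DEBUGINFOD_URLS` is the space-separated server list read by the elfutils debuginfod client. The Python sketch below validates that variable; it is not the `DebuginfodAvailabilityCheck` implementation. The result shape is hypothetical, and restricting entries to `http`/`https` URLs is an assumption of this sketch.

```python
import os
from urllib.parse import urlparse


def debuginfod_urls_configured(environ=os.environ):
    """Report whether DEBUGINFOD_URLS contains at least one usable http(s) URL."""
    candidates = environ.get("DEBUGINFOD_URLS", "").split()
    valid = []
    for url in candidates:
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(url)
    return {
        "check": "debuginfod_urls_configured",
        "ok": bool(valid),
        "urls": valid,
        "remediation": None if valid else "Set the DEBUGINFOD_URLS environment variable",
    }
```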
---
## 9. Air-Gap Support
For offline/air-gapped deployments:
### 9.1 Symbol Bundle Format
```
symbol-bundle-2026-01-19/
├── manifest.json           # Bundle metadata + checksums
├── sources/
│   ├── debuginfod/
│   │   └── *.debuginfo     # Pre-fetched debuginfo
│   ├── ddebs/
│   │   └── *.ddeb          # Pre-fetched ddebs
│   └── buildinfo/
│       └── *.buildinfo     # Pre-fetched buildinfo
├── observations/
│   └── *.ndjson            # Pre-computed observations
└── DSSE.envelope           # Signed attestation
```
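Before importing a bundle, the checksums recorded in `manifest.json` can be compared against the files on disk. The Python sketch below assumes a hypothetical manifest layout of `{"files": {"<relative path>": "sha256:<hex>"}}`, because the real manifest schema is not specified here. It also does not verify `DSSE.envelope`, so it complements signature verification rather than replacing it.

```python
import hashlib
import json
from pathlib import Path


def verify_bundle(bundle_dir):
    """Return a list of problems found; an empty list means all checksums match."""
    root = Path(bundle_dir)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    # Hypothetical layout: manifest["files"] maps relative path -> "sha256:<hex>".
    for rel_path, expected in sorted(manifest["files"].items()):
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        actual = "sha256:" + hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

Sorting the manifest entries keeps the problem list deterministic, in line with the replayability goal of the corpus.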
### 9.2 Offline Sync
```bash
# Export bundle for air-gap transfer
stella groundtruth bundle export \
  --packages openssl,zlib,glibc \
  --distros debian,fedora \
  --output symbol-bundle.tar.gz

# Import bundle in air-gapped environment
stella groundtruth bundle import \
  --input symbol-bundle.tar.gz \
  --verify-signature
```
---
## 10. Related Documentation
- [BinaryIndex Architecture](architecture.md)
- [Semantic Diffing](semantic-diffing.md)
- [Corpus Management](corpus-management.md)
- [Concelier AOC](../concelier/guides/aggregation-only-contract.md)
- [Excititor Architecture](../excititor/architecture.md)