Golden Set Schema Documentation
Version: 1.0.0
Module: BinaryIndex.GoldenSet
Last Updated: 2026-01-10
Overview
Golden Sets are ground-truth definitions of how a vulnerability manifests at the code level. They capture the specific functions, basic block edges, sinks, and constants that characterize a vulnerability, enabling:
- Deterministic vulnerability detection via fingerprint matching
- Backport verification through pre/post patch comparison
- Audit trail for security claims with content-addressed provenance
YAML Schema
Golden sets are stored as human-readable YAML files for git-friendliness and easy review.
Full Example
# GoldenSet.yaml schema v1.0.0
id: "CVE-2024-0727"
component: "openssl"
targets:
  - function: "PKCS12_parse"
    edges:
      - "bb3->bb7"
      - "bb7->bb9"
    sinks:
      - "memcpy"
      - "OPENSSL_malloc"
    constants:
      - "0x400"
      - "0xdeadbeef"
    taint_invariant: "len(field) <= 0x400 required before memcpy"
    source_file: "crypto/pkcs12/p12_kiss.c"
    source_line: 142
  - function: "PKCS12_unpack_p7data"
    edges:
      - "bb1->bb3"
    sinks:
      - "d2i_ASN1_OCTET_STRING"
witness:
  arguments:
    - "--file"
    - "<fuzz.bin>"
  invariant: "Malformed PKCS12 with oversized authsafe"
  # Placeholder digest; must be exactly 64 hex characters to satisfy the schema pattern
  poc_file_ref: "sha256:abc123def456abc123def456abc123def456abc123def456abc123def456abc1"
metadata:
  author_id: "security-team@example.com"
  created_at: "2025-01-10T12:00:00Z"
  source_ref: "https://nvd.nist.gov/vuln/detail/CVE-2024-0727"
  reviewed_by: "senior-analyst@example.com"
  reviewed_at: "2025-01-11T09:00:00Z"
  tags:
    - "memory-corruption"
    - "heap-overflow"
    - "pkcs12"
  schema_version: "1.0.0"
Minimal Example
id: "CVE-2024-0727"
component: "openssl"
targets:
  - function: "vulnerable_function"
metadata:
  author_id: "analyst@example.com"
  created_at: "2025-01-10T12:00:00Z"
  source_ref: "https://nvd.nist.gov/vuln/detail/CVE-2024-0727"
Field Reference
Root Fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Vulnerability identifier (CVE-YYYY-NNNN or GHSA-xxxx-xxxx-xxxx) |
| component | string | Yes | Affected component name (e.g., "openssl", "glibc") |
| targets | array | Yes | List of vulnerable code targets (min 1) |
| witness | object | No | Reproduction witness input |
| metadata | object | Yes | Authorship and review metadata |
Vulnerable Target Fields
| Field | Type | Required | Description |
|---|---|---|---|
| function | string | Yes | Function name (symbol or demangled) |
| edges | array | No | Basic block edges (format: "bbN->bbM") |
| sinks | array | No | Sink functions reached (e.g., "memcpy") |
| constants | array | No | Magic values identifying the vulnerability |
| taint_invariant | string | No | Human-readable exploitation invariant |
| source_file | string | No | Source file hint |
| source_line | integer | No | Source line hint |
Witness Input Fields
| Field | Type | Required | Description |
|---|---|---|---|
| arguments | array | No | Command-line arguments to trigger vulnerability |
| invariant | string | No | Human-readable precondition |
| poc_file_ref | string | No | Content-addressed PoC file reference |
Metadata Fields
| Field | Type | Required | Description |
|---|---|---|---|
| author_id | string | Yes | Author identifier (email or handle) |
| created_at | string | Yes | Creation timestamp (ISO 8601 UTC) |
| source_ref | string | Yes | Advisory URL or commit hash |
| reviewed_by | string | No | Reviewer identifier |
| reviewed_at | string | No | Review timestamp (ISO 8601 UTC) |
| tags | array | No | Classification tags |
| schema_version | string | No | Schema version (default: "1.0.0") |
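The required/optional columns in the tables above can be turned into a quick presence check. The following is a minimal sketch, not the module's validator: missing_required is a hypothetical helper, and it assumes the YAML has already been parsed into plain Python dicts and lists.

```python
# Required keys per object, copied from the Field Reference tables above.
REQUIRED_ROOT = ("id", "component", "targets", "metadata")
REQUIRED_TARGET = ("function",)
REQUIRED_METADATA = ("author_id", "created_at", "source_ref")


def missing_required(definition: dict) -> list[str]:
    """Return paths of required fields that are absent (hypothetical helper)."""
    missing = [key for key in REQUIRED_ROOT if key not in definition]

    targets = definition.get("targets") or []
    if "targets" in definition and len(targets) < 1:
        missing.append("targets[0]")  # at least one target is required
    for index, target in enumerate(targets):
        missing += [
            f"targets[{index}].{key}" for key in REQUIRED_TARGET if key not in target
        ]

    if "metadata" in definition:
        metadata = definition.get("metadata") or {}
        missing += [
            f"metadata.{key}" for key in REQUIRED_METADATA if key not in metadata
        ]
    return missing


# The Minimal Example from this document, expressed as a parsed dict.
minimal = {
    "id": "CVE-2024-0727",
    "component": "openssl",
    "targets": [{"function": "vulnerable_function"}],
    "metadata": {
        "author_id": "analyst@example.com",
        "created_at": "2025-01-10T12:00:00Z",
        "source_ref": "https://nvd.nist.gov/vuln/detail/CVE-2024-0727",
    },
}
print(missing_required(minimal))
```

The check only covers presence; types and string patterns are left to schema validation.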
JSON Schema
The following JSON Schema can be used for validation:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://stella-ops.org/schemas/golden-set/v1.0.0",
  "title": "Golden Set Definition",
  "type": "object",
  "required": ["id", "component", "targets", "metadata"],
  "properties": {
    "id": {
      "type": "string",
      "pattern": "^CVE-\\d{4}-\\d{4,}$|^GHSA-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}$",
      "description": "Vulnerability identifier"
    },
    "component": {
      "type": "string",
      "minLength": 1,
      "description": "Affected component name"
    },
    "targets": {
      "type": "array",
      "minItems": 1,
      "items": { "$ref": "#/$defs/vulnerableTarget" },
      "description": "Vulnerable code targets"
    },
    "witness": {
      "$ref": "#/$defs/witnessInput",
      "description": "Reproduction witness input"
    },
    "metadata": {
      "$ref": "#/$defs/metadata",
      "description": "Authorship and review metadata"
    }
  },
  "$defs": {
    "vulnerableTarget": {
      "type": "object",
      "required": ["function"],
      "properties": {
        "function": {
          "type": "string",
          "minLength": 1,
          "description": "Function name"
        },
        "edges": {
          "type": "array",
          "items": {
            "type": "string",
            "pattern": "^bb\\d+->bb\\d+$"
          },
          "description": "Basic block edges"
        },
        "sinks": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Sink functions"
        },
        "constants": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Magic values"
        },
        "taint_invariant": {
          "type": "string",
          "description": "Exploitation invariant"
        },
        "source_file": {
          "type": "string",
          "description": "Source file hint"
        },
        "source_line": {
          "type": "integer",
          "minimum": 1,
          "description": "Source line hint"
        }
      }
    },
    "witnessInput": {
      "type": "object",
      "properties": {
        "arguments": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Command-line arguments"
        },
        "invariant": {
          "type": "string",
          "description": "Human-readable precondition"
        },
        "poc_file_ref": {
          "type": "string",
          "pattern": "^sha256:[a-f0-9]{64}$",
          "description": "Content-addressed PoC reference"
        }
      }
    },
    "metadata": {
      "type": "object",
      "required": ["author_id", "created_at", "source_ref"],
      "properties": {
        "author_id": {
          "type": "string",
          "description": "Author identifier"
        },
        "created_at": {
          "type": "string",
          "format": "date-time",
          "description": "Creation timestamp (ISO 8601)"
        },
        "source_ref": {
          "type": "string",
          "format": "uri",
          "description": "Advisory URL or commit hash"
        },
        "reviewed_by": {
          "type": "string",
          "description": "Reviewer identifier"
        },
        "reviewed_at": {
          "type": "string",
          "format": "date-time",
          "description": "Review timestamp (ISO 8601)"
        },
        "tags": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Classification tags"
        },
        "schema_version": {
          "type": "string",
          "pattern": "^\\d+\\.\\d+\\.\\d+$",
          "description": "Schema version"
        }
      }
    }
  }
}
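The string patterns in the schema can be exercised without a JSON Schema library. The sketch below copies the four pattern values into Python's standard re module. pattern_errors is a hypothetical helper, it checks patterns only (not types or required fields), and it is not a substitute for a full draft 2020-12 validator.

```python
import re

# Patterns copied from the JSON Schema above. re.ASCII keeps \d limited to
# 0-9, which is how an ECMAScript-style regex engine treats it.
ID_PATTERN = re.compile(
    r"^CVE-\d{4}-\d{4,}$|^GHSA-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}$", re.ASCII
)
EDGE_PATTERN = re.compile(r"^bb\d+->bb\d+$", re.ASCII)
POC_REF_PATTERN = re.compile(r"^sha256:[a-f0-9]{64}$", re.ASCII)
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$", re.ASCII)


def pattern_errors(definition: dict) -> list[str]:
    """Return paths of values that violate a schema pattern (hypothetical helper).

    Optional fields that are absent are skipped; a missing id is reported.
    """
    errors = []
    if not ID_PATTERN.fullmatch(definition.get("id", "")):
        errors.append("id")

    for t_index, target in enumerate(definition.get("targets", [])):
        for e_index, edge in enumerate(target.get("edges", [])):
            if not EDGE_PATTERN.fullmatch(edge):
                errors.append(f"targets[{t_index}].edges[{e_index}]")

    poc_ref = definition.get("witness", {}).get("poc_file_ref")
    if poc_ref is not None and not POC_REF_PATTERN.fullmatch(poc_ref):
        errors.append("witness.poc_file_ref")

    version = definition.get("metadata", {}).get("schema_version")
    if version is not None and not VERSION_PATTERN.fullmatch(version):
        errors.append("metadata.schema_version")
    return errors


sample = {
    "id": "CVE-2024-0727",
    "targets": [{"function": "PKCS12_parse", "edges": ["bb3->bb7", "bb7->bb9"]}],
    "metadata": {"schema_version": "1.0.0"},
}
print(pattern_errors(sample))
```

fullmatch is used instead of search so that a trailing newline after an otherwise valid value is rejected.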
Edge Format
Basic block edges follow the format bbN->bbM where:
- bbN is the source basic block identifier
- bbM is the target basic block identifier
- The -> separator indicates control flow direction
Examples:
- bb3->bb7 - Control flows from block 3 to block 7
- bb7->bb9 - Control flows from block 7 to block 9
- bb1->bb3 - Control flows from block 1 to block 3
Edge identifiers match common disassembler output (IDA, Ghidra, Binary Ninja).
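Splitting an edge string into its source and target block numbers is a one-regex job. The sketch below is illustrative only: Edge and parse_edge are hypothetical names, and treating block identifiers as decimal integers is an assumption drawn from the schema pattern.

```python
import re
from typing import NamedTuple

# Same shape as the schema's edge pattern, with capture groups added.
EDGE_RE = re.compile(r"bb(\d+)->bb(\d+)", re.ASCII)


class Edge(NamedTuple):
    source: int  # block the control flow leaves
    target: int  # block the control flow enters


def parse_edge(text: str) -> Edge:
    """Parse 'bbN->bbM' into an Edge, raising ValueError on any other shape."""
    match = EDGE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bbN->bbM edge: {text!r}")
    return Edge(int(match.group(1)), int(match.group(2)))


edges = [parse_edge(text) for text in ("bb3->bb7", "bb7->bb9")]
print(edges)
```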
Sink Registry
Known sinks are validated against the sink registry. Categories include:
| Category | Examples | CWEs |
|---|---|---|
| memory | memcpy, strcpy, free, malloc | CWE-120, CWE-787, CWE-415, CWE-416 |
| command_injection | system, exec, popen | CWE-78 |
| code_injection | dlopen, LoadLibrary | CWE-427 |
| path_traversal | fopen, open | CWE-22 |
| network | connect, send, recv | CWE-918, CWE-319 |
| sql_injection | sqlite3_exec, mysql_query | CWE-89 |
| crypto | EVP_DecryptUpdate, PKCS12_parse | CWE-327, CWE-295 |
Unknown sinks generate validation warnings but do not block acceptance.
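The warn-but-accept behaviour can be sketched with a small in-memory lookup. The table above lists examples only, so the registry below is deliberately incomplete. check_sinks is a hypothetical helper and does not reflect the ISinkRegistry interface.

```python
# Example sinks per category, copied from the table above. The real
# registry presumably holds more entries than these examples.
SINK_CATEGORIES = {
    "memory": ("memcpy", "strcpy", "free", "malloc"),
    "command_injection": ("system", "exec", "popen"),
    "code_injection": ("dlopen", "LoadLibrary"),
    "path_traversal": ("fopen", "open"),
    "network": ("connect", "send", "recv"),
    "sql_injection": ("sqlite3_exec", "mysql_query"),
    "crypto": ("EVP_DecryptUpdate", "PKCS12_parse"),
}

# Reverse index: sink name -> category.
CATEGORY_BY_SINK = {
    sink: category
    for category, sinks in SINK_CATEGORIES.items()
    for sink in sinks
}


def check_sinks(sinks: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return (known sinks with their category, warnings for unknown sinks)."""
    known: dict[str, str] = {}
    warnings: list[str] = []
    for sink in sinks:
        category = CATEGORY_BY_SINK.get(sink)
        if category is None:
            # Unknown sinks are reported but never rejected.
            warnings.append(f"unknown sink: {sink}")
        else:
            known[sink] = category
    return known, warnings


known, warnings = check_sinks(["memcpy", "OPENSSL_malloc"])
print(known, warnings)
```

OPENSSL_malloc is flagged here only because it is absent from the example table, not because the real registry necessarily lacks it.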
Content Addressing
Golden sets are content-addressed using SHA256:
- Definition is serialized to canonical JSON (sorted keys, no whitespace)
- SHA256 hash is computed over UTF-8 bytes
- Digest is formatted as sha256:<64-hex-chars>
Example: sha256:a1b2c3d4e5f6...
Content addressing enables:
- Deduplication in storage
- Audit trail verification
- Immutable references in attestations
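The serialize, hash, and format steps described in this section can be sketched with the Python standard library. The document does not spell out the canonicalization rules beyond sorted keys and no whitespace, so number formatting and Unicode handling below are assumptions. Digests from this sketch may therefore differ from those the module produces. canonical_digest is a hypothetical name.

```python
import hashlib
import json


def canonical_digest(definition: dict) -> str:
    """Hash a definition as 'sha256:<64-hex-chars>' (hypothetical helper)."""
    # Sorted keys and compact separators give "sorted keys, no whitespace".
    # ensure_ascii=False is an assumption: non-ASCII text is hashed as raw
    # UTF-8 rather than as \uXXXX escapes.
    canonical = json.dumps(
        definition, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Key order in the input does not affect the digest.
first = {"id": "CVE-2024-0727", "component": "openssl"}
second = {"component": "openssl", "id": "CVE-2024-0727"}
print(canonical_digest(first) == canonical_digest(second))
```

Because keys are sorted before hashing, two definitions that differ only in field order deduplicate to the same digest.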
Status Workflow
Golden sets progress through these statuses:
Draft → InReview → Approved
           ↓
         Draft (if changes requested)

Approved → Deprecated (if CVE retracted)
         → Archived (for historical reference)
| Status | Description |
|---|---|
| Draft | Initial creation, editable |
| InReview | Submitted for review |
| Approved | Active in corpus, used for detection |
| Deprecated | CVE retracted or superseded |
| Archived | Historical reference only |
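The workflow can be modeled as a small transition table. The sketch below makes two assumptions that the diagram does not state outright: the "changes requested" path leads from InReview back to Draft, and Deprecated and Archived are terminal. Status, ALLOWED, and transition are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    IN_REVIEW = "InReview"
    APPROVED = "Approved"
    DEPRECATED = "Deprecated"
    ARCHIVED = "Archived"


# Allowed moves, read off the workflow diagram above.
ALLOWED = {
    Status.DRAFT: {Status.IN_REVIEW},
    # Assumption: changes requested sends a set from review back to draft.
    Status.IN_REVIEW: {Status.APPROVED, Status.DRAFT},
    Status.APPROVED: {Status.DEPRECATED, Status.ARCHIVED},
    # Assumption: no transitions leave these two statuses.
    Status.DEPRECATED: set(),
    Status.ARCHIVED: set(),
}


def transition(current: Status, new: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


status = transition(Status.DRAFT, Status.IN_REVIEW)
status = transition(status, Status.APPROVED)
print(status.value)
```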
Best Practices
Authoring Golden Sets
- Start minimal - Begin with function name only, add edges/sinks as verified
- Use authoritative sources - NVD, vendor advisories, upstream commits
- Document invariants - Explain exploitation conditions in human-readable text
- Tag appropriately - Use consistent classification tags
- Review carefully - Treat golden sets like unit tests
Edge Selection
- Focus on vulnerable paths - Only include edges on the exploitation path
- Avoid over-specification - Fewer edges = more robust matching
- Document rationale - Explain why specific edges are included
Sink Selection
- Use known sinks - Prefer sinks from the registry
- Include all relevant sinks - List all sinks on the vulnerable path
- Order consistently - Alphabetical ordering aids diffing
API Reference
See StellaOps.BinaryIndex.GoldenSet for:
- GoldenSetDefinition - Domain model
- IGoldenSetValidator - Validation service
- IGoldenSetStore - Storage interface
- GoldenSetYamlSerializer - YAML serialization
- ISinkRegistry - Sink lookup service