# Golden Set Schema Documentation

> **Version:** 1.0.0
> **Module:** BinaryIndex.GoldenSet
> **Last Updated:** 2026-01-10

## Overview

Golden Sets are ground-truth definitions of how a vulnerability manifests at the code level. Each one captures the specific functions, basic block edges, sinks, and constants that characterize a vulnerability, enabling:

- **Deterministic vulnerability detection** via fingerprint matching
- **Backport verification** through pre/post patch comparison
- **Audit trail** for security claims with content-addressed provenance

## YAML Schema

Golden sets are stored as human-readable YAML files so that they diff cleanly in Git and are easy to review.

### Full Example

```yaml
# GoldenSet.yaml schema v1.0.0
id: "CVE-2024-0727"
component: "openssl"

targets:
  - function: "PKCS12_parse"
    edges:
      - "bb3->bb7"
      - "bb7->bb9"
    sinks:
      - "memcpy"
      - "OPENSSL_malloc"
    constants:
      - "0x400"
      - "0xdeadbeef"
    taint_invariant: "len(field) <= 0x400 required before memcpy"
    source_file: "crypto/pkcs12/p12_kiss.c"
    source_line: 142

  - function: "PKCS12_unpack_p7data"
    edges:
      - "bb1->bb3"
    sinks:
      - "d2i_ASN1_OCTET_STRING"

witness:
  arguments:
    - "--file"
    - "<fuzz.bin>"
  invariant: "Malformed PKCS12 with oversized authsafe"
  poc_file_ref: "sha256:abc123def456abc123def456abc123def456abc123def456abc123def456abc1"

metadata:
  author_id: "security-team@example.com"
  created_at: "2025-01-10T12:00:00Z"
  source_ref: "https://nvd.nist.gov/vuln/detail/CVE-2024-0727"
  reviewed_by: "senior-analyst@example.com"
  reviewed_at: "2025-01-11T09:00:00Z"
  tags:
    - "memory-corruption"
    - "heap-overflow"
    - "pkcs12"
  schema_version: "1.0.0"
```

### Minimal Example

```yaml
id: "CVE-2024-0727"
component: "openssl"
targets:
  - function: "vulnerable_function"
metadata:
  author_id: "analyst@example.com"
  created_at: "2025-01-10T12:00:00Z"
  source_ref: "https://nvd.nist.gov/vuln/detail/CVE-2024-0727"
```

## Field Reference

### Root Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Vulnerability identifier (CVE-YYYY-NNNN or GHSA-xxxx-xxxx-xxxx) |
| `component` | string | Yes | Affected component name (e.g., "openssl", "glibc") |
| `targets` | array | Yes | List of vulnerable code targets (min 1) |
| `witness` | object | No | Reproduction witness input |
| `metadata` | object | Yes | Authorship and review metadata |

### Vulnerable Target Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `function` | string | Yes | Function name (symbol or demangled) |
| `edges` | array | No | Basic block edges (format: "bbN->bbM") |
| `sinks` | array | No | Sink functions reached (e.g., "memcpy") |
| `constants` | array | No | Magic values identifying the vulnerability |
| `taint_invariant` | string | No | Human-readable exploitation invariant |
| `source_file` | string | No | Source file hint |
| `source_line` | integer | No | Source line hint |

### Witness Input Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `arguments` | array | No | Command-line arguments to trigger vulnerability |
| `invariant` | string | No | Human-readable precondition |
| `poc_file_ref` | string | No | Content-addressed PoC file reference |

### Metadata Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `author_id` | string | Yes | Author identifier (email or handle) |
| `created_at` | string | Yes | Creation timestamp (ISO 8601 UTC) |
| `source_ref` | string | Yes | Advisory URL or commit hash |
| `reviewed_by` | string | No | Reviewer identifier |
| `reviewed_at` | string | No | Review timestamp (ISO 8601 UTC) |
| `tags` | array | No | Classification tags |
| `schema_version` | string | No | Schema version (default: "1.0.0") |

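
The field tables above map onto a small set of typed records. The following Python sketch mirrors the required and optional fields and their defaults; the class names are illustrative only and are not the library's actual `GoldenSetDefinition` model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VulnerableTarget:
    function: str  # the only required target field
    edges: List[str] = field(default_factory=list)
    sinks: List[str] = field(default_factory=list)
    constants: List[str] = field(default_factory=list)
    taint_invariant: Optional[str] = None
    source_file: Optional[str] = None
    source_line: Optional[int] = None


@dataclass
class WitnessInput:
    # Every witness field is optional.
    arguments: List[str] = field(default_factory=list)
    invariant: Optional[str] = None
    poc_file_ref: Optional[str] = None


@dataclass
class Metadata:
    author_id: str
    created_at: str
    source_ref: str
    reviewed_by: Optional[str] = None
    reviewed_at: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    schema_version: str = "1.0.0"  # default taken from the table above


@dataclass
class GoldenSet:
    id: str
    component: str
    targets: List[VulnerableTarget]
    metadata: Metadata
    witness: Optional[WitnessInput] = None


# Equivalent of the minimal YAML example.
minimal = GoldenSet(
    id="CVE-2024-0727",
    component="openssl",
    targets=[VulnerableTarget(function="vulnerable_function")],
    metadata=Metadata(
        author_id="analyst@example.com",
        created_at="2025-01-10T12:00:00Z",
        source_ref="https://nvd.nist.gov/vuln/detail/CVE-2024-0727",
    ),
)
```

Optional collections default to empty lists here so that callers can iterate without null checks; whether the real model does the same is an assumption.
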
## JSON Schema

The following JSON Schema can be used for validation:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://stella-ops.org/schemas/golden-set/v1.0.0",
  "title": "Golden Set Definition",
  "type": "object",
  "required": ["id", "component", "targets", "metadata"],
  "properties": {
    "id": {
      "type": "string",
      "pattern": "^CVE-\\d{4}-\\d{4,}$|^GHSA-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}$",
      "description": "Vulnerability identifier"
    },
    "component": {
      "type": "string",
      "minLength": 1,
      "description": "Affected component name"
    },
    "targets": {
      "type": "array",
      "minItems": 1,
      "items": { "$ref": "#/$defs/vulnerableTarget" },
      "description": "Vulnerable code targets"
    },
    "witness": {
      "$ref": "#/$defs/witnessInput",
      "description": "Reproduction witness input"
    },
    "metadata": {
      "$ref": "#/$defs/metadata",
      "description": "Authorship and review metadata"
    }
  },
  "$defs": {
    "vulnerableTarget": {
      "type": "object",
      "required": ["function"],
      "properties": {
        "function": {
          "type": "string",
          "minLength": 1,
          "description": "Function name"
        },
        "edges": {
          "type": "array",
          "items": {
            "type": "string",
            "pattern": "^bb\\d+->bb\\d+$"
          },
          "description": "Basic block edges"
        },
        "sinks": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Sink functions"
        },
        "constants": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Magic values"
        },
        "taint_invariant": {
          "type": "string",
          "description": "Exploitation invariant"
        },
        "source_file": {
          "type": "string",
          "description": "Source file hint"
        },
        "source_line": {
          "type": "integer",
          "minimum": 1,
          "description": "Source line hint"
        }
      }
    },
    "witnessInput": {
      "type": "object",
      "properties": {
        "arguments": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Command-line arguments"
        },
        "invariant": {
          "type": "string",
          "description": "Human-readable precondition"
        },
        "poc_file_ref": {
          "type": "string",
          "pattern": "^sha256:[a-f0-9]{64}$",
          "description": "Content-addressed PoC reference"
        }
      }
    },
    "metadata": {
      "type": "object",
      "required": ["author_id", "created_at", "source_ref"],
      "properties": {
        "author_id": {
          "type": "string",
          "description": "Author identifier"
        },
        "created_at": {
          "type": "string",
          "format": "date-time",
          "description": "Creation timestamp (ISO 8601)"
        },
        "source_ref": {
          "type": "string",
          "format": "uri",
          "description": "Advisory URL or commit hash"
        },
        "reviewed_by": {
          "type": "string",
          "description": "Reviewer identifier"
        },
        "reviewed_at": {
          "type": "string",
          "format": "date-time",
          "description": "Review timestamp (ISO 8601)"
        },
        "tags": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Classification tags"
        },
        "schema_version": {
          "type": "string",
          "pattern": "^\\d+\\.\\d+\\.\\d+$",
          "description": "Schema version"
        }
      }
    }
  }
}
```

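
A full validator would load the schema above into a JSON Schema library. As a dependency-free illustration, the Python sketch below hand-checks only a subset of its constraints: required keys, the `id` pattern, the edge pattern, and the `poc_file_ref` pattern. `check_golden_set` is a hypothetical helper, not part of the library, and it deliberately ignores `format` and type checks.

```python
import re

# Patterns copied from the JSON Schema above.
ID_RE = re.compile(r"^CVE-\d{4}-\d{4,}$|^GHSA-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}$")
EDGE_RE = re.compile(r"^bb\d+->bb\d+$")
POC_RE = re.compile(r"^sha256:[a-f0-9]{64}$")


def check_golden_set(doc: dict) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []

    for key in ("id", "component", "targets", "metadata"):
        if key not in doc:
            problems.append(f"missing required field: {key}")

    if "id" in doc and not ID_RE.match(str(doc["id"])):
        problems.append(f"id does not match CVE/GHSA pattern: {doc['id']}")

    targets = doc.get("targets") or []
    if "targets" in doc and len(targets) < 1:
        problems.append("targets must contain at least one entry")
    for i, target in enumerate(targets):
        if not target.get("function"):
            problems.append(f"targets[{i}] is missing function")
        for edge in target.get("edges", []):
            if not EDGE_RE.match(edge):
                problems.append(f"targets[{i}] has malformed edge: {edge}")

    poc = (doc.get("witness") or {}).get("poc_file_ref")
    if poc is not None and not POC_RE.match(poc):
        problems.append("witness.poc_file_ref is not sha256:<64 hex chars>")

    if "metadata" in doc:
        for key in ("author_id", "created_at", "source_ref"):
            if key not in doc["metadata"]:
                problems.append(f"metadata is missing required field: {key}")

    return problems
```

Returning a list of messages rather than raising on the first failure lets a reviewer see every problem in a file at once.
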
## Edge Format

Basic block edges follow the format `bbN->bbM` where:

- `bbN` is the source basic block identifier
- `bbM` is the target basic block identifier
- The `->` separator indicates control flow direction

Examples:

- `bb3->bb7` - Control flows from block 3 to block 7
- `bb7->bb9` - Control flows from block 7 to block 9
- `bb1->bb3` - Control flows from block 1 to block 3

Edge identifiers match common disassembler output (IDA, Ghidra, Binary Ninja).

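
As an illustration of the `bbN->bbM` format, the following Python sketch parses an edge string into its source and target block numbers. The `Edge` type and `parse_edge` function are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple

# Same shape as the schema's edge pattern, with capture groups for N and M.
EDGE_RE = re.compile(r"^bb(\d+)->bb(\d+)$")


class Edge(NamedTuple):
    source: int  # N in bbN->bbM
    target: int  # M in bbN->bbM


def parse_edge(text: str) -> Edge:
    """Parse a 'bbN->bbM' string, raising ValueError on any other shape."""
    match = EDGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a bbN->bbM edge: {text!r}")
    return Edge(int(match.group(1)), int(match.group(2)))
```
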
## Sink Registry

Each sink named in a golden set is checked against the sink registry, which groups known sinks into categories:

| Category | Examples | CWEs |
|----------|----------|------|
| `memory` | memcpy, strcpy, free, malloc | CWE-120, CWE-787, CWE-415, CWE-416 |
| `command_injection` | system, exec, popen | CWE-78 |
| `code_injection` | dlopen, LoadLibrary | CWE-427 |
| `path_traversal` | fopen, open | CWE-22 |
| `network` | connect, send, recv | CWE-918, CWE-319 |
| `sql_injection` | sqlite3_exec, mysql_query | CWE-89 |
| `crypto` | EVP_DecryptUpdate, PKCS12_parse | CWE-327, CWE-295 |

Unknown sinks generate validation warnings but do not block acceptance.

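
The warn-but-accept behavior can be sketched in Python as follows. The registry contents here are only the example sinks from the table above; the real registry is assumed to be larger, and `check_sinks` is a hypothetical helper rather than the `ISinkRegistry` API.

```python
# Example sinks copied from the category table; not the full registry.
SINK_CATEGORIES = {
    "memory": ["memcpy", "strcpy", "free", "malloc"],
    "command_injection": ["system", "exec", "popen"],
    "code_injection": ["dlopen", "LoadLibrary"],
    "path_traversal": ["fopen", "open"],
    "network": ["connect", "send", "recv"],
    "sql_injection": ["sqlite3_exec", "mysql_query"],
    "crypto": ["EVP_DecryptUpdate", "PKCS12_parse"],
}

# Reverse index: sink name -> category.
CATEGORY_BY_SINK = {
    sink: category
    for category, sinks in SINK_CATEGORIES.items()
    for sink in sinks
}


def check_sinks(sinks):
    """Split sinks into known ones (mapped to their category) and warnings."""
    known = {}
    warnings = []
    for sink in sinks:
        category = CATEGORY_BY_SINK.get(sink)
        if category is None:
            # Unknown sinks are reported but never rejected.
            warnings.append(f"unknown sink: {sink}")
        else:
            known[sink] = category
    return known, warnings
```

With this toy registry, `check_sinks(["memcpy", "OPENSSL_malloc"])` classifies `memcpy` as `memory` and reports `OPENSSL_malloc` as a warning, without raising an error.
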
## Content Addressing

Golden sets are content-addressed using SHA-256:

1. The definition is serialized to canonical JSON (sorted keys, no whitespace)
2. A SHA-256 hash is computed over the UTF-8 bytes of that JSON
3. The digest is formatted as `sha256:<64-hex-chars>`

Example: `sha256:a1b2c3d4e5f6...`

Content addressing enables:

- Deduplication in storage
- Audit trail verification
- Immutable references in attestations

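
The three digest steps above can be sketched in Python using only the standard library. This is a minimal illustration: the library's exact canonicalization rules (number formatting, Unicode handling, treatment of absent optional fields) are not specified here, so digests produced by this sketch should not be assumed to match its output. `golden_set_digest` is a hypothetical name.

```python
import hashlib
import json


def golden_set_digest(definition: dict) -> str:
    """Compute a sha256:<hex> digest over a sorted-key, whitespace-free JSON form."""
    # Step 1: canonical JSON (sorted keys, no whitespace between tokens).
    canonical = json.dumps(
        definition,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    # Step 2: SHA-256 over the UTF-8 bytes.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Step 3: prefix with the algorithm name.
    return "sha256:" + digest
```

Because keys are sorted before hashing, two definitions that differ only in key order produce the same digest, which is what makes deduplication possible.
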
## Status Workflow

Golden sets progress through these statuses:

```
Draft → InReview → Approved
            ↓
        Draft (if changes requested)

Approved → Deprecated (if CVE retracted)
         → Archived (for historical reference)
```

| Status | Description |
|--------|-------------|
| `Draft` | Initial creation, editable |
| `InReview` | Submitted for review |
| `Approved` | Active in corpus, used for detection |
| `Deprecated` | CVE retracted or superseded |
| `Archived` | Historical reference only |

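
The workflow can be modeled as a small transition table. The Python sketch below is one reading of the diagram: it assumes that the return to `Draft` happens from `InReview`, and that `Deprecated` and `Archived` are terminal. Neither assumption is confirmed by the library, and the names used are illustrative.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    IN_REVIEW = "InReview"
    APPROVED = "Approved"
    DEPRECATED = "Deprecated"
    ARCHIVED = "Archived"


# Allowed next statuses for each current status.
ALLOWED_TRANSITIONS = {
    Status.DRAFT: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.DRAFT},  # Draft if changes requested
    Status.APPROVED: {Status.DEPRECATED, Status.ARCHIVED},
    Status.DEPRECATED: set(),  # assumed terminal
    Status.ARCHIVED: set(),    # assumed terminal
}


def can_transition(current: Status, new: Status) -> bool:
    """Return True if the workflow permits moving from current to new."""
    return new in ALLOWED_TRANSITIONS[current]
```
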
## Best Practices

### Authoring Golden Sets

1. **Start minimal** - Begin with function name only, add edges/sinks as verified
2. **Use authoritative sources** - NVD, vendor advisories, upstream commits
3. **Document invariants** - Explain exploitation conditions in human-readable text
4. **Tag appropriately** - Use consistent classification tags
5. **Review carefully** - Treat golden sets like unit tests

### Edge Selection

1. **Focus on vulnerable paths** - Only include edges on the exploitation path
2. **Avoid over-specification** - Fewer edges = more robust matching
3. **Document rationale** - Explain why specific edges are included

### Sink Selection

1. **Use known sinks** - Prefer sinks from the registry
2. **Include all relevant sinks** - List all sinks on the vulnerable path
3. **Order consistently** - Alphabetical ordering aids diffing

## API Reference

See [StellaOps.BinaryIndex.GoldenSet](../../../src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/) for:

- `GoldenSetDefinition` - Domain model
- `IGoldenSetValidator` - Validation service
- `IGoldenSetStore` - Storage interface
- `GoldenSetYamlSerializer` - YAML serialization
- `ISinkRegistry` - Sink lookup service

## Related Documentation

- [BinaryIndex Architecture](architecture.md)
- [Delta Signature Matching](delta-signatures.md)
- [VEX Evidence Generation](../vex-lens/architecture.md)