Complete batch 012 (golden set diff) and 013 (advisory chat), fix build errors
Sprints completed:
- SPRINT_20260110_012_* (golden set diff layer - 10 sprints)
- SPRINT_20260110_013_* (advisory chat - 4 sprints)

Build fixes applied:
- Fix namespace conflicts with Microsoft.Extensions.Options.Options.Create
- Fix VexDecisionReachabilityIntegrationTests API drift (major rewrite)
- Fix VexSchemaValidationTests FluentAssertions method name
- Fix FixChainGateIntegrationTests ambiguous type references
- Fix AdvisoryAI test files required properties and namespace aliases
- Add stub types for CveMappingController (ICveSymbolMappingService)
- Fix VerdictBuilderService static context issue

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
368
docs/modules/binary-index/golden-set-schema.md
Normal file
@@ -0,0 +1,368 @@

# Golden Set Schema Documentation

> **Version:** 1.0.0
> **Module:** BinaryIndex.GoldenSet
> **Last Updated:** 2026-01-10

## Overview

A golden set is a ground-truth definition of how a vulnerability manifests at the code level. It captures the specific functions, basic block edges, sinks, and constants that characterize the vulnerability, enabling:

- **Deterministic vulnerability detection** via fingerprint matching
- **Backport verification** through pre/post patch comparison
- **Audit trail** for security claims with content-addressed provenance

## YAML Schema

Golden sets are stored as human-readable YAML files so that they diff cleanly in git and are easy to review.

### Full Example

```yaml
# GoldenSet.yaml schema v1.0.0
id: "CVE-2024-0727"
component: "openssl"

targets:
  - function: "PKCS12_parse"
    edges:
      - "bb3->bb7"
      - "bb7->bb9"
    sinks:
      - "memcpy"
      - "OPENSSL_malloc"
    constants:
      - "0x400"
      - "0xdeadbeef"
    taint_invariant: "len(field) <= 0x400 required before memcpy"
    source_file: "crypto/pkcs12/p12_kiss.c"
    source_line: 142

  - function: "PKCS12_unpack_p7data"
    edges:
      - "bb1->bb3"
    sinks:
      - "d2i_ASN1_OCTET_STRING"

witness:
  arguments:
    - "--file"
    - "<fuzz.bin>"
  invariant: "Malformed PKCS12 with oversized authsafe"
  # Placeholder digest; must be exactly 64 hex characters to satisfy the schema
  poc_file_ref: "sha256:abc123def456abc123def456abc123def456abc123def456abc123def456abc1"

metadata:
  author_id: "security-team@example.com"
  created_at: "2025-01-10T12:00:00Z"
  source_ref: "https://nvd.nist.gov/vuln/detail/CVE-2024-0727"
  reviewed_by: "senior-analyst@example.com"
  reviewed_at: "2025-01-11T09:00:00Z"
  tags:
    - "memory-corruption"
    - "heap-overflow"
    - "pkcs12"
  schema_version: "1.0.0"
```

### Minimal Example

```yaml
id: "CVE-2024-0727"
component: "openssl"
targets:
  - function: "vulnerable_function"
metadata:
  author_id: "analyst@example.com"
  created_at: "2025-01-10T12:00:00Z"
  source_ref: "https://nvd.nist.gov/vuln/detail/CVE-2024-0727"
```

## Field Reference

### Root Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Vulnerability identifier (CVE-YYYY-NNNN or GHSA-xxxx-xxxx-xxxx) |
| `component` | string | Yes | Affected component name (e.g., "openssl", "glibc") |
| `targets` | array | Yes | List of vulnerable code targets (min 1) |
| `witness` | object | No | Reproduction witness input |
| `metadata` | object | Yes | Authorship and review metadata |

### Vulnerable Target Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `function` | string | Yes | Function name (symbol or demangled) |
| `edges` | array | No | Basic block edges (format: "bbN->bbM") |
| `sinks` | array | No | Sink functions reached (e.g., "memcpy") |
| `constants` | array | No | Magic values identifying the vulnerability |
| `taint_invariant` | string | No | Human-readable exploitation invariant |
| `source_file` | string | No | Source file hint |
| `source_line` | integer | No | Source line hint |

### Witness Input Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `arguments` | array | No | Command-line arguments to trigger vulnerability |
| `invariant` | string | No | Human-readable precondition |
| `poc_file_ref` | string | No | Content-addressed PoC file reference |

### Metadata Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `author_id` | string | Yes | Author identifier (email or handle) |
| `created_at` | string | Yes | Creation timestamp (ISO 8601 UTC) |
| `source_ref` | string | Yes | Advisory URL or commit hash |
| `reviewed_by` | string | No | Reviewer identifier |
| `reviewed_at` | string | No | Review timestamp (ISO 8601 UTC) |
| `tags` | array | No | Classification tags |
| `schema_version` | string | No | Schema version (default: "1.0.0") |
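
The field tables above can be mirrored as a small typed model. The following is a minimal Python sketch for illustration only: the class names are hypothetical and are not the `GoldenSetDefinition` domain model from the library, and the defaults simply follow the "Required" columns.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VulnerableTarget:
    function: str                                   # required
    edges: list[str] = field(default_factory=list)  # "bbN->bbM" strings
    sinks: list[str] = field(default_factory=list)
    constants: list[str] = field(default_factory=list)
    taint_invariant: Optional[str] = None
    source_file: Optional[str] = None
    source_line: Optional[int] = None


@dataclass
class WitnessInput:
    arguments: list[str] = field(default_factory=list)
    invariant: Optional[str] = None
    poc_file_ref: Optional[str] = None


@dataclass
class Metadata:
    author_id: str    # required
    created_at: str   # required, ISO 8601 UTC
    source_ref: str   # required
    reviewed_by: Optional[str] = None
    reviewed_at: Optional[str] = None
    tags: list[str] = field(default_factory=list)
    schema_version: str = "1.0.0"  # default taken from the Metadata Fields table


@dataclass
class GoldenSet:
    id: str
    component: str
    targets: list[VulnerableTarget]
    metadata: Metadata
    witness: Optional[WitnessInput] = None


# The minimal example expressed with this model.
minimal = GoldenSet(
    id="CVE-2024-0727",
    component="openssl",
    targets=[VulnerableTarget(function="vulnerable_function")],
    metadata=Metadata(
        author_id="analyst@example.com",
        created_at="2025-01-10T12:00:00Z",
        source_ref="https://nvd.nist.gov/vuln/detail/CVE-2024-0727",
    ),
)
```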

## JSON Schema

The following JSON Schema can be used for validation:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://stella-ops.org/schemas/golden-set/v1.0.0",
  "title": "Golden Set Definition",
  "type": "object",
  "required": ["id", "component", "targets", "metadata"],
  "properties": {
    "id": {
      "type": "string",
      "pattern": "^CVE-\\d{4}-\\d{4,}$|^GHSA-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}$",
      "description": "Vulnerability identifier"
    },
    "component": {
      "type": "string",
      "minLength": 1,
      "description": "Affected component name"
    },
    "targets": {
      "type": "array",
      "minItems": 1,
      "items": { "$ref": "#/$defs/vulnerableTarget" },
      "description": "Vulnerable code targets"
    },
    "witness": {
      "$ref": "#/$defs/witnessInput",
      "description": "Reproduction witness input"
    },
    "metadata": {
      "$ref": "#/$defs/metadata",
      "description": "Authorship and review metadata"
    }
  },
  "$defs": {
    "vulnerableTarget": {
      "type": "object",
      "required": ["function"],
      "properties": {
        "function": {
          "type": "string",
          "minLength": 1,
          "description": "Function name"
        },
        "edges": {
          "type": "array",
          "items": {
            "type": "string",
            "pattern": "^bb\\d+->bb\\d+$"
          },
          "description": "Basic block edges"
        },
        "sinks": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Sink functions"
        },
        "constants": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Magic values"
        },
        "taint_invariant": {
          "type": "string",
          "description": "Exploitation invariant"
        },
        "source_file": {
          "type": "string",
          "description": "Source file hint"
        },
        "source_line": {
          "type": "integer",
          "minimum": 1,
          "description": "Source line hint"
        }
      }
    },
    "witnessInput": {
      "type": "object",
      "properties": {
        "arguments": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Command-line arguments"
        },
        "invariant": {
          "type": "string",
          "description": "Human-readable precondition"
        },
        "poc_file_ref": {
          "type": "string",
          "pattern": "^sha256:[a-f0-9]{64}$",
          "description": "Content-addressed PoC reference"
        }
      }
    },
    "metadata": {
      "type": "object",
      "required": ["author_id", "created_at", "source_ref"],
      "properties": {
        "author_id": {
          "type": "string",
          "description": "Author identifier"
        },
        "created_at": {
          "type": "string",
          "format": "date-time",
          "description": "Creation timestamp (ISO 8601)"
        },
        "source_ref": {
          "type": "string",
          "format": "uri",
          "description": "Advisory URL or commit hash"
        },
        "reviewed_by": {
          "type": "string",
          "description": "Reviewer identifier"
        },
        "reviewed_at": {
          "type": "string",
          "format": "date-time",
          "description": "Review timestamp (ISO 8601)"
        },
        "tags": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Classification tags"
        },
        "schema_version": {
          "type": "string",
          "pattern": "^\\d+\\.\\d+\\.\\d+$",
          "description": "Schema version"
        }
      }
    }
  }
}
```
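
For a quick sanity check without a JSON Schema library, the required fields and string patterns from the schema can be checked by hand. The sketch below is a hand-rolled approximation, not a replacement for a real JSON Schema validator and not the project's `IGoldenSetValidator`; the function name `validate_golden_set` is hypothetical, and it covers only required fields and the `id`, edge, and `poc_file_ref` patterns.

```python
import re

# Patterns copied from the JSON Schema; re.ASCII restricts \d to 0-9.
ID_PATTERN = re.compile(
    r"CVE-\d{4}-\d{4,}|GHSA-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}", re.ASCII
)
EDGE_PATTERN = re.compile(r"bb\d+->bb\d+", re.ASCII)
POC_PATTERN = re.compile(r"sha256:[a-f0-9]{64}")


def validate_golden_set(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = []
    for name in ("id", "component", "targets", "metadata"):
        if name not in doc:
            errors.append(f"missing required field: {name}")

    if "id" in doc and not ID_PATTERN.fullmatch(str(doc["id"])):
        errors.append("id does not match the CVE or GHSA pattern")

    targets = doc.get("targets")
    if isinstance(targets, list):
        if not targets:
            errors.append("targets must contain at least one entry")
        for i, target in enumerate(targets):
            if not isinstance(target, dict) or not target.get("function"):
                errors.append(f"targets[{i}] is missing function")
                continue
            for edge in target.get("edges") or []:
                if not EDGE_PATTERN.fullmatch(str(edge)):
                    errors.append(f"targets[{i}] has malformed edge: {edge}")
    elif targets is not None:
        errors.append("targets must be an array")

    metadata = doc.get("metadata")
    if isinstance(metadata, dict):
        for name in ("author_id", "created_at", "source_ref"):
            if name not in metadata:
                errors.append(f"metadata is missing {name}")

    witness = doc.get("witness")
    poc = witness.get("poc_file_ref") if isinstance(witness, dict) else None
    if poc is not None and not POC_PATTERN.fullmatch(str(poc)):
        errors.append("poc_file_ref must be sha256:<64 hex chars>")
    return errors


minimal_doc = {
    "id": "CVE-2024-0727",
    "component": "openssl",
    "targets": [{"function": "vulnerable_function"}],
    "metadata": {
        "author_id": "analyst@example.com",
        "created_at": "2025-01-10T12:00:00Z",
        "source_ref": "https://nvd.nist.gov/vuln/detail/CVE-2024-0727",
    },
}
print(validate_golden_set(minimal_doc))  # []
```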

## Edge Format

Basic block edges follow the format `bbN->bbM` where:

- `bbN` is the source basic block identifier
- `bbM` is the target basic block identifier
- The `->` separator indicates control flow direction

Examples:

- `bb3->bb7` - Control flows from block 3 to block 7
- `bb7->bb9` - Control flows from block 7 to block 9
- `bb1->bb3` - Control flows from block 1 to block 3

Edge identifiers match common disassembler output (IDA, Ghidra, Binary Ninja).
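
As an illustration of the format, an edge string can be split into its source and target block numbers with a regular expression equivalent to the `edges` pattern in the JSON Schema. This is a minimal sketch; the helper name `parse_edge` is hypothetical.

```python
import re

# Equivalent to the schema pattern ^bb\d+->bb\d+$, with capture groups added.
EDGE_PATTERN = re.compile(r"bb(\d+)->bb(\d+)", re.ASCII)


def parse_edge(edge: str) -> tuple[int, int]:
    """Split an edge such as "bb3->bb7" into (source, target) block numbers."""
    match = EDGE_PATTERN.fullmatch(edge)
    if match is None:
        raise ValueError(f"invalid edge format: {edge!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_edge("bb3->bb7"))  # (3, 7)
```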

## Sink Registry

Known sinks are validated against the sink registry. Categories include:

| Category | Examples | CWEs |
|----------|----------|------|
| `memory` | memcpy, strcpy, free, malloc | CWE-120, CWE-787, CWE-415, CWE-416 |
| `command_injection` | system, exec, popen | CWE-78 |
| `code_injection` | dlopen, LoadLibrary | CWE-427 |
| `path_traversal` | fopen, open | CWE-22 |
| `network` | connect, send, recv | CWE-918, CWE-319 |
| `sql_injection` | sqlite3_exec, mysql_query | CWE-89 |
| `crypto` | EVP_DecryptUpdate, PKCS12_parse | CWE-327, CWE-295 |

Unknown sinks generate validation warnings but do not block acceptance.
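
The warn-but-accept behavior can be sketched as a lookup against a category table. The dictionary below contains only the example sinks listed in the table, so it is an assumption-laden stand-in for the real registry (`ISinkRegistry`), and both function names are hypothetical.

```python
from typing import Optional

# Example sinks from the category table only; the real registry is likely larger.
SINK_CATEGORIES = {
    "memory": {"memcpy", "strcpy", "free", "malloc"},
    "command_injection": {"system", "exec", "popen"},
    "code_injection": {"dlopen", "LoadLibrary"},
    "path_traversal": {"fopen", "open"},
    "network": {"connect", "send", "recv"},
    "sql_injection": {"sqlite3_exec", "mysql_query"},
    "crypto": {"EVP_DecryptUpdate", "PKCS12_parse"},
}


def category_of(sink: str) -> Optional[str]:
    """Return the registry category for a sink, or None if it is unknown."""
    for category, sinks in SINK_CATEGORIES.items():
        if sink in sinks:
            return category
    return None


def unknown_sink_warnings(sinks: list[str]) -> list[str]:
    """Produce one warning per unknown sink; warnings never reject the golden set."""
    return [
        f"unknown sink: {sink}" for sink in sinks if category_of(sink) is None
    ]


print(unknown_sink_warnings(["memcpy", "my_custom_copy"]))
# ['unknown sink: my_custom_copy']
```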

## Content Addressing

Golden sets are content-addressed using SHA256:

1. Definition is serialized to canonical JSON (sorted keys, no whitespace)
2. SHA256 hash is computed over UTF-8 bytes
3. Digest is formatted as `sha256:<64-hex-chars>`

Example: `sha256:a1b2c3d4e5f6...`

Content addressing enables:

- Deduplication in storage
- Audit trail verification
- Immutable references in attestations
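
The three digest steps can be sketched with the Python standard library. This is an approximation under stated assumptions: "canonical JSON" is taken to mean sorted keys, compact separators, and unescaped UTF-8 output, and the library's actual canonicalization rules (for example number or Unicode handling) may differ, so digests produced here are not guaranteed to match it. The function name `golden_set_digest` is hypothetical.

```python
import hashlib
import json


def golden_set_digest(definition: dict) -> str:
    # Step 1: canonical JSON (assumed: sorted keys, no insignificant whitespace).
    canonical = json.dumps(
        definition, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    # Step 2: SHA-256 over the UTF-8 bytes.
    hex_digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Step 3: prefix with the algorithm name.
    return f"sha256:{hex_digest}"


a = {"id": "CVE-2024-0727", "component": "openssl"}
b = {"component": "openssl", "id": "CVE-2024-0727"}
# Key order in the input does not affect the digest.
print(golden_set_digest(a) == golden_set_digest(b))  # True
```

Because keys are sorted before hashing, two definitions that differ only in field order map to the same digest, which is what makes deduplication in storage possible.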

## Status Workflow

Golden sets progress through these statuses:

```
Draft → InReview → Approved
           ↓
         Draft (if changes requested)

Approved → Deprecated (if CVE retracted)
         → Archived (for historical reference)
```

| Status | Description |
|--------|-------------|
| `Draft` | Initial creation, editable |
| `InReview` | Submitted for review |
| `Approved` | Active in corpus, used for detection |
| `Deprecated` | CVE retracted or superseded |
| `Archived` | Historical reference only |
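
The workflow can be modeled as a table of allowed transitions. The sketch below is one reading of the diagram: it assumes that both `Deprecated` and `Archived` are reached from `Approved` and that neither has outgoing transitions, which the documentation does not state explicitly. The names `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical.

```python
# Allowed status transitions, as read from the workflow diagram.
# Assumption: Deprecated and Archived are terminal states.
ALLOWED_TRANSITIONS = {
    "Draft": {"InReview"},
    "InReview": {"Approved", "Draft"},  # back to Draft if changes are requested
    "Approved": {"Deprecated", "Archived"},
    "Deprecated": set(),
    "Archived": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("Draft", "InReview"))  # True
print(can_transition("Draft", "Approved"))  # False
```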

## Best Practices

### Authoring Golden Sets

1. **Start minimal** - Begin with function name only, add edges/sinks as verified
2. **Use authoritative sources** - NVD, vendor advisories, upstream commits
3. **Document invariants** - Explain exploitation conditions in human-readable text
4. **Tag appropriately** - Use consistent classification tags
5. **Review carefully** - Treat golden sets like unit tests

### Edge Selection

1. **Focus on vulnerable paths** - Only include edges on the exploitation path
2. **Avoid over-specification** - Fewer edges = more robust matching
3. **Document rationale** - Explain why specific edges are included

### Sink Selection

1. **Use known sinks** - Prefer sinks from the registry
2. **Include all relevant sinks** - List all sinks on the vulnerable path
3. **Order consistently** - Alphabetical ordering aids diffing
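
Consistent ordering is easy to automate before a golden set is committed. The helper below is a small sketch that deduplicates and sorts a sink list; it assumes case-insensitive alphabetical order, and the name `normalize_sinks` is hypothetical.

```python
def normalize_sinks(sinks: list[str]) -> list[str]:
    """Deduplicate sinks and sort them case-insensitively for stable diffs."""
    # The original string is a tie-breaker so the order is fully deterministic.
    return sorted(set(sinks), key=lambda s: (s.casefold(), s))


print(normalize_sinks(["OPENSSL_malloc", "memcpy", "memcpy"]))
# ['memcpy', 'OPENSSL_malloc']
```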

## API Reference

See [StellaOps.BinaryIndex.GoldenSet](../../../src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/) for:

- `GoldenSetDefinition` - Domain model
- `IGoldenSetValidator` - Validation service
- `IGoldenSetStore` - Storage interface
- `GoldenSetYamlSerializer` - YAML serialization
- `ISinkRegistry` - Sink lookup service

## Related Documentation

- [BinaryIndex Architecture](architecture.md)
- [Delta Signature Matching](delta-signatures.md)
- [VEX Evidence Generation](../vex-lens/architecture.md)