Complete batch 012 (golden set diff) and 013 (advisory chat), fix build errors

Sprints completed:
- SPRINT_20260110_012_* (golden set diff layer - 10 sprints)
- SPRINT_20260110_013_* (advisory chat - 4 sprints)

Build fixes applied:
- Fix namespace conflicts with Microsoft.Extensions.Options.Options.Create
- Fix VexDecisionReachabilityIntegrationTests API drift (major rewrite)
- Fix VexSchemaValidationTests FluentAssertions method name
- Fix FixChainGateIntegrationTests ambiguous type references
- Fix AdvisoryAI test files required properties and namespace aliases
- Add stub types for CveMappingController (ICveSymbolMappingService)
- Fix VerdictBuilderService static context issue

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
311
docs/modules/scanner/golden-set-authoring.md
Normal file
@@ -0,0 +1,311 @@
# Golden Set Authoring Guide

This document describes the workflow for authoring and curating Golden Sets: ground-truth definitions of how a vulnerability manifests at the code level, used for binary vulnerability detection.

## Overview

Golden Sets are YAML-based definitions that describe:
- **Vulnerable functions** - Entry points where vulnerabilities manifest
- **Sink functions** - Dangerous API calls that enable exploitation
- **Edge patterns** - Control flow patterns indicating vulnerability presence
- **Constants** - Magic numbers, buffer sizes, or version markers
- **Witness inputs** - Example triggers for the vulnerability

## Architecture

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                        Golden Set Authoring Pipeline                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌────────────────┐    ┌─────────────────┐    ┌──────────────────────┐       │
│  │ CVE/Advisory   │───>│   Extractors    │───>│   Draft Golden Set   │       │
│  │    Sources     │    │ (NVD/OSV/GHSA)  │    │                      │       │
│  └────────────────┘    └─────────────────┘    └──────────────────────┘       │
│                                 │                        │                   │
│                                 v                        v                   │
│                        ┌─────────────────┐    ┌──────────────────────┐       │
│                        │ Upstream Commit │    │    AI Enrichment     │       │
│                        │    Analyzer     │───>│       Service        │       │
│                        └─────────────────┘    └──────────────────────┘       │
│                                                          │                   │
│                                                          v                   │
│                        ┌─────────────────┐    ┌──────────────────────┐       │
│                        │    Validator    │<───│   Review Workflow    │       │
│                        └─────────────────┘    └──────────────────────┘       │
│                                 │                        │                   │
│                                 v                        v                   │
│               ┌─────────────────────────────────────────────┐                │
│               │             PostgreSQL Storage              │                │
│               │       (content-addressed, versioned)        │                │
│               └─────────────────────────────────────────────┘                │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
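
The diagram labels storage as content-addressed. As a minimal sketch of what that could mean in practice, the snippet below derives an address from the definition text itself. The hash algorithm (SHA-256), the `sha256:` prefix, the line-ending normalization, and the helper name `ComputeContentAddress` are all assumptions made for illustration; this guide does not specify how the real storage layer computes its keys.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustration only: algorithm, prefix, and normalization are assumptions.
static string ComputeContentAddress(string definitionYaml)
{
    // Normalize line endings so the same definition hashes identically on every OS.
    string normalized = definitionYaml.Replace("\r\n", "\n");
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(normalized));
    return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

string unixText = "id: CVE-2024-1234\ncomponent: openssl\n";
string windowsText = "id: CVE-2024-1234\r\ncomponent: openssl\r\n";

string a = ComputeContentAddress(unixText);
string b = ComputeContentAddress(windowsText);

Console.WriteLine(a);
Console.WriteLine(a == b); // True: identical content yields an identical address
```

Because the key depends only on the content, storing an unchanged definition twice would map to the same address, while any edit produces a new one that can be recorded as a new version.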

## Components

### 1. Extractors

Extractors pull vulnerability data from advisory sources:

```csharp
// Extract from NVD/OSV/GHSA
var extractor = serviceProvider.GetRequiredService<IGoldenSetExtractor>();

var result = await extractor.ExtractAsync(
    "CVE-2024-1234",
    "openssl",
    new ExtractionOptions
    {
        UseAiEnrichment = true,
        IncludeUpstreamCommits = true,
        IncludeRelatedCves = true
    });
```

**Supported Sources:**
- **NVD** - National Vulnerability Database
- **OSV** - Open Source Vulnerabilities
- **GHSA** - GitHub Security Advisories

### 2. Upstream Commit Analyzer

Analyzes fix commits to extract:
- Modified functions (from hunk headers)
- Added constants (hex values, buffer sizes)
- Added conditions (bounds checks, NULL checks)

```csharp
var analyzer = serviceProvider.GetRequiredService<IUpstreamCommitAnalyzer>();

// Parse commit URL
var parsed = analyzer.ParseCommitUrl("https://github.com/curl/curl/commit/abc123");

// Analyze commits
var result = await analyzer.AnalyzeAsync([
    "https://github.com/curl/curl/commit/abc123",
    "https://github.com/curl/curl/commit/def456"
]);

// Result contains:
// - ModifiedFunctions: ["parse_header", "validate_length"]
// - AddedConstants: ["0x1000", "sizeof(buffer)"]
// - AddedConditions: ["bounds_check", "null_check"]
```
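
To make "modified functions (from hunk headers)" concrete, here is a rough sketch of how function names might be pulled out of a unified diff. It relies on the fact that Git appends the nearest enclosing declaration after the second `@@` of each hunk header. The helper name `ModifiedFunctionsFromDiff`, the regular expressions, and the sample diff are hypothetical; the real `IUpstreamCommitAnalyzer` may work quite differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not the IUpstreamCommitAnalyzer implementation.
static List<string> ModifiedFunctionsFromDiff(string unifiedDiff)
{
    // "@@ -a,b +c,d @@ <context>": capture the context text after the second "@@".
    var hunkHeader = new Regex(
        @"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@ (?<context>.+)$",
        RegexOptions.Multiline);
    // The first identifier directly followed by "(" is taken as the function name.
    var functionName = new Regex(@"(?<name>[A-Za-z_]\w*)\s*\(");

    var names = new List<string>();
    foreach (Match hunk in hunkHeader.Matches(unifiedDiff))
    {
        Match fn = functionName.Match(hunk.Groups["context"].Value);
        if (fn.Success && !names.Contains(fn.Groups["name"].Value))
            names.Add(fn.Groups["name"].Value);
    }
    return names;
}

string diff = string.Join("\n",
    "@@ -12,7 +12,9 @@ static int parse_header(const char *buf, size_t len)",
    "-    memcpy(dst, buf, len);",
    "+    if (len > 0x1000) return -1;",
    "+    memcpy(dst, buf, len);",
    "@@ -88,6 +90,8 @@ int validate_length(size_t n)",
    "+    if (n == 0) return 0;");

List<string> functions = ModifiedFunctionsFromDiff(diff);
Console.WriteLine(string.Join(", ", functions));
// prints: parse_header, validate_length
```

The hunk-header context is only a heuristic on Git's side, so names recovered this way are best treated as candidates to confirm during review rather than as ground truth.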

**Supported Platforms:**
- GitHub (`github.com/owner/repo/commit/hash`)
- GitLab (`gitlab.com/owner/repo/-/commit/hash`)
- Bitbucket (`bitbucket.org/owner/repo/commits/hash`)
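
The three URL shapes above can be recognized with one pattern per platform. The sketch below is an assumption-laden illustration: the helper name `TryParseCommitUrl`, the tuple it returns, and the platform labels are invented here and are not the actual return type of `ParseCommitUrl`. It also assumes a single `owner/repo` pair, so nested GitLab groups would not match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; returns null for unsupported hosts or malformed paths.
static (string Platform, string Owner, string Repo, string Hash)? TryParseCommitUrl(string url)
{
    var patterns = new (string Platform, string Pattern)[]
    {
        ("github",    @"^https://github\.com/(?<owner>[^/]+)/(?<repo>[^/]+)/commit/(?<hash>[0-9a-fA-F]+)$"),
        ("gitlab",    @"^https://gitlab\.com/(?<owner>[^/]+)/(?<repo>[^/]+)/-/commit/(?<hash>[0-9a-fA-F]+)$"),
        ("bitbucket", @"^https://bitbucket\.org/(?<owner>[^/]+)/(?<repo>[^/]+)/commits/(?<hash>[0-9a-fA-F]+)$"),
    };

    foreach (var (platform, pattern) in patterns)
    {
        Match m = Regex.Match(url, pattern);
        if (m.Success)
        {
            return (platform, m.Groups["owner"].Value, m.Groups["repo"].Value, m.Groups["hash"].Value);
        }
    }
    return null;
}

var gitlab = TryParseCommitUrl("https://gitlab.com/owner/repo/-/commit/abc123");
var unknown = TryParseCommitUrl("https://example.com/owner/repo/commit/abc123");

Console.WriteLine(gitlab?.Platform ?? "unsupported");
Console.WriteLine(unknown?.Platform ?? "unsupported");
```

Returning `null` instead of throwing keeps unsupported URLs cheap to skip when an advisory lists references from many hosts.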

### 3. CWE-to-Sink Mapper

Maps CWE classifications to relevant sink functions:

```csharp
// Get sinks for buffer overflow CWEs
var sinks = CweToSinkMapper.GetSinksForCwes(["CWE-120", "CWE-122"]);
// Returns: ["memcpy", "strcpy", "sprintf", "gets", ...]

// Get all mapped CWEs
var cwes = CweToSinkMapper.GetMappedCwes();
```

**Supported CWE Categories:**
| Category | CWE IDs | Example Sinks |
|----------|---------|---------------|
| Buffer Overflow | CWE-120, CWE-121, CWE-122, CWE-787 | `memcpy`, `strcpy`, `sprintf` |
| Format String | CWE-134 | `printf`, `fprintf`, `sprintf` |
| Integer Overflow | CWE-190, CWE-191 | `malloc`, `calloc`, `realloc` |
| Use After Free | CWE-416 | `free`, `delete`, `delete[]` |
| Command Injection | CWE-78 | `system`, `popen`, `execve` |
| SQL Injection | CWE-89 | `PQexec`, `mysql_query`, `sqlite3_exec` |
| Path Traversal | CWE-22 | `fopen`, `open`, `access` |
| NULL Pointer Dereference | CWE-476 | (dereference detection) |
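
Conceptually, the mapping is a lookup table keyed by CWE ID whose results are merged and de-duplicated. The sketch below shows that idea with a small subset of the table; the dictionary contents, the `SinksFor` helper, and the decision to silently skip unmapped CWEs are assumptions for illustration and do not describe how `CweToSinkMapper` is actually implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical subset of the table above, keyed case-insensitively by CWE ID.
var sinksByCwe = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
{
    ["CWE-120"] = new[] { "memcpy", "strcpy", "sprintf" },
    ["CWE-134"] = new[] { "printf", "fprintf", "sprintf" },
    ["CWE-78"]  = new[] { "system", "popen", "execve" },
};

// Merge the sinks of every known CWE, drop duplicates, and sort for stable output.
List<string> SinksFor(IEnumerable<string> cweIds) =>
    cweIds
        .Where(id => sinksByCwe.ContainsKey(id))
        .SelectMany(id => sinksByCwe[id])
        .Distinct()
        .OrderBy(s => s, StringComparer.Ordinal)
        .ToList();

List<string> merged = SinksFor(new[] { "CWE-120", "CWE-134", "CWE-9999" });
Console.WriteLine(string.Join(", ", merged));
// prints: fprintf, memcpy, printf, sprintf, strcpy
```

Note that `sprintf` appears under both categories but only once in the result, and the unmapped `CWE-9999` contributes nothing.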

### 4. AI Enrichment Service

Optional AI-assisted enrichment using advisory text and commit analysis:

```csharp
var enrichmentService = serviceProvider.GetRequiredService<IGoldenSetEnrichmentService>();

if (enrichmentService.IsAvailable)
{
    var result = await enrichmentService.EnrichAsync(
        draftGoldenSet,
        new GoldenSetEnrichmentContext
        {
            CommitAnalysis = commitResult,
            CweIds = ["CWE-787"],
            AdvisoryText = "Buffer overflow in parse_header..."
        });

    // result.EnrichedDraft contains the improved definition
    // result.ActionsApplied describes what was added or refined
}
```

**Enrichment Actions:**
- `function_added` - New vulnerable function identified
- `sink_added` - New sink function from CWE mapping
- `constant_extracted` - Magic value from commits
- `edge_suggested` - Control flow pattern suggested
- `witness_hint_added` - Example trigger input

### 5. Review Workflow

State machine for golden set curation:

```
Draft ──> InReview ──> Approved ──> Deprecated ──> Archived
  │           │            │
  └───────────┴────────────┴── (can return to Draft)
```

```csharp
var reviewService = serviceProvider.GetRequiredService<IGoldenSetReviewService>();

// Submit for review
await reviewService.SubmitForReviewAsync("CVE-2024-1234", "author@example.com");

// Approve
await reviewService.ApproveAsync("CVE-2024-1234", "reviewer@example.com", "LGTM");

// Or request changes
await reviewService.RequestChangesAsync(
    "CVE-2024-1234",
    "reviewer@example.com",
    "Needs specific function name",
    [new ChangeRequest { Field = "targets[0].functionName", Suggestion = "parse_header" }]);
```
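
One way to picture the state machine is as a table of allowed transitions. The sketch below encodes one reading of the diagram: the forward chain from Draft to Archived, plus returns to Draft from InReview and Approved. That reading, the `ReviewState` enum, and the `CanTransition` helper are assumptions for illustration; the rules enforced by `IGoldenSetReviewService` may differ.

```csharp
using System;
using System.Collections.Generic;

// Allowed transitions under the assumed reading of the diagram.
var allowed = new Dictionary<ReviewState, ReviewState[]>
{
    [ReviewState.Draft]      = new[] { ReviewState.InReview },
    [ReviewState.InReview]   = new[] { ReviewState.Approved, ReviewState.Draft },
    [ReviewState.Approved]   = new[] { ReviewState.Deprecated, ReviewState.Draft },
    [ReviewState.Deprecated] = new[] { ReviewState.Archived },
    [ReviewState.Archived]   = Array.Empty<ReviewState>(),
};

// True when the table lists "next" as a legal successor of "current".
bool CanTransition(ReviewState current, ReviewState next) =>
    Array.IndexOf(allowed[current], next) >= 0;

Console.WriteLine(CanTransition(ReviewState.Draft, ReviewState.InReview));  // True
Console.WriteLine(CanTransition(ReviewState.InReview, ReviewState.Draft));  // True
Console.WriteLine(CanTransition(ReviewState.Draft, ReviewState.Approved));  // False

// Hypothetical enum mirroring the state names in the diagram.
enum ReviewState { Draft, InReview, Approved, Deprecated, Archived }
```

Keeping the transitions in data rather than in scattered `if` statements makes it easy to audit which moves are legal and to reject the rest with a clear error.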

## Golden Set Schema

```yaml
# CVE-2024-1234.golden.yaml
schema_version: "1.0"
id: CVE-2024-1234
component: openssl

targets:
  - function: parse_header
    sinks:
      - memcpy
      - strcpy
    constants:
      - "0x1000"
      - "sizeof(buffer)"
    edges:
      - bb1->bb2  # bounds check bypass

witness:
  stdin: "AAAA..."
  argv:
    - "--vulnerable-option"
  env:
    BUFFER_SIZE: "99999"

metadata:
  author_id: researcher@example.com
  source_ref: https://nvd.nist.gov/vuln/detail/CVE-2024-1234
  created_at: 2024-01-15T10:30:00Z
  tags:
    - memory-corruption
    - heap-overflow
```

## Configuration

```yaml
# appsettings.yaml
BinaryIndex:
  GoldenSet:
    SchemaVersion: "1.0"
    Validation:
      ValidateCveExists: true
      ValidateSinks: true
      StrictEdgeFormat: true
      OfflineMode: false
    Storage:
      PostgresSchema: golden_sets
      ConnectionStringName: BinaryIndex
    Caching:
      SinkRegistryCacheMinutes: 60
      DefinitionCacheMinutes: 15
    Authoring:
      EnableAiEnrichment: true
      EnableCommitAnalysis: true
      MaxCommitsToAnalyze: 5
      AutoAcceptConfidenceThreshold: 0.8
```

## Service Registration

```csharp
// Program.cs or Startup.cs
services.AddGoldenSetServices(configuration);
services.AddGoldenSetAuthoring();
services.AddGoldenSetPostgresStorage();

// Optional: Add HTTP client for commit analysis
services.AddHttpClient("upstream-commits", client =>
{
    client.Timeout = TimeSpan.FromSeconds(30);
    client.DefaultRequestHeaders.Add("User-Agent", "StellaOps-GoldenSet/1.0");
});
```

## CLI Usage

```bash
# Initialize a golden set from CVE
stella scanner golden init CVE-2024-1234 --component openssl

# With options
stella scanner golden init CVE-2024-1234 \
  --component openssl \
  --output ./golden-sets/CVE-2024-1234.yaml \
  --no-ai \
  --store

# Interactive mode for refinement
stella scanner golden init CVE-2024-1234 --interactive

# Export as JSON
stella scanner golden init CVE-2024-1234 --json
```

## Validation Rules

1. **CVE Format** - Must match `CVE-YYYY-NNNN` (four or more digits in the sequence number) or `GHSA-xxxx-xxxx-xxxx`
2. **Component Required** - Non-empty component name
3. **Targets Required** - At least one vulnerable target
4. **Sinks Validation** - Sinks must be in the sink registry
5. **Edge Format** - Must match `bbN->bbM` pattern (if strict mode is enabled)
6. **Constants Format** - Hex constants must be valid (`0x...`)
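
Several of these rules are simple pattern checks. The following sketch covers rules 1, 2, 3, 5, and 6; rule 4 is left out because it needs the sink registry. The helper name `ValidateDefinition`, its parameters, the error messages, and the exact regular expressions (in particular the simplified GHSA alphabet) are assumptions for illustration, not the shipped validator.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical validator sketch; returns a list of human-readable problems.
static List<string> ValidateDefinition(
    string id,
    string component,
    IReadOnlyCollection<string> targets,
    IEnumerable<string> edges,
    IEnumerable<string> constants)
{
    var errors = new List<string>();

    // Rule 1: CVE IDs carry four or more sequence digits; the GHSA alphabet is approximated.
    if (!Regex.IsMatch(id, @"^(CVE-\d{4}-\d{4,}|GHSA(-[0-9a-z]{4}){3})$"))
        errors.Add($"invalid id: {id}");

    // Rules 2 and 3: required fields.
    if (string.IsNullOrWhiteSpace(component))
        errors.Add("component is required");
    if (targets.Count == 0)
        errors.Add("at least one target is required");

    // Rule 5: strict edge format bbN->bbM.
    foreach (string edge in edges)
    {
        if (!Regex.IsMatch(edge, @"^bb\d+->bb\d+$"))
            errors.Add($"invalid edge: {edge}");
    }

    // Rule 6: only constants that claim to be hex (0x prefix) are checked.
    foreach (string constant in constants)
    {
        if (constant.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
            && !Regex.IsMatch(constant, @"^0[xX][0-9A-Fa-f]+$"))
            errors.Add($"invalid hex constant: {constant}");
    }

    return errors;
}

List<string> problems = ValidateDefinition(
    id: "CVE-2024-1234",
    component: "",
    targets: new[] { "parse_header" },
    edges: new[] { "bb1->bb2", "bb1=>bb2" },
    constants: new[] { "0x1000", "0xZZ", "sizeof(buffer)" });

foreach (string problem in problems)
    Console.WriteLine(problem);
```

Collecting every problem instead of stopping at the first lets an author fix a draft in one pass before submitting it for review.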

## Best Practices

1. **Start with Commit Analysis** - Fix commits are the most reliable source
2. **Use CWE Mapping** - Automatic sink suggestions based on vulnerability type
3. **Validate Locally** - Always validate before submitting for review
4. **Include Witness Data** - Example inputs help verify detection accuracy
5. **Tag Appropriately** - Use consistent tags for categorization
6. **Document Source** - Always include `source_ref` for traceability

## Metrics

Track authoring quality with:
- **Extraction Confidence** - Overall, per-source, per-field
- **Enrichment Actions** - What was added automatically
- **Review Iterations** - How many rounds before approval
- **Detection Rate** - How well the golden set detects known-vulnerable binaries

## See Also

- [Golden Set Schema Reference](../schemas/golden-set-schema.md)
- [Sink Registry](../modules/scanner/sink-registry.md)
- [Binary Analysis Architecture](../modules/scanner/architecture.md)
- [Vulnerability Detection](../modules/scanner/vulnerability-detection.md)