Complete batch 012 (golden set diff) and 013 (advisory chat), fix build errors

Sprints completed:
- SPRINT_20260110_012_* (golden set diff layer - 10 sprints)
- SPRINT_20260110_013_* (advisory chat - 4 sprints)

Build fixes applied:
- Fix namespace conflicts with Microsoft.Extensions.Options.Options.Create
- Fix VexDecisionReachabilityIntegrationTests API drift (major rewrite)
- Fix VexSchemaValidationTests FluentAssertions method name
- Fix FixChainGateIntegrationTests ambiguous type references
- Fix AdvisoryAI test files required properties and namespace aliases
- Add stub types for CveMappingController (ICveSymbolMappingService)
- Fix VerdictBuilderService static context issue

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
master
2026-01-11 10:09:07 +02:00
parent a3b2f30a11
commit 7f7eb8b228
232 changed files with 58979 additions and 91 deletions

@@ -0,0 +1,311 @@
# Golden Set Authoring Guide
This document describes the authoring workflow for creating and curating Golden Sets: ground-truth definitions of how a vulnerability manifests at the code level, used for binary vulnerability detection.
## Overview
Golden Sets are YAML-based definitions that describe:
- **Vulnerable functions** - Entry points where vulnerabilities manifest
- **Sink functions** - Dangerous API calls that enable exploitation
- **Edge patterns** - Control flow patterns indicating vulnerability presence
- **Constants** - Magic numbers, buffer sizes, or version markers
- **Witness inputs** - Example triggers for the vulnerability
## Architecture
```
┌──────────────────────────────────────────────────────────────────────────────┐
│                        Golden Set Authoring Pipeline                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌────────────────┐    ┌─────────────────┐    ┌──────────────────────┐       │
│  │ CVE/Advisory   │───>│ Extractors      │───>│ Draft Golden Set     │       │
│  │ Sources        │    │ (NVD/OSV/GHSA)  │    │                      │       │
│  └────────────────┘    └─────────────────┘    └──────────────────────┘       │
│                                 │                        │                   │
│                                 v                        v                   │
│                        ┌─────────────────┐    ┌──────────────────────┐       │
│                        │ Upstream Commit │    │ AI Enrichment        │       │
│                        │ Analyzer        │───>│ Service              │       │
│                        └─────────────────┘    └──────────────────────┘       │
│                                                          │                   │
│                                                          v                   │
│                        ┌─────────────────┐    ┌──────────────────────┐       │
│                        │ Validator       │<───│ Review Workflow      │       │
│                        └─────────────────┘    └──────────────────────┘       │
│                                 │                        │                   │
│                                 v                        v                   │
│                        ┌─────────────────────────────────────────────┐       │
│                        │             PostgreSQL Storage              │       │
│                        │       (content-addressed, versioned)        │       │
│                        └─────────────────────────────────────────────┘       │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
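The diagram labels storage as content-addressed. One common way to derive such an address is to hash the serialized definition. The sketch below assumes SHA-256 over UTF-8 text with normalized line endings. The `ContentAddressSketch` name and the `sha256:` prefix are illustrative and are not taken from the StellaOps code.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: the actual storage layer's hashing scheme is not shown in this guide.
public static class ContentAddressSketch
{
    public static string Compute(string definitionText)
    {
        // Normalize line endings so the same definition hashes identically on every OS.
        string canonical = definitionText.Replace("\r\n", "\n");
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

With a scheme like this, two drafts with identical content map to the same key, and any edit produces a new key that can be stored as a new version.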
## Components
### 1. Extractors
Extractors pull vulnerability data from advisory sources:
```csharp
// Extract from NVD/OSV/GHSA
var extractor = serviceProvider.GetRequiredService<IGoldenSetExtractor>();
var result = await extractor.ExtractAsync(
    "CVE-2024-1234",
    "openssl",
    new ExtractionOptions
    {
        UseAiEnrichment = true,
        IncludeUpstreamCommits = true,
        IncludeRelatedCves = true
    });
```
**Supported Sources:**
- **NVD** - National Vulnerability Database
- **OSV** - Open Source Vulnerabilities
- **GHSA** - GitHub Security Advisories
### 2. Upstream Commit Analyzer
Analyzes fix commits to extract:
- Modified functions (from hunk headers)
- Added constants (hex values, buffer sizes)
- Added conditions (bounds checks, NULL checks)
```csharp
var analyzer = serviceProvider.GetRequiredService<IUpstreamCommitAnalyzer>();
// Parse commit URL
var parsed = analyzer.ParseCommitUrl("https://github.com/curl/curl/commit/abc123");
// Analyze commits
var result = await analyzer.AnalyzeAsync([
    "https://github.com/curl/curl/commit/abc123",
    "https://github.com/curl/curl/commit/def456"
]);
// Result contains:
// - ModifiedFunctions: ["parse_header", "validate_length"]
// - AddedConstants: ["0x1000", "sizeof(buffer)"]
// - AddedConditions: ["bounds_check", "null_check"]
```
**Supported Platforms:**
- GitHub (`github.com/owner/repo/commit/hash`)
- GitLab (`gitlab.com/owner/repo/-/commit/hash`)
- Bitbucket (`bitbucket.org/owner/repo/commits/hash`)
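As an illustration of how the three URL shapes above could be recognized, here is a minimal sketch using one regular expression per platform. `ParsedCommitUrl` and `CommitUrlSketch` are hypothetical names. The real `ParseCommitUrl` return type is not shown in this guide, and nested GitLab groups are not handled here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical result type, introduced only for this example.
public sealed record ParsedCommitUrl(string Platform, string Owner, string Repo, string Hash);

public static class CommitUrlSketch
{
    // Abbreviated or full SHA-1 hashes (6 to 40 hex characters) are accepted.
    private const string Hash = @"(?<hash>[0-9a-fA-F]{6,40})/?$";

    private static readonly (string Platform, Regex Pattern)[] Patterns =
    {
        ("github", new Regex(@"^https?://github\.com/(?<owner>[^/]+)/(?<repo>[^/]+)/commit/" + Hash)),
        ("gitlab", new Regex(@"^https?://gitlab\.com/(?<owner>[^/]+)/(?<repo>[^/]+)/-/commit/" + Hash)),
        ("bitbucket", new Regex(@"^https?://bitbucket\.org/(?<owner>[^/]+)/(?<repo>[^/]+)/commits/" + Hash)),
    };

    // Returns null when the URL matches none of the supported shapes.
    public static ParsedCommitUrl? Parse(string url)
    {
        foreach (var (platform, pattern) in Patterns)
        {
            Match m = pattern.Match(url);
            if (m.Success)
            {
                return new ParsedCommitUrl(
                    platform,
                    m.Groups["owner"].Value,
                    m.Groups["repo"].Value,
                    m.Groups["hash"].Value);
            }
        }

        return null;
    }
}
```

Returning `null` for unsupported hosts lets a caller skip unknown references without failing the whole extraction.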
### 3. CWE-to-Sink Mapper
Maps CWE classifications to relevant sink functions:
```csharp
// Get sinks for buffer overflow CWEs
var sinks = CweToSinkMapper.GetSinksForCwes(["CWE-120", "CWE-122"]);
// Returns: ["memcpy", "strcpy", "sprintf", "gets", ...]
// Get all mapped CWEs
var cwes = CweToSinkMapper.GetMappedCwes();
```
**Supported CWE Categories:**
| Category | CWE IDs | Example Sinks |
|----------|---------|---------------|
| Buffer Overflow | CWE-120, CWE-121, CWE-122, CWE-787 | `memcpy`, `strcpy`, `sprintf` |
| Format String | CWE-134 | `printf`, `fprintf`, `sprintf` |
| Integer Overflow | CWE-190, CWE-191 | `malloc`, `calloc`, `realloc` |
| Use After Free | CWE-416 | `free`, `delete`, `delete[]` |
| Command Injection | CWE-78 | `system`, `popen`, `execve` |
| SQL Injection | CWE-89 | `PQexec`, `mysql_query`, `sqlite3_exec` |
| Path Traversal | CWE-22 | `fopen`, `open`, `access` |
| NULL Pointer | CWE-476 | (dereference detection) |
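The table above can be read as a lookup from CWE ID to sink names. The following sketch encodes only the example sinks listed in the table. `CweSinkTableSketch` is a hypothetical stand-in, and the real `CweToSinkMapper` evidently knows more sinks (such as `gets`) than are shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for illustration; not the actual CweToSinkMapper.
public static class CweSinkTableSketch
{
    private static readonly string[] BufferOverflow = { "memcpy", "strcpy", "sprintf" };
    private static readonly string[] IntegerOverflow = { "malloc", "calloc", "realloc" };

    private static readonly Dictionary<string, string[]> SinksByCwe =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["CWE-120"] = BufferOverflow,
            ["CWE-121"] = BufferOverflow,
            ["CWE-122"] = BufferOverflow,
            ["CWE-787"] = BufferOverflow,
            ["CWE-134"] = new[] { "printf", "fprintf", "sprintf" },
            ["CWE-190"] = IntegerOverflow,
            ["CWE-191"] = IntegerOverflow,
            ["CWE-416"] = new[] { "free", "delete", "delete[]" },
            ["CWE-78"] = new[] { "system", "popen", "execve" },
            ["CWE-89"] = new[] { "PQexec", "mysql_query", "sqlite3_exec" },
            ["CWE-22"] = new[] { "fopen", "open", "access" },
            ["CWE-476"] = Array.Empty<string>(), // dereference detection, no named sinks
        };

    // Union of sinks for the given CWEs, de-duplicated and sorted for stable output.
    public static IReadOnlyList<string> GetSinks(IEnumerable<string> cweIds) =>
        cweIds
            .SelectMany(id => SinksByCwe.TryGetValue(id, out var sinks) ? sinks : Array.Empty<string>())
            .Distinct()
            .OrderBy(s => s, StringComparer.Ordinal)
            .ToList();
}
```

De-duplicating matters because categories overlap: `sprintf` appears under both Buffer Overflow and Format String.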
### 4. AI Enrichment Service
Optional AI-assisted enrichment using advisory text and commit analysis:
```csharp
var enrichmentService = serviceProvider.GetRequiredService<IGoldenSetEnrichmentService>();
if (enrichmentService.IsAvailable)
{
    var result = await enrichmentService.EnrichAsync(
        draftGoldenSet,
        new GoldenSetEnrichmentContext
        {
            CommitAnalysis = commitResult,
            CweIds = ["CWE-787"],
            AdvisoryText = "Buffer overflow in parse_header..."
        });
    // Result.EnrichedDraft contains improved definition
    // Result.ActionsApplied describes what was added/refined
}
```
**Enrichment Actions:**
- `function_added` - New vulnerable function identified
- `sink_added` - New sink function from CWE mapping
- `constant_extracted` - Magic value from commits
- `edge_suggested` - Control flow pattern suggested
- `witness_hint_added` - Example trigger input
### 5. Review Workflow
State machine for golden set curation:
```
Draft ──> InReview ──> Approved ──> Deprecated ──> Archived
             │           │            │
             └───────────┴────────────┴── (can return to Draft)
```
```csharp
var reviewService = serviceProvider.GetRequiredService<IGoldenSetReviewService>();
// Submit for review
await reviewService.SubmitForReviewAsync("CVE-2024-1234", "author@example.com");
// Approve
await reviewService.ApproveAsync("CVE-2024-1234", "reviewer@example.com", "LGTM");
// Or request changes
await reviewService.RequestChangesAsync(
    "CVE-2024-1234",
    "reviewer@example.com",
    "Needs specific function name",
    [new ChangeRequest { Field = "targets[0].functionName", Suggestion = "parse_header" }]);
```
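The state diagram in this section can be expressed as a transition table. The sketch below is one reading of it, assuming that InReview, Approved, and Deprecated may each return to Draft and that Archived is terminal. The `ReviewState` enum and `ReviewTransitionsSketch` class are hypothetical names, not the review service's actual types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum mirroring the states named in the diagram.
public enum ReviewState { Draft, InReview, Approved, Deprecated, Archived }

public static class ReviewTransitionsSketch
{
    // Each state maps to the states it may move to next.
    private static readonly Dictionary<ReviewState, ReviewState[]> Allowed = new()
    {
        [ReviewState.Draft] = new[] { ReviewState.InReview },
        [ReviewState.InReview] = new[] { ReviewState.Approved, ReviewState.Draft },
        [ReviewState.Approved] = new[] { ReviewState.Deprecated, ReviewState.Draft },
        [ReviewState.Deprecated] = new[] { ReviewState.Archived, ReviewState.Draft },
        [ReviewState.Archived] = Array.Empty<ReviewState>(), // assumed terminal
    };

    public static bool CanTransition(ReviewState from, ReviewState to) =>
        Array.IndexOf(Allowed[from], to) >= 0;
}
```

Keeping the rules in a single table makes it easy to reject an invalid request, such as approving a set that was never submitted for review, before touching storage.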
## Golden Set Schema
```yaml
# CVE-2024-1234.golden.yaml
schema_version: "1.0"
id: CVE-2024-1234
component: openssl
targets:
  - function: parse_header
    sinks:
      - memcpy
      - strcpy
    constants:
      - "0x1000"
      - "sizeof(buffer)"
    edges:
      - bb1->bb2  # bounds check bypass
witness:
  stdin: "AAAA..."
  argv:
    - "--vulnerable-option"
  env:
    BUFFER_SIZE: "99999"
metadata:
  author_id: researcher@example.com
  source_ref: https://nvd.nist.gov/vuln/detail/CVE-2024-1234
  created_at: 2024-01-15T10:30:00Z
  tags:
    - memory-corruption
    - heap-overflow
```
## Configuration
```yaml
# appsettings.yaml
BinaryIndex:
  GoldenSet:
    SchemaVersion: "1.0"
    Validation:
      ValidateCveExists: true
      ValidateSinks: true
      StrictEdgeFormat: true
      OfflineMode: false
    Storage:
      PostgresSchema: golden_sets
      ConnectionStringName: BinaryIndex
    Caching:
      SinkRegistryCacheMinutes: 60
      DefinitionCacheMinutes: 15
    Authoring:
      EnableAiEnrichment: true
      EnableCommitAnalysis: true
      MaxCommitsToAnalyze: 5
      AutoAcceptConfidenceThreshold: 0.8
```
## Service Registration
```csharp
// Program.cs or Startup.cs
services.AddGoldenSetServices(configuration);
services.AddGoldenSetAuthoring();
services.AddGoldenSetPostgresStorage();
// Optional: Add HTTP client for commit analysis
services.AddHttpClient("upstream-commits", client =>
{
    client.Timeout = TimeSpan.FromSeconds(30);
    client.DefaultRequestHeaders.Add("User-Agent", "StellaOps-GoldenSet/1.0");
});
```
## CLI Usage
```bash
# Initialize a golden set from CVE
stella scanner golden init CVE-2024-1234 --component openssl
# With options
stella scanner golden init CVE-2024-1234 \
  --component openssl \
  --output ./golden-sets/CVE-2024-1234.yaml \
  --no-ai \
  --store
# Interactive mode for refinement
stella scanner golden init CVE-2024-1234 --interactive
# Export as JSON
stella scanner golden init CVE-2024-1234 --json
```
## Validation Rules
1. **CVE Format** - Must match `CVE-YYYY-NNNN` (four or more sequence digits, as in `CVE-2024-1234`) or `GHSA-xxxx-xxxx-xxxx`
2. **Component Required** - Non-empty component name
3. **Targets Required** - At least one vulnerable target
4. **Sinks Validation** - Sinks must be in the sink registry
5. **Edge Format** - Must match the `bbN->bbM` pattern (when `StrictEdgeFormat` is enabled)
6. **Constants Format** - Hex constants must be valid (`0x...`)
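
The six rules above could be checked roughly as follows. This is a sketch under stated assumptions rather than the actual validator: `GoldenSetDraft` and `GoldenSetRulesSketch` are hypothetical, the CVE pattern assumes a four-digit year followed by four or more digits, the GHSA pattern approximates the identifier alphabet with `[0-9a-z]`, and only constants beginning with `0x` are treated as hex (so `sizeof(buffer)` passes untouched).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input shape; the real golden set model is not shown in this guide.
public sealed record GoldenSetDraft(
    string Id,
    string Component,
    IReadOnlyList<string> TargetFunctions,
    IReadOnlyList<string> Sinks,
    IReadOnlyList<string> Edges,
    IReadOnlyList<string> Constants);

public static class GoldenSetRulesSketch
{
    private static readonly Regex CveId = new(@"^CVE-[0-9]{4}-[0-9]{4,}$");
    private static readonly Regex GhsaId = new(@"^GHSA(-[0-9a-z]{4}){3}$");
    private static readonly Regex Edge = new(@"^bb[0-9]+->bb[0-9]+$");
    private static readonly Regex Hex = new(@"^0x[0-9a-fA-F]+$");

    // Returns one message per violated rule; an empty list means the draft passed.
    public static List<string> Validate(GoldenSetDraft draft, ISet<string> sinkRegistry, bool strictEdges)
    {
        var errors = new List<string>();

        if (!CveId.IsMatch(draft.Id) && !GhsaId.IsMatch(draft.Id))
            errors.Add($"id '{draft.Id}' is not a CVE or GHSA identifier");
        if (string.IsNullOrWhiteSpace(draft.Component))
            errors.Add("component is required");
        if (draft.TargetFunctions.Count == 0)
            errors.Add("at least one target is required");

        foreach (var sink in draft.Sinks)
            if (!sinkRegistry.Contains(sink))
                errors.Add($"sink '{sink}' is not in the sink registry");

        if (strictEdges)
            foreach (var edge in draft.Edges)
                if (!Edge.IsMatch(edge))
                    errors.Add($"edge '{edge}' does not match bbN->bbM");

        foreach (var constant in draft.Constants)
            if (constant.StartsWith("0x", StringComparison.Ordinal) && !Hex.IsMatch(constant))
                errors.Add($"constant '{constant}' is not valid hex");

        return errors;
    }
}
```

Collecting every violation instead of stopping at the first lets an author fix a draft in one pass before submitting it for review.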
## Best Practices
1. **Start with Commit Analysis** - Fix commits are the most reliable source
2. **Use CWE Mapping** - Automatic sink suggestions based on vulnerability type
3. **Validate Locally** - Always validate before submitting for review
4. **Include Witness Data** - Example inputs help verify detection accuracy
5. **Tag Appropriately** - Use consistent tags for categorization
6. **Document Source** - Always include source_ref for traceability
## Metrics
Track authoring quality with:
- **Extraction Confidence** - Overall, per-source, per-field
- **Enrichment Actions** - What was added automatically
- **Review Iterations** - How many rounds before approval
- **Detection Rate** - How well the golden set detects known-vulnerable binaries
## See Also
- [Golden Set Schema Reference](../schemas/golden-set-schema.md)
- [Sink Registry](../modules/scanner/sink-registry.md)
- [Binary Analysis Architecture](../modules/scanner/architecture.md)
- [Vulnerability Detection](../modules/scanner/vulnerability-detection.md)