Complete batch 012 (golden set diff) and 013 (advisory chat), fix build errors

Sprints completed:
- SPRINT_20260110_012_* (golden set diff layer - 10 sprints)
- SPRINT_20260110_013_* (advisory chat - 4 sprints)

Build fixes applied:
- Fix namespace conflicts with Microsoft.Extensions.Options.Options.Create
- Fix VexDecisionReachabilityIntegrationTests API drift (major rewrite)
- Fix VexSchemaValidationTests FluentAssertions method name
- Fix FixChainGateIntegrationTests ambiguous type references
- Fix AdvisoryAI test files required properties and namespace aliases
- Add stub types for CveMappingController (ICveSymbolMappingService)
- Fix VerdictBuilderService static context issue

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 10:09:07 +02:00
parent a3b2f30a11
commit 7f7eb8b228
232 changed files with 58979 additions and 91 deletions
--- a/docs/modules/scanner/golden-set-authoring.md
+++ b/docs/modules/scanner/golden-set-authoring.md
@@ -0,0 +1,311 @@
+# Golden Set Authoring Guide
+
+This document describes the workflow for authoring and curating Golden Sets: ground-truth definitions of how a vulnerability manifests at the code level, used for binary vulnerability detection.
+
+## Overview
+
+Golden Sets are YAML-based definitions that describe:
+- **Vulnerable functions** - Entry points where vulnerabilities manifest
+- **Sink functions** - Dangerous API calls that enable exploitation
+- **Edge patterns** - Control flow patterns indicating vulnerability presence
+- **Constants** - Magic numbers, buffer sizes, or version markers
+- **Witness inputs** - Example triggers for the vulnerability
+
+## Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐
+│                         Golden Set Authoring Pipeline                        │
+├──────────────────────────────────────────────────────────────────────────────┤
+│                                                                              │
+│  ┌────────────────┐    ┌─────────────────┐    ┌──────────────────────┐      │
+│  │  CVE/Advisory  │───>│   Extractors    │───>│    Draft Golden Set  │      │
+│  │    Sources     │    │  (NVD/OSV/GHSA) │    │                      │      │
+│  └────────────────┘    └─────────────────┘    └──────────────────────┘      │
+│                               │                          │                   │
+│                               v                          v                   │
+│                        ┌─────────────────┐    ┌──────────────────────┐      │
+│                        │ Upstream Commit │    │   AI Enrichment      │      │
+│                        │    Analyzer     │───>│   Service            │      │
+│                        └─────────────────┘    └──────────────────────┘      │
+│                                                          │                   │
+│                                                          v                   │
+│                        ┌─────────────────┐    ┌──────────────────────┐      │
+│                        │   Validator     │<───│   Review Workflow    │      │
+│                        └─────────────────┘    └──────────────────────┘      │
+│                               │                          │                   │
+│                               v                          v                   │
+│                        ┌─────────────────────────────────────────────┐      │
+│                        │           PostgreSQL Storage                │      │
+│                        │    (content-addressed, versioned)           │      │
+│                        └─────────────────────────────────────────────┘      │
+│                                                                              │
+└──────────────────────────────────────────────────────────────────────────────┘
+```
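+
+The storage layer is described as content-addressed. The sketch below shows one way such an address could be derived. It assumes the address is a SHA-256 digest over the UTF-8 bytes of the YAML after normalizing line endings; this is an illustration, not the canonical form the storage layer actually uses, and `GoldenSetDigest` is a hypothetical name.
+
+```csharp
+using System;
+using System.Security.Cryptography;
+using System.Text;
+
+// Hypothetical helper that derives a content address for a golden set definition.
+// Assumption: SHA-256 over UTF-8 bytes after normalizing line endings and
+// trimming trailing whitespace at the end of the document.
+public static class GoldenSetDigest
+{
+    public static string Compute(string yaml)
+    {
+        // CRLF and lone CR become LF so identical content hashes identically on every platform.
+        var normalized = yaml.Replace("\r\n", "\n").Replace('\r', '\n').TrimEnd() + "\n";
+        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(normalized));
+        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
+    }
+}
+```
+
+Normalizing line endings first means a file re-saved with CRLF endings keeps its address, while any change to the content yields a new one.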
+
+## Components
+
+### 1. Extractors
+
+Extractors pull vulnerability data from advisory sources:
+
+```csharp
+using Microsoft.Extensions.DependencyInjection; // provides GetRequiredService<T>()
+
+// Extract from NVD/OSV/GHSA
+var extractor = serviceProvider.GetRequiredService<IGoldenSetExtractor>();
+
+var result = await extractor.ExtractAsync(
+    "CVE-2024-1234",
+    "openssl",
+    new ExtractionOptions
+    {
+        UseAiEnrichment = true,
+        IncludeUpstreamCommits = true,
+        IncludeRelatedCves = true
+    });
+```
+
+**Supported Sources:**
+- **NVD** - National Vulnerability Database
+- **OSV** - Open Source Vulnerabilities
+- **GHSA** - GitHub Security Advisories
+
+### 2. Upstream Commit Analyzer
+
+Analyzes fix commits to extract:
+- Modified functions (from hunk headers)
+- Added constants (hex values, buffer sizes)
+- Added conditions (bounds checks, NULL checks)
+
+```csharp
+var analyzer = serviceProvider.GetRequiredService<IUpstreamCommitAnalyzer>();
+
+// Parse commit URL
+var parsed = analyzer.ParseCommitUrl("https://github.com/curl/curl/commit/abc123");
+
+// Analyze commits
+var result = await analyzer.AnalyzeAsync([
+    "https://github.com/curl/curl/commit/abc123",
+    "https://github.com/curl/curl/commit/def456"
+]);
+
+// Result contains:
+// - ModifiedFunctions: ["parse_header", "validate_length"]
+// - AddedConstants: ["0x1000", "sizeof(buffer)"]
+// - AddedConditions: ["bounds_check", "null_check"]
+```
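+
+Modified function names can come from hunk headers because `git diff` appends the nearest preceding function-like line (usually the enclosing function's signature) after the closing `@@`. A minimal sketch of that extraction, which also collects hex constants from added lines, assuming plain unified diff text as input. `FixCommitDiffScanner` is hypothetical and is not the `IUpstreamCommitAnalyzer` implementation.
+
+```csharp
+using System.Collections.Generic;
+using System.Linq;
+using System.Text.RegularExpressions;
+
+// Hypothetical sketch: pulls modified function names and added hex constants
+// out of a unified diff. Not the IUpstreamCommitAnalyzer implementation.
+public static class FixCommitDiffScanner
+{
+    // "@@ -10,7 +10,9 @@ static int parse_header(const char *buf, size_t len)"
+    // Group 1 captures the identifier immediately before the first '('.
+    private static readonly Regex HunkHeader =
+        new(@"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@.*?\b([A-Za-z_]\w*)\s*\(", RegexOptions.Compiled);
+
+    // Greedy hex digits, so "0x1000UL" yields "0x1000".
+    private static readonly Regex HexConstant =
+        new(@"\b0[xX][0-9A-Fa-f]+", RegexOptions.Compiled);
+
+    public static (IReadOnlyList<string> Functions, IReadOnlyList<string> Constants) Scan(string diff)
+    {
+        var functions = new List<string>();
+        var constants = new List<string>();
+
+        foreach (var line in diff.Split('\n'))
+        {
+            var header = HunkHeader.Match(line);
+            if (header.Success)
+            {
+                functions.Add(header.Groups[1].Value);
+            }
+            else if (line.StartsWith('+') && !line.StartsWith("+++"))
+            {
+                // Only lines added by the fix are interesting for new constants.
+                constants.AddRange(HexConstant.Matches(line).Select(m => m.Value));
+            }
+        }
+
+        return (functions.Distinct().ToList(), constants.Distinct().ToList());
+    }
+}
+```
+
+A hunk header names the function-like line *preceding* the hunk, so a change at the very top of a function can be attributed to the previous one; treat the output as a suggestion for review rather than a final answer.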
+
+**Supported Platforms:**
+- GitHub (`github.com/owner/repo/commit/hash`)
+- GitLab (`gitlab.com/owner/repo/-/commit/hash`)
+- Bitbucket (`bitbucket.org/owner/repo/commits/hash`)
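+
+The supported URL shapes differ only in the path segment before the hash. A hedged sketch of matching them with one regular expression per host; `CommitUrl` and `CommitUrlParser` are hypothetical, and their shape may not match what `ParseCommitUrl` returns.
+
+```csharp
+#nullable enable
+using System.Text.RegularExpressions;
+
+// Hypothetical result type; ParseCommitUrl's real return type may differ.
+public sealed record CommitUrl(string Host, string Owner, string Repo, string Hash);
+
+public static class CommitUrlParser
+{
+    // One pattern per documented platform. Hashes may be abbreviated (4+ hex chars)
+    // or full SHA-1 / SHA-256 object IDs.
+    private static readonly (string Host, Regex Pattern)[] Patterns =
+    {
+        ("github.com", new Regex(@"^https?://github\.com/([^/]+)/([^/]+)/commit/([0-9a-fA-F]{4,64})/?$")),
+        ("gitlab.com", new Regex(@"^https?://gitlab\.com/([^/]+)/([^/]+)/-/commit/([0-9a-fA-F]{4,64})/?$")),
+        ("bitbucket.org", new Regex(@"^https?://bitbucket\.org/([^/]+)/([^/]+)/commits/([0-9a-fA-F]{4,64})/?$")),
+    };
+
+    public static CommitUrl? Parse(string url)
+    {
+        foreach (var (host, pattern) in Patterns)
+        {
+            var m = pattern.Match(url.Trim());
+            if (m.Success)
+            {
+                return new CommitUrl(host, m.Groups[1].Value, m.Groups[2].Value,
+                                     m.Groups[3].Value.ToLowerInvariant());
+            }
+        }
+        return null; // Unsupported host or malformed URL.
+    }
+}
+```
+
+GitLab projects nested in subgroups (`group/subgroup/repo`) would need a looser owner pattern than this sketch uses.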
+
+### 3. CWE-to-Sink Mapper
+
+Maps CWE classifications to relevant sink functions:
+
+```csharp
+// Get sinks for buffer overflow CWEs
+var sinks = CweToSinkMapper.GetSinksForCwes(["CWE-120", "CWE-122"]);
+// Returns: ["memcpy", "strcpy", "sprintf", "gets", ...]
+
+// Get all mapped CWEs
+var cwes = CweToSinkMapper.GetMappedCwes();
+```
+
+**Supported CWE Categories:**
+
+| Category | CWE IDs | Example Sinks |
+|----------|---------|---------------|
+| Buffer Overflow | CWE-120, CWE-121, CWE-122, CWE-787 | `memcpy`, `strcpy`, `sprintf` |
+| Format String | CWE-134 | `printf`, `fprintf`, `sprintf` |
+| Integer Overflow / Underflow | CWE-190, CWE-191 | `malloc`, `calloc`, `realloc` |
+| Use After Free | CWE-416 | `free`, `delete`, `delete[]` |
+| Command Injection | CWE-78 | `system`, `popen`, `execve` |
+| SQL Injection | CWE-89 | `PQexec`, `mysql_query`, `sqlite3_exec` |
+| Path Traversal | CWE-22 | `fopen`, `open`, `access` |
+| NULL Pointer | CWE-476 | (dereference detection) |
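+
+A lookup like this is essentially a static dictionary from CWE ID to sink names. A minimal sketch seeded only with the table rows above; `CweSinkTable` is a hypothetical stand-in for `CweToSinkMapper`, whose real registry is likely larger. Results are deduplicated and sorted so the output is deterministic.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Hypothetical stand-in for CweToSinkMapper, seeded only with the table rows.
+public static class CweSinkTable
+{
+    private static readonly Dictionary<string, string[]> Map = new(StringComparer.OrdinalIgnoreCase)
+    {
+        ["CWE-120"] = new[] { "memcpy", "strcpy", "sprintf" },
+        ["CWE-121"] = new[] { "memcpy", "strcpy", "sprintf" },
+        ["CWE-122"] = new[] { "memcpy", "strcpy", "sprintf" },
+        ["CWE-787"] = new[] { "memcpy", "strcpy", "sprintf" },
+        ["CWE-134"] = new[] { "printf", "fprintf", "sprintf" },
+        ["CWE-190"] = new[] { "malloc", "calloc", "realloc" },
+        ["CWE-191"] = new[] { "malloc", "calloc", "realloc" },
+        ["CWE-416"] = new[] { "free", "delete", "delete[]" },
+        ["CWE-78"] = new[] { "system", "popen", "execve" },
+        ["CWE-89"] = new[] { "PQexec", "mysql_query", "sqlite3_exec" },
+        ["CWE-22"] = new[] { "fopen", "open", "access" },
+        // CWE-476 is detected from dereference patterns, not sink calls.
+        ["CWE-476"] = Array.Empty<string>(),
+    };
+
+    // Unknown CWE IDs are ignored; lookups are case-insensitive.
+    public static IReadOnlyList<string> GetSinksForCwes(IEnumerable<string> cweIds) =>
+        cweIds.Where(Map.ContainsKey)
+              .SelectMany(id => Map[id])
+              .Distinct(StringComparer.Ordinal)
+              .OrderBy(s => s, StringComparer.Ordinal)
+              .ToList();
+
+    public static IReadOnlyCollection<string> GetMappedCwes() => Map.Keys;
+}
+```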
+
+### 4. AI Enrichment Service
+
+Optional AI-assisted enrichment using advisory text and commit analysis:
+
+```csharp
+var enrichmentService = serviceProvider.GetRequiredService<IGoldenSetEnrichmentService>();
+
+if (enrichmentService.IsAvailable)
+{
+    var result = await enrichmentService.EnrichAsync(
+        draftGoldenSet,
+        new GoldenSetEnrichmentContext
+        {
+            CommitAnalysis = commitResult,
+            CweIds = ["CWE-787"],
+            AdvisoryText = "Buffer overflow in parse_header..."
+        });
+
+    // result.EnrichedDraft contains the improved definition
+    // result.ActionsApplied describes what was added or refined
+}
+```
+
+**Enrichment Actions:**
+- `function_added` - New vulnerable function identified
+- `sink_added` - New sink function from CWE mapping
+- `constant_extracted` - Magic value from commits
+- `edge_suggested` - Control flow pattern suggested
+- `witness_hint_added` - Example trigger input
+
+### 5. Review Workflow
+
+State machine for golden set curation:
+
+```
+   Draft ──> InReview ──> Approved ──> Deprecated ──> Archived
+     │           │            │
+     └───────────┴────────────┴── (can return to Draft)
+```
+
+```csharp
+var reviewService = serviceProvider.GetRequiredService<IGoldenSetReviewService>();
+
+// Submit for review
+await reviewService.SubmitForReviewAsync("CVE-2024-1234", "author@example.com");
+
+// Approve
+await reviewService.ApproveAsync("CVE-2024-1234", "reviewer@example.com", "LGTM");
+
+// Or request changes
+await reviewService.RequestChangesAsync(
+    "CVE-2024-1234",
+    "reviewer@example.com",
+    "Needs specific function name",
+    [new ChangeRequest { Field = "targets[0].functionName", Suggestion = "parse_header" }]);
+```
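+
+The review states can be encoded as an explicit transition table, which makes illegal moves (for example, approving a draft that was never submitted for review) easy to reject. A hedged sketch, assuming the diagram means forward moves along the chain plus a return to `Draft` from `InReview` or `Approved`; `GoldenSetStatus` and `ReviewTransitions` are hypothetical names.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical status enum mirroring the review diagram.
+public enum GoldenSetStatus { Draft, InReview, Approved, Deprecated, Archived }
+
+public static class ReviewTransitions
+{
+    // Assumption: forward moves follow the chain; InReview and Approved may also return to Draft.
+    private static readonly Dictionary<GoldenSetStatus, GoldenSetStatus[]> Allowed = new()
+    {
+        [GoldenSetStatus.Draft] = new[] { GoldenSetStatus.InReview },
+        [GoldenSetStatus.InReview] = new[] { GoldenSetStatus.Approved, GoldenSetStatus.Draft },
+        [GoldenSetStatus.Approved] = new[] { GoldenSetStatus.Deprecated, GoldenSetStatus.Draft },
+        [GoldenSetStatus.Deprecated] = new[] { GoldenSetStatus.Archived },
+        [GoldenSetStatus.Archived] = Array.Empty<GoldenSetStatus>(), // terminal state
+    };
+
+    public static bool CanTransition(GoldenSetStatus from, GoldenSetStatus to) =>
+        Array.IndexOf(Allowed[from], to) >= 0;
+
+    // Returns the new status or throws, so callers cannot silently skip review.
+    public static GoldenSetStatus Transition(GoldenSetStatus from, GoldenSetStatus to) =>
+        CanTransition(from, to)
+            ? to
+            : throw new InvalidOperationException($"Illegal transition {from} -> {to}");
+}
+```
+
+Under this reading, `RequestChangesAsync` would correspond to `InReview -> Draft` and `ApproveAsync` to `InReview -> Approved`.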
+
+## Golden Set Schema
+
+```yaml
+# CVE-2024-1234.golden.yaml
+schema_version: "1.0"
+id: CVE-2024-1234
+component: openssl
+
+targets:
+  - function: parse_header
+    sinks:
+      - memcpy
+      - strcpy
+    constants:
+      - "0x1000"
+      - "sizeof(buffer)"
+    edges:
+      - bb1->bb2  # bounds check bypass
+
+witness:
+  stdin: "AAAA..."
+  argv:
+    - "--vulnerable-option"
+  env:
+    BUFFER_SIZE: "99999"
+
+metadata:
+  author_id: researcher@example.com
+  source_ref: https://nvd.nist.gov/vuln/detail/CVE-2024-1234
+  created_at: 2024-01-15T10:30:00Z
+  tags:
+    - memory-corruption
+    - heap-overflow
+```
+
+## Configuration
+
+```yaml
+# appsettings.yaml
+BinaryIndex:
+  GoldenSet:
+    SchemaVersion: "1.0"
+    Validation:
+      ValidateCveExists: true
+      ValidateSinks: true
+      StrictEdgeFormat: true
+      OfflineMode: false
+    Storage:
+      PostgresSchema: golden_sets
+      ConnectionStringName: BinaryIndex
+    Caching:
+      SinkRegistryCacheMinutes: 60
+      DefinitionCacheMinutes: 15
+    Authoring:
+      EnableAiEnrichment: true
+      EnableCommitAnalysis: true
+      MaxCommitsToAnalyze: 5
+      AutoAcceptConfidenceThreshold: 0.8
+```
+
+## Service Registration
+
+```csharp
+// Program.cs or Startup.cs
+services.AddGoldenSetServices(configuration);
+services.AddGoldenSetAuthoring();
+services.AddGoldenSetPostgresStorage();
+
+// Optional: Add HTTP client for commit analysis
+services.AddHttpClient("upstream-commits", client =>
+{
+    client.Timeout = TimeSpan.FromSeconds(30);
+    client.DefaultRequestHeaders.Add("User-Agent", "StellaOps-GoldenSet/1.0");
+});
+```
+
+## CLI Usage
+
+```bash
+# Initialize a golden set from CVE
+stella scanner golden init CVE-2024-1234 --component openssl
+
+# With options
+stella scanner golden init CVE-2024-1234 \
+    --component openssl \
+    --output ./golden-sets/CVE-2024-1234.yaml \
+    --no-ai \
+    --store
+
+# Interactive mode for refinement
+stella scanner golden init CVE-2024-1234 --interactive
+
+# Export as JSON
+stella scanner golden init CVE-2024-1234 --json
+```
+
+## Validation Rules
+
+1. **Advisory ID Format** - Must be a CVE ID (`CVE-YYYY-NNNN`, with four or more digits in the sequence number) or a GHSA ID (`GHSA-xxxx-xxxx-xxxx`)
+2. **Component Required** - Non-empty component name
+3. **Targets Required** - At least one vulnerable target
+4. **Sinks Validation** - Sinks must be in the sink registry
+5. **Edge Format** - Must match the `bbN->bbM` pattern when `StrictEdgeFormat` is enabled
+6. **Constants Format** - Hex constants must be valid (`0x...`)
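+
+Most of these format rules reduce to a few regular expressions. A minimal sketch covering rules 1, 2, 3, 5 and 6, accepting the four-or-more-digit CVE sequence numbers used by real IDs such as CVE-2024-1234, and assuming the edge check is toggled by the `StrictEdgeFormat` setting from the configuration section. The `GoldenSetTarget` and `GoldenSetDraft` records and `GoldenSetFormatValidator` are hypothetical; rule 4 is omitted because it needs the sink registry.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Text.RegularExpressions;
+
+// Hypothetical minimal model of the YAML schema (only the fields the rules touch).
+public sealed record GoldenSetTarget(string Function, IReadOnlyList<string> Edges, IReadOnlyList<string> Constants);
+public sealed record GoldenSetDraft(string Id, string Component, IReadOnlyList<GoldenSetTarget> Targets);
+
+public static class GoldenSetFormatValidator
+{
+    private static readonly Regex CveId = new(@"^CVE-[0-9]{4}-[0-9]{4,}$");
+    private static readonly Regex GhsaId = new(@"^GHSA(-[0-9a-z]{4}){3}$");
+    private static readonly Regex Edge = new(@"^bb[0-9]+->bb[0-9]+$");
+    private static readonly Regex Hex = new(@"^0[xX][0-9A-Fa-f]+$");
+
+    public static List<string> Validate(GoldenSetDraft draft, bool strictEdgeFormat = true)
+    {
+        var errors = new List<string>();
+
+        if (!CveId.IsMatch(draft.Id) && !GhsaId.IsMatch(draft.Id))
+            errors.Add($"id: '{draft.Id}' is not a CVE or GHSA identifier");
+        if (string.IsNullOrWhiteSpace(draft.Component))
+            errors.Add("component: must not be empty");
+        if (draft.Targets.Count == 0)
+            errors.Add("targets: at least one target is required");
+
+        for (var i = 0; i < draft.Targets.Count; i++)
+        {
+            var target = draft.Targets[i];
+            if (strictEdgeFormat)
+            {
+                foreach (var edge in target.Edges)
+                {
+                    if (!Edge.IsMatch(edge))
+                        errors.Add($"targets[{i}].edges: '{edge}' does not match bbN->bbM");
+                }
+            }
+
+            // Only values that look like hex must parse as hex; symbolic constants pass through.
+            foreach (var constant in target.Constants)
+            {
+                if (constant.StartsWith("0x", StringComparison.OrdinalIgnoreCase) && !Hex.IsMatch(constant))
+                    errors.Add($"targets[{i}].constants: '{constant}' is not a valid hex literal");
+            }
+        }
+
+        return errors;
+    }
+}
+```
+
+Collecting every error instead of stopping at the first one lets an author fix a draft in a single pass before submitting it for review.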
+
+## Best Practices
+
+1. **Start with Commit Analysis** - Fix commits are the most reliable source
+2. **Use CWE Mapping** - Automatic sink suggestions based on vulnerability type
+3. **Validate Locally** - Always validate before submitting for review
+4. **Include Witness Data** - Example inputs help verify detection accuracy
+5. **Tag Appropriately** - Use consistent tags for categorization
+6. **Document Source** - Always include `source_ref` for traceability
+
+## Metrics
+
+Track authoring quality with:
+- **Extraction Confidence** - Overall, per-source, per-field
+- **Enrichment Actions** - What was added automatically
+- **Review Iterations** - How many rounds before approval
+- **Detection Rate** - How well the golden set detects known-vulnerable binaries
+
+## See Also
+
+- [Golden Set Schema Reference](../schemas/golden-set-schema.md)
+- [Sink Registry](./sink-registry.md)
+- [Binary Analysis Architecture](./architecture.md)
+- [Vulnerability Detection](./vulnerability-detection.md)