semi implemented and features implemented save checkpoint

This commit is contained in:
master
2026-02-08 18:00:49 +02:00
parent 04360dff63
commit 1bf6bbf395
20895 changed files with 716795 additions and 64 deletions

@@ -0,0 +1,28 @@
# Binary Call-Graph Extraction and Reachability Analysis
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Binary call-graph extraction with BinaryCallGraphExtractor, reachability lifting via BinaryReachabilityLifter, dedicated BinaryIndex analysis module, and CLI binary commands.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/`
- **Key Classes**:
- `ReachGraphBinaryReachabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/ReachGraphBinaryReachabilityService.cs`) - binary-level reachability integration with ReachGraph
- `TaintGateExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/TaintGateExtractor.cs`) - extracts taint gates (bounds checks, null checks, auth checks, permission checks, type checks) from binary call paths
- `CfgExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/CfgExtractor.cs`) - control flow graph extraction from disassembled binaries
- `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - generates call-sequence n-grams from lifted IR for call graph analysis
- `CallGraphMatcherAdapter` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/Matchers/MatcherAdapters.cs`) - adapter for call graph matching in validation harness
- **Interfaces**: `ICallNgramGenerator`, `IBinaryFeatureExtractor`
## E2E Test Plan
- [ ] Submit an ELF binary and verify call-graph extraction produces a valid set of function nodes and edges
- [ ] Verify `TaintGateExtractor` classifies conditions correctly (bounds check, null check, auth check, permission check, type check)
- [ ] Verify `CfgExtractor` produces control flow graphs from disassembled functions
- [ ] Verify `CallNgramGenerator` generates n-grams (n=2,3,4) from lifted function IR and computes Jaccard similarity
- [ ] Verify `ReachGraphBinaryReachabilityService` integrates with the ReachGraph module for function-level exploitability assessment
- [ ] Verify call-graph-based reachability results feed into the ensemble decision engine

@@ -0,0 +1,29 @@
# Binary Identity Extraction (Build-ID Based)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Binary identity extraction using Build-IDs and symbol observations for ELF binary identification, with ground-truth validation and SBOM stability verification.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/`
- **Key Classes**:
- `BinaryIdentityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/BinaryIdentityService.cs`) - main service for extracting binary identity from ELF/PE/Mach-O binaries
- `ElfFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/ElfFeatureExtractor.cs`) - extracts Build-ID, symbol tables, and section info from ELF binaries
- `PeFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/PeFeatureExtractor.cs`) - extracts CodeView GUID from Windows PE binaries
- `MachoFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/MachoFeatureExtractor.cs`) - extracts LC_UUID from Mach-O binaries
- `StreamGuard` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/StreamGuard.cs`) - safe stream handling for non-seekable streams
- **Interfaces**: `IBinaryFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/IBinaryFeatureExtractor.cs`)
- **Models**: `BinaryIdentity` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Models/BinaryIdentity.cs`)
## E2E Test Plan
- [ ] Submit an ELF binary with a known Build-ID and verify the extracted identity matches
- [ ] Submit a Windows PE binary and verify CodeView GUID extraction via `PeFeatureExtractor`
- [ ] Submit a Mach-O binary and verify LC_UUID extraction via `MachoFeatureExtractor`
- [ ] Verify that non-seekable streams are handled correctly via `StreamGuard`
- [ ] Verify that binaries without Build-IDs fall back to symbol-based identification
- [ ] Verify extracted identities are persisted and queryable through `BinaryVulnerabilityService`
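
As an illustration of the Build-ID step described above, the following is a minimal Python sketch (not the C# `ElfFeatureExtractor` itself) that scans the raw bytes of an ELF note section for a GNU Build-ID note. The function names `find_build_id` and `_align4` are hypothetical; the note layout (three 32-bit fields, then name and descriptor each padded to 4 bytes) and the `NT_GNU_BUILD_ID` value of 3 follow the ELF note format. Locating the note section inside a full ELF file is assumed to have happened already.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type used by GNU toolchains for the Build-ID


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (ELF note padding)."""
    return (n + 3) & ~3


def find_build_id(note_data: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the hex Build-ID found in raw note-section bytes, or None."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(note_data):
        # Each note starts with namesz, descsz, type (4 bytes each).
        namesz, descsz, note_type = struct.unpack_from(fmt, note_data, offset)
        offset += 12
        name = note_data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = note_data[offset:offset + descsz]
        offset += _align4(descsz)
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None  # caller would fall back to symbol-based identification
```

A `None` result corresponds to the test-plan case where a binary has no Build-ID and identification falls back to symbol observations.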

@@ -0,0 +1,28 @@
# Binary Intelligence Graph / Binary Identity Indexing
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Complete BinaryIndex module with binary identity indexing, ELF feature extraction, vulnerability fingerprint matching, and reachability status tracking. Advisory marked as SUPERSEDED by this implementation.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`
- **Key Classes**:
- `BinaryIdentityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/BinaryIdentityService.cs`) - binary identity management
- `ElfFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/ElfFeatureExtractor.cs`) - ELF feature extraction
- `BinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs`) - vulnerability matching with Build-ID catalog lookups
- `SignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/SignatureMatcher.cs`) - signature-based vulnerability fingerprint matching
- `ReachGraphBinaryReachabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/ReachGraphBinaryReachabilityService.cs`) - reachability status tracking
- **Models**: `BinaryIdentity`, `FixModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Models/`)
- **Persistence**: `IBinaryVulnAssertionRepository`, `IBinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/`)
## E2E Test Plan
- [ ] Verify end-to-end flow: submit binary, extract identity, index in the graph, and query by Build-ID
- [ ] Verify vulnerability fingerprint matching via `SignatureMatcher` returns correct match scores
- [ ] Verify reachability status tracking integrates with ReachGraph
- [ ] Verify `BinaryVulnerabilityService` correctly maps match methods (buildid_catalog, delta_signature, etc.)
- [ ] Verify binary identity indexing supports multi-tenant contexts via `ITenantContext`

@@ -0,0 +1,28 @@
# Binary Proof Verification Pipeline
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Full binary proof verification with ground truth sources (buildinfo, debuginfod, reproducible builds), validation, and golden set testing.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation.Abstractions/`
- **Key Classes**:
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - orchestrates reproducible-build-based validation runs
- `ValidationHarness` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/ValidationHarness.cs`) - main validation harness with matcher adapter factory integration
- `KpiRegressionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/Services/KpiRegressionService.cs`) - KPI regression detection across validation runs
- `GroundTruthProvenanceResolver` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Provenance/GroundTruthProvenanceResolver.cs`) - resolves symbol provenance from ground truth sources
- **Interfaces**: `IValidationHarness` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation.Abstractions/IValidationHarness.cs`), `IKpiRegressionService`, `ISymbolProvenanceResolver`
- **Registration**: `ServiceCollectionExtensions.AddCorpusBundleExport/Import` for bundle exchange
## E2E Test Plan
- [ ] Run a validation harness against a known binary pair and verify proof correctness
- [ ] Verify ground truth resolution from buildinfo sources produces correct provenance data
- [ ] Verify KPI regression service detects accuracy drops between validation runs
- [ ] Verify golden set validation produces deterministic, reproducible results
- [ ] Verify corpus bundle export/import round-trips correctly
- [ ] Verify validation run attestor generates valid attestation predicates with corpus snapshot IDs

@@ -0,0 +1,26 @@
# Binary Reachability Analysis
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Binary-level reachability analysis integrating with the ReachGraph and taint gate extraction for function-level exploitability assessment.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`
- **Key Classes**:
- `ReachGraphBinaryReachabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/ReachGraphBinaryReachabilityService.cs`) - connects binary analysis to the ReachGraph module for function-level reachability
- `TaintGateExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/TaintGateExtractor.cs`) - identifies taint gate types (BoundsCheck, NullCheck, AuthCheck, PermissionCheck, TypeCheck) from condition strings
- `SignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/SignatureMatcher.cs`) - matches vulnerability signatures at the binary level
- **Models**: `AnalysisResultModels`, `FingerprintModels`, `SignatureIndexModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/Models/`)
- **Interfaces**: defined in `Interfaces.cs`, implementations in `Implementations.cs`
## E2E Test Plan
- [ ] Submit a binary with a known vulnerable function and verify reachability analysis identifies it as reachable from entry points
- [ ] Verify `TaintGateExtractor` correctly classifies all gate types (bounds, null, auth, permission, type checks)
- [ ] Verify that unreachable vulnerable functions reduce the exploitability score
- [ ] Verify integration between `ReachGraphBinaryReachabilityService` and the ReachGraph module
- [ ] Verify that taint gate presence between entry point and vulnerable function is reflected in the analysis result
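
The reachability check in the first test-plan item can be pictured as a graph traversal. The sketch below is a language-neutral illustration in Python, assuming a call graph represented as a mapping from caller to callees; `reachable_functions` and the sample function names are hypothetical and do not reflect the `ReachGraphBinaryReachabilityService` API.

```python
from collections import deque


def reachable_functions(call_graph, entry_points):
    """Breadth-first search over caller -> callees edges.

    call_graph: dict mapping a function name to an iterable of callees.
    entry_points: iterable of function names treated as roots.
    Returns the set of all functions reachable from any entry point.
    """
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical graph: "vulnerable_copy" is reachable, "legacy_decode" is not.
graph = {
    "main": ["parse_request"],
    "parse_request": ["vulnerable_copy"],
    "unused_helper": ["legacy_decode"],
}
reachable = reachable_functions(graph, ["main"])
```

A vulnerable function missing from the returned set is the "unreachable" case that, per the test plan, should lower the exploitability score.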

@@ -0,0 +1,30 @@
# Binary Resolution API with Cache Layer
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
REST API endpoints (`POST /api/v1/resolve/vuln` and `/api/v1/resolve/vuln/batch`) for querying whether a CVE is resolved through binary-level backport detection. Includes Valkey-backed response caching, rate limiting middleware, and telemetry instrumentation.
## Implementation Details
- **Modules**: `src/BinaryIndex/StellaOps.BinaryIndex.WebService/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/`
- **Key Classes**:
- `ResolutionController` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/ResolutionController.cs`) - REST API controller with `POST /api/v1/resolve/vuln` and `/api/v1/resolve/vuln/batch` endpoints
- `ResolutionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Resolution/ResolutionService.cs`) - core resolution logic
- `CachedResolutionService` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Services/CachedResolutionService.cs`) - decorator adding Valkey-backed caching around ResolutionService
- `ResolutionCacheService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/ResolutionCacheService.cs`) - Valkey cache operations for resolution results
- `RateLimitingMiddleware` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Middleware/RateLimitingMiddleware.cs`) - per-tenant rate limiting with X-RateLimit headers
- `ResolutionTelemetry` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Telemetry/ResolutionTelemetry.cs`) - OpenTelemetry metrics for resolution requests, cache hits, rate limits
- **Contracts**: `VulnResolutionRequest/Response`, `ResolutionMatchTypes` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Contracts/Resolution/VulnResolutionContracts.cs`)
- **Cache Options**: `BinaryCacheOptions`, `CacheOptionsValidation` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/`)
## E2E Test Plan
- [ ] Send `POST /api/v1/resolve/vuln` with a known CVE and package purl, verify resolution response contains match type (BuildId, DeltaSignature, etc.)
- [ ] Send batch request to `/api/v1/resolve/vuln/batch` with multiple packages and verify all are resolved
- [ ] Verify cache hit: send same request twice and confirm second response comes from cache (check telemetry counters)
- [ ] Verify rate limiting: exceed the configured request limit and confirm 429 response with X-RateLimit headers
- [ ] Verify telemetry: confirm resolution metrics are emitted (request count, cache hit ratio, latency histogram)
- [ ] Verify disabled rate limiting mode passes requests through without headers
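
The caching decorator pattern described for `CachedResolutionService` can be sketched as follows. This is a Python illustration under stated assumptions, not the actual service: an in-memory dict stands in for Valkey, the request field names and the `resolve:` key prefix are hypothetical, and no TTL or eviction is modeled.

```python
import hashlib
import json


class CachedResolver:
    """Decorator around a resolver callable that caches results by request."""

    def __init__(self, inner, cache=None):
        self._inner = inner  # callable(request: dict) -> dict
        self._cache = {} if cache is None else cache  # stand-in for Valkey
        self.hits = 0
        self.misses = 0

    @staticmethod
    def cache_key(request):
        # Canonical JSON so that key order in the request does not matter.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return "resolve:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def resolve(self, request):
        key = self.cache_key(request)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._inner(request)
        self._cache[key] = result
        return result
```

The `hits` and `misses` counters play the role of the telemetry counters that the cache-hit test item inspects.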

@@ -0,0 +1,30 @@
# Binary Symbol Table Diff Engine
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Symbol table comparison between binary versions tracking exported/imported symbol changes, version map diffs, GOT/PLT table modifications, and ABI compatibility assessment. Produces content-addressed diff IDs for deterministic reporting.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/SymbolDiff/`
- **Key Classes**:
- `SymbolTableDiffAnalyzer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/SymbolDiff/SymbolTableDiffAnalyzer.cs`) - computes diffs between symbol tables with `ComputeDiffAsync` and `AssessAbiCompatibility`
- `SymbolTableDiff` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/SymbolDiff/SymbolTableDiff.cs`) - diff result model with added/removed/changed symbols
- `VersionMapDiff` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/SymbolDiff/VersionMapDiff.cs`) - tracks changes in ELF version maps
- `AbiCompatibility` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/SymbolDiff/AbiCompatibility.cs`) - ABI compatibility assessment (FullyCompatible, Warnings, Incompatible)
- `DynamicLinkingDiff` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/SymbolDiff/DynamicLinkingDiff.cs`) - GOT/PLT table modification tracking
- `NameDemangler` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/SymbolDiff/NameDemangler.cs`) - C++ symbol name demangling
- **Interfaces**: `ISymbolTableDiffAnalyzer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/SymbolDiff/ISymbolTableDiffAnalyzer.cs`)
- **Registration**: `SymbolDiffServiceExtensions` for DI setup
## E2E Test Plan
- [ ] Compute diff between two ELF binaries with known symbol changes and verify added/removed symbols are correctly identified
- [ ] Verify `AssessAbiCompatibility` returns `FullyCompatible` when only symbols are added
- [ ] Verify `AssessAbiCompatibility` returns `Incompatible` when exported symbols are removed
- [ ] Verify version map diff detection for ELF version script changes
- [ ] Verify C++ symbol demangling produces human-readable names via `NameDemangler`
- [ ] Verify content-addressed diff IDs are deterministic for identical inputs
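
The two ABI rules in the test plan (additions only are compatible, removed exports are not) and the content-addressed ID can be illustrated with a short Python sketch. It assumes symbol tables reduced to plain sets of exported names; the function names and the `sha256:` ID format are hypothetical, and the intermediate `Warnings` level of `AbiCompatibility` is deliberately not modeled.

```python
import hashlib
import json


def diff_symbols(old_exports, new_exports):
    """Return added/removed exported symbols as sorted lists."""
    old, new = set(old_exports), set(new_exports)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def assess_abi(diff):
    """Removing an exported symbol breaks consumers; adding one does not."""
    return "Incompatible" if diff["removed"] else "FullyCompatible"


def diff_id(diff):
    """Content-addressed ID: hash of the canonical JSON form of the diff."""
    canonical = json.dumps(diff, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the lists are sorted and the JSON is canonicalized before hashing, the same pair of symbol tables yields the same ID regardless of input ordering.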

@@ -0,0 +1,28 @@
# Binary-to-VEX Claim Auto-Generation (VexBridge Library)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Automated generation of VEX claims from binary fingerprint match results. The VexBridge library translates binary match evidence into DSSE-signed VEX statements with confidence scores, enabling automated VEX claim production from binary analysis without manual triage.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.VexBridge/`
- **Key Classes**:
- `VexEvidenceGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.VexBridge/VexEvidenceGenerator.cs`) - generates VEX observations from `BinaryVulnMatch` results; maps `FixState` to `VexClaimStatus` (Fixed -> NotAffected, Vulnerable -> Affected, Unknown -> UnderInvestigation)
- `BinaryMatchEvidenceSchema` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.VexBridge/BinaryMatchEvidenceSchema.cs`) - defines evidence schema with match type constants (BuildId, DeltaSignature, etc.)
- `VexBridgeOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.VexBridge/VexBridgeOptions.cs`) - configuration for confidence thresholds
- `DeltaSigVexBridge` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/VexIntegration/DeltaSigVexBridge.cs`) - bridges delta-signature analysis results into VEX observations with provenance data
- **Interfaces**: `IVexEvidenceGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.VexBridge/IVexEvidenceGenerator.cs`), `IDeltaSigVexBridge`
## E2E Test Plan
- [ ] Generate a VEX claim from a `Fixed` binary match and verify status is `NotAffected` with justification `VulnerableCodeNotPresent`
- [ ] Generate a VEX claim from a `Vulnerable` match and verify status is `Affected`
- [ ] Generate a VEX claim from an `Unknown` match and verify status is `UnderInvestigation`
- [ ] Verify confidence threshold enforcement: low-confidence matches below threshold are rejected
- [ ] Verify Build-ID references are included in VEX evidence when present
- [ ] Verify `DeltaSigVexBridge` produces VEX observations with symbol provenance metadata
- [ ] Verify generated VEX statements include correct DSSE evidence references
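
The `FixState` to `VexClaimStatus` mapping and the confidence gate can be sketched in a few lines. This Python illustration mirrors only the mapping and justification stated above; the function name `to_vex_claim`, the dict shape, and the `0.7` default threshold are assumptions, and DSSE signing is out of scope.

```python
from typing import Optional

# Mapping stated in the feature description.
FIX_STATE_TO_VEX_STATUS = {
    "Fixed": "NotAffected",
    "Vulnerable": "Affected",
    "Unknown": "UnderInvestigation",
}


def to_vex_claim(fix_state: str, confidence: float,
                 min_confidence: float = 0.7) -> Optional[dict]:
    """Build a VEX claim dict, or return None if confidence is too low.

    min_confidence is a hypothetical default; the real threshold is
    configured elsewhere (VexBridgeOptions in the source).
    """
    if confidence < min_confidence:
        return None  # low-confidence matches are rejected
    claim = {
        "status": FIX_STATE_TO_VEX_STATUS[fix_state],
        "confidence": confidence,
    }
    if fix_state == "Fixed":
        claim["justification"] = "VulnerableCodeNotPresent"
    return claim
```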

@@ -0,0 +1,27 @@
# BinaryIndex Ops CLI Commands (stella binary ops)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
CLI commands for BinaryIndex ops: `health`, `bench`, `cache`, and `config` subcommands with JSON or table output and a configurable BinaryIndex base URL. Also adds a `--semantic` flag to the `deltasig` `extract`, `author`, and `match` commands.
## Implementation Details
- **Modules**: `src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/`, `src/Cli/`
- **Key Classes**:
- `BinaryIndexOpsController` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/BinaryIndexOpsController.cs`) - serves health, bench, cache stats, and config endpoints consumed by CLI
- `BinaryIndexOpsHealthResponse` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/BinaryIndexOpsModels.cs`) - health response model with lifter warmness, component versions
- `BinaryIndexOpsOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/BinaryIndexOpsModels.cs`) - ops configuration with redacted keys and bench rate limits
- `B2R2LifterPool` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/B2R2LifterPool.cs`) - lifter pool stats reported via ops health endpoint
- **Source**: SPRINT_20260112_006_CLI_binaryindex_ops_cli.md
## E2E Test Plan
- [ ] Run `stella binary ops health` and verify JSON output includes lifter warmness and version info
- [ ] Run `stella binary ops bench` and verify latency measurement results are returned
- [ ] Run `stella binary ops cache` and verify Valkey hit/miss statistics are reported
- [ ] Run `stella binary ops config` and verify effective configuration is returned with secrets redacted
- [ ] Run `stella deltasig extract --semantic` and verify semantic flag is passed through
- [ ] Verify table output format renders correctly for all subcommands

@@ -0,0 +1,27 @@
# BinaryIndex Ops Endpoints (Health, Bench, Cache Stats, Config)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Ops endpoints for BinaryIndex: health (lifter warmness), bench/run (latency measurement), cache stats (Valkey hit/miss), and effective config with deterministic JSON responses.
## Implementation Details
- **Modules**: `src/BinaryIndex/StellaOps.BinaryIndex.WebService/`
- **Key Classes**:
- `BinaryIndexOpsController` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/BinaryIndexOpsController.cs`) - exposes `GET /ops/health`, bench, cache stats, and config endpoints; integrates with `B2R2LifterPool` and `FunctionIrCacheService`
- `B2R2LifterPool` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/B2R2LifterPool.cs`) - provides pool stats (warm ISAs, pool sizes, acquire timeouts)
- `FunctionIrCacheService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/FunctionIrCacheService.cs`) - Valkey-based function IR cache with hit/miss reporting
- `B2R2LifterPoolOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/B2R2LifterPoolOptions.cs`) - pool configuration (MaxPoolSizePerIsa, EnableWarmPreload, AcquireTimeout)
- `BinaryIndexOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/BinaryIndexOptions.cs`) - top-level options with B2R2Pool, SemanticLifting sections
- **Source**: SPRINT_20260112_004_BINIDX_b2r2_lowuir_perf_cache.md
## E2E Test Plan
- [ ] Call `GET /ops/health` and verify response includes lifter pool warmness state and component versions
- [ ] Call bench endpoint and verify deterministic latency measurement JSON
- [ ] Call cache stats endpoint and verify Valkey hit/miss counts and cache key count
- [ ] Call config endpoint and verify effective configuration is returned with secrets redacted
- [ ] Verify all ops responses use deterministic JSON serialization (consistent key ordering)
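
Deterministic serialization with secret redaction, as the last two test items require, can be illustrated as follows. This is a Python sketch under assumptions: `render_config`, the `***` mask, and the sample keys are hypothetical, and the real endpoint's redaction rules may differ.

```python
import json


def render_config(config, redacted_keys, mask="***"):
    """Serialize config with sorted keys, masking any redacted key's value."""
    def scrub(value):
        if isinstance(value, dict):
            return {k: mask if k in redacted_keys else scrub(v)
                    for k, v in value.items()}
        if isinstance(value, list):
            return [scrub(v) for v in value]
        return value

    # sort_keys plus fixed separators gives byte-identical output
    # for logically equal inputs.
    return json.dumps(scrub(config), sort_keys=True, separators=(",", ":"))
```

Two configurations that differ only in key order or in the value of a redacted key serialize to the same string, which is what makes the response safe to diff or hash.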

@@ -0,0 +1,30 @@
# BinaryIndex User Configuration System
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
User configuration for B2R2 lifter pooling, LowUIR enablement, Valkey function cache behavior, and PostgreSQL persistence, with ops endpoints for health/bench/cache/config and redaction rules for operator visibility.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/`
- **Key Classes**:
- `BinaryIndexOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/BinaryIndexOptions.cs`) - top-level config with sections for B2R2Pool, SemanticLifting, cache, persistence
- `B2R2PoolOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/B2R2LifterPoolOptions.cs`) - MaxPoolSizePerIsa (1-64), EnableWarmPreload, AcquireTimeout, EnableMetrics, WarmPreloadIsas
- `SemanticLiftingOptions` - B2R2Version, Enabled flag, function limits
- `BinaryCacheOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/BinaryCacheOptions.cs`) - Valkey cache configuration
- `CacheOptionsValidation` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/CacheOptionsValidation.cs`) - validates cache config at startup
- `FunctionIrCacheOptions` - function IR cache TTL and size limits
- `BinaryIndexOpsOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/BinaryIndexOpsModels.cs`) - redacted keys list for operator visibility, bench rate limits
- **Source**: SPRINT_20260112_007_BINIDX_binaryindex_user_config.md
## E2E Test Plan
- [ ] Configure B2R2 pool with custom MaxPoolSizePerIsa and verify pool initializes with correct size
- [ ] Configure SemanticLifting as disabled and verify LowUIR lifting is skipped
- [ ] Configure Valkey cache options and verify function IR cache respects TTL settings
- [ ] Verify configuration binding from `StellaOps:BinaryIndex:*` config sections
- [ ] Verify redacted keys do not appear in ops config endpoint responses
- [ ] Verify CacheOptionsValidation rejects invalid configuration at startup
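
Startup validation of pool options can be sketched as a plain data class with a `validate` method. The Python below is illustrative only: the `1-64` range for `MaxPoolSizePerIsa` comes from the description above, while the default values, the positive-timeout rule, and the error messages are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PoolOptions:
    """Hypothetical mirror of the B2R2 pool options for illustration."""
    max_pool_size_per_isa: int = 4           # assumed default
    enable_warm_preload: bool = True         # assumed default
    acquire_timeout_seconds: float = 5.0     # assumed default
    warm_preload_isas: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of validation errors; empty means valid."""
        errors = []
        if not 1 <= self.max_pool_size_per_isa <= 64:
            errors.append("MaxPoolSizePerIsa must be between 1 and 64")
        if self.acquire_timeout_seconds <= 0:
            errors.append("AcquireTimeout must be positive")
        return errors
```

Returning all errors at once, rather than raising on the first, lets a startup check report every misconfigured field in a single pass.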

@@ -0,0 +1,30 @@
# Byte-Level Binary Diffing with Rolling Hash Windows
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Byte-level binary comparison using rolling hash windows that identifies exactly which byte ranges changed between binary versions. Produces binary proof snippets with section analysis and privacy controls to strip raw bytes. Supports stream and file-based comparison.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/`
- **Key Classes**:
- `PatchDiffEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/PatchDiffEngine.cs`) - core diffing engine computing byte-level differences between binary versions using function fingerprints
- `FunctionDiffer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionDiffer.cs`) - function-level comparison with semantic analysis option and call-graph edge diffing
- `FunctionRenameDetector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionRenameDetector.cs`) - detects function renames between versions using fingerprint similarity
- `VerdictCalculator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/VerdictCalculator.cs`) - computes patch verification verdicts from diff results
- `InMemoryDiffResultStore` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/Storage/InMemoryDiffResultStore.cs`) - stores diff results with content-addressed IDs
- **Models**: `PatchDiffModels`, `DiffEvidenceModels`, `BinaryReference` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/Models/`)
- **Interfaces**: `IPatchDiffEngine`, `IDiffResultStore` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/`)
- **Source**: SPRINT_20260112_200_004_CHGTRC_byte_diffing.md
## E2E Test Plan
- [ ] Submit two binary versions and verify byte-range differences are identified with correct offsets
- [ ] Verify section analysis identifies which ELF sections changed (.text, .data, .rodata)
- [ ] Verify privacy controls strip raw bytes from proof snippets when configured
- [ ] Verify `FunctionRenameDetector` correctly identifies renamed functions between versions
- [ ] Verify `VerdictCalculator` produces correct patch verification verdict (patched vs unpatched)
- [ ] Verify diff results are stored with deterministic content-addressed IDs
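
The rolling-hash idea named in the title can be sketched generically. The Python below is not the `PatchDiffEngine` algorithm (which the description says works from function fingerprints); it is an rsync-style illustration in which fixed-size blocks of the old binary are indexed by a polynomial hash and a window is rolled across the new binary. The names, the window size, and the hash parameters are assumptions, and reported ranges have window granularity rather than exact byte precision.

```python
BASE = 257
MOD = (1 << 61) - 1


def _hash(block):
    """Polynomial hash of a byte block."""
    h = 0
    for byte in block:
        h = (h * BASE + byte) % MOD
    return h


def changed_ranges(old, new, window=16):
    """Return [start, end) ranges of `new` not matched by any block of `old`."""
    # Index non-overlapping window-sized blocks of the old binary.
    index = {}
    for off in range(0, len(old) - window + 1, window):
        block = old[off:off + window]
        index.setdefault(_hash(block), []).append(block)

    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    ranges, pos, unmatched_start = [], 0, None
    h = _hash(new[:window])
    while pos + window <= len(new):
        candidate = new[pos:pos + window]
        if candidate in index.get(h, ()):  # compare bytes to rule out collisions
            if unmatched_start is not None:
                ranges.append((unmatched_start, pos))
                unmatched_start = None
            pos += window
            if pos + window <= len(new):
                h = _hash(new[pos:pos + window])
        else:
            if unmatched_start is None:
                unmatched_start = pos
            if pos + window < len(new):
                # Roll the window forward by one byte in O(1).
                h = ((h - new[pos] * top) * BASE + new[pos + window]) % MOD
            pos += 1
    # A short tail counts as unchanged only if old ends with the same bytes.
    if unmatched_start is None and pos < len(new) and not old.endswith(new[pos:]):
        unmatched_start = pos
    if unmatched_start is not None:
        ranges.append((unmatched_start, len(new)))
    return ranges
```

Rolling the hash one byte at a time is what lets the scan resynchronize after an insertion or deletion shifts the remaining bytes, instead of reporting everything after the edit as changed.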

@@ -0,0 +1,27 @@
# Call-Ngram Fingerprinting for Binary Similarity Analysis
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Call-sequence n-gram extraction from lifted IR for improved cross-compiler binary similarity matching. Generates n-grams (n=2,3,4) from function call sequences and integrates into the semantic fingerprint pipeline with configurable dimension weights (instruction 0.4, CFG 0.3, call-ngram 0.2, semantic 0.1).
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`
- **Key Classes**:
- `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - generates `CallNgramFingerprint` from `LiftedFunction` call sequences; computes Jaccard similarity between fingerprints
- `CallNgramFingerprint` (record in same file) - contains n-gram hash sets and metadata; has `Empty` sentinel for functions without calls
- **Interfaces**: `ICallNgramGenerator` (defined in `CallNgramGenerator.cs`) - `Generate(LiftedFunction)` and `ComputeSimilarity(CallNgramFingerprint, CallNgramFingerprint)`
- **Integration**: Used by `EnsembleDecisionEngine` and `FunctionAnalysisBuilder` as one of the matching dimensions with 0.2 default weight
- **Source**: SPRINT_20260118_026_BinaryIndex_deltasig_enhancements.md
## E2E Test Plan
- [ ] Generate call-ngram fingerprint from a function with known call sequences and verify correct n-gram extraction (n=2,3,4)
- [ ] Compute similarity between identical call sequences and verify similarity = 1.0
- [ ] Compute similarity between disjoint call sequences and verify similarity = 0.0
- [ ] Verify `CallNgramFingerprint.Empty` is returned for functions without call instructions
- [ ] Verify call-ngram dimension integrates into ensemble scoring with configurable weight (default 0.2)
- [ ] Verify cross-compiler similarity: same source compiled with GCC vs Clang should produce similar call n-grams
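
The n-gram extraction, Jaccard similarity, and weighted ensemble described above can be illustrated in a few lines. The Python sketch below uses the n values and dimension weights from the description; the function names, the use of tuples instead of hashed n-grams, and the choice to return 0.0 when both sets are empty are assumptions rather than the `CallNgramGenerator` behavior.

```python
def call_ngrams(calls, sizes=(2, 3, 4)):
    """Collect every contiguous n-gram of the call sequence for each n."""
    grams = set()
    for n in sizes:
        for i in range(len(calls) - n + 1):
            grams.add(tuple(calls[i:i + n]))
    return grams


def jaccard(a, b):
    """|A intersect B| / |A union B|; 0.0 when both sets are empty (assumed)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Default dimension weights from the description.
WEIGHTS = {"instruction": 0.4, "cfg": 0.3, "call_ngram": 0.2, "semantic": 0.1}


def ensemble_score(scores, weights=WEIGHTS):
    """Weighted sum of per-dimension similarity scores in [0, 1]."""
    return sum(w * scores.get(dim, 0.0) for dim, w in weights.items())
```

With these weights, two functions that agree only on call n-grams score 0.2 overall, so the call-ngram dimension can tip a borderline match but cannot decide one alone.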

@@ -0,0 +1,34 @@
# Corpus Ingestion and Query Services
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Corpus ingestion and query services with distro-specific connectors for Alpine, Debian, and RPM package ecosystems.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Alpine/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Debian/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Rpm/`
- **Key Classes**:
- `CorpusIngestionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/Services/CorpusIngestionService.cs`) - orchestrates binary ingestion into the corpus
- `CorpusQueryService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/Services/CorpusQueryService.cs`) - queries corpus for function fingerprints and binary metadata
- `BatchFingerprintPipeline` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/Services/BatchFingerprintPipeline.cs`) - batch fingerprint extraction from corpus binaries
- `FunctionClusteringService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/Services/FunctionClusteringService.cs`) - clusters similar functions across corpus
- `CveFunctionMappingUpdater` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/Services/CveFunctionMappingUpdater.cs`) - maps CVEs to affected functions
- `AlpineCorpusConnector` / `AlpinePackageExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Alpine/`)
- `DebianCorpusConnector` / `DebianPackageExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Debian/`)
- `RpmCorpusConnector` / `RpmPackageExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Rpm/`)
- Library-specific connectors: `CurlCorpusConnector`, `GlibcCorpusConnector`, `OpenSslCorpusConnector`, `ZlibCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/Connectors/`)
- **Interfaces**: `ICorpusIngestionService`, `ICorpusQueryService`, `IBinaryCorpusConnector`, `ILibraryCorpusConnector`, `ICorpusRepository`, `ICorpusSnapshotRepository`
- **Models**: `FunctionCorpusModels`, `CorpusQuery`, `CorpusSnapshot`
## E2E Test Plan
- [ ] Ingest a Debian package via `DebianCorpusConnector` and verify binary fingerprints are stored
- [ ] Ingest an Alpine APK via `AlpineCorpusConnector` and verify secfixes extraction via `ApkBuildSecfixesExtractor`
- [ ] Ingest an RPM package via `RpmCorpusConnector` and verify changelog extraction via `SrpmChangelogExtractor`
- [ ] Query corpus for a known function fingerprint via `CorpusQueryService` and verify match
- [ ] Run `BatchFingerprintPipeline` on a corpus snapshot and verify all binaries are fingerprinted
- [ ] Verify `CveFunctionMappingUpdater` creates correct CVE-to-function mappings
- [ ] Verify corpus snapshot creation with deterministic snapshot IDs

@@ -0,0 +1,30 @@
# Delta Signature Matching and Patch Coverage Analysis
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Delta signature matching traces symbol-level changes between vulnerable and fixed builds. PatchCoverageController exposes an API for patch coverage assessment.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`, `src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/`
- **Key Classes**:
- `DeltaSignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureMatcher.cs`) - matches delta signatures against target binaries
- `DeltaSignatureGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureGenerator.cs`) - generates delta signatures from binary pairs
- `DeltaSigService` / `DeltaSigServiceV2` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`) - service layer for delta signature operations (V2 adds IR diffs)
- `PatchCoverageController` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/PatchCoverageController.cs`) - REST API for patch coverage queries using `IDeltaSignatureRepository`
- `SymbolChangeTracer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/SymbolChangeTracer.cs`) - traces symbol-level changes between builds
- `DeltaScopePolicyGate` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Policy/DeltaScopePolicyGate.cs`) - policy gate for delta scope enforcement
- **Interfaces**: `IDeltaSigService`, `IDeltaSignatureGenerator`, `IDeltaSignatureMatcher`, `ISymbolChangeTracer`
- **IR Diff**: `IrDiffGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/IrDiff/`) - generates IR-level diffs between function versions
## E2E Test Plan
- [ ] Generate a delta signature from known vulnerable/fixed binary pair and verify signature captures changed functions
- [ ] Match the generated delta signature against a target binary and verify correct patch status detection
- [ ] Query `PatchCoverageController` API for patch coverage and verify coverage percentage
- [ ] Verify `SymbolChangeTracer` identifies added, removed, and modified symbols
- [ ] Verify `DeltaScopePolicyGate` enforces delta scope policies
- [ ] Verify IR-level diff generation captures semantic function changes beyond byte-level diffs
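
The generate-then-match flow in the test plan can be sketched as follows. This is an illustration under simplifying assumptions, not the `DeltaSignatureGenerator` or `DeltaSignatureMatcher` implementation: each build is modeled as a plain mapping from function name to a content hash, and all names and status strings are hypothetical.

```python
def generate_delta(vulnerable, fixed):
    """Record every function whose hash differs between the two builds."""
    delta = {}
    for name in sorted(set(vulnerable) | set(fixed)):
        v, f = vulnerable.get(name), fixed.get(name)
        if v == f:
            continue  # unchanged functions carry no patch signal
        kind = "added" if v is None else "removed" if f is None else "modified"
        delta[name] = {"change": kind, "vulnerable": v, "fixed": f}
    return delta


def match_delta(delta, target):
    """Classify a target build against the delta signature."""
    states = []
    for name, sig in delta.items():
        h = target.get(name)  # None means the function is absent
        if h == sig["fixed"]:
            states.append("patched")
        elif h == sig["vulnerable"]:
            states.append("vulnerable")
        else:
            states.append("unknown")
    if states and all(s == "patched" for s in states):
        return "patched"
    if states and all(s == "vulnerable" for s in states):
        return "vulnerable"
    return "inconclusive"
```

Treating an absent function as a hash of `None` lets added and removed symbols take part in matching: a target that lacks a function introduced by the fix counts as vulnerable for that symbol.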

@@ -0,0 +1,30 @@
# Delta-Signature Predicates (Function-Level Binary Diffs)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Function-level delta signature predicates (v1 and v2) with signature generation, matching, and symbol change tracing. V2 adds symbol provenance and IR diffs, which capture semantic function changes that the byte-level hunks proposed in the advisory would miss.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`
- **Key Classes**:
- `DeltaSigPredicate` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Attestation/DeltaSigPredicate.cs`) - V1 predicate for attestation
- `DeltaSigPredicateV2` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Attestation/DeltaSigPredicateV2.cs`) - V2 predicate with symbol provenance and IR diff support
- `DeltaSigPredicateConverter` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Attestation/DeltaSigPredicateConverter.cs`) - converts between predicate versions
- `DeltaSigAttestorIntegration` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Attestation/DeltaSigAttestorIntegration.cs`) - integrates delta-sig predicates with the Attestor module
- `GroundTruthProvenanceResolver` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Provenance/GroundTruthProvenanceResolver.cs`) - enriches matches with symbol provenance data
- `CfgExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/CfgExtractor.cs`) - extracts control flow graphs for delta-sig generation
- **Models**: `Models.cs` in DeltaSig namespace - function match records, signature models
- **VEX Integration**: `DeltaSigVexBridge` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/VexIntegration/`)
## E2E Test Plan
- [ ] Generate a V1 delta-sig predicate and verify it contains function-level diff data
- [ ] Generate a V2 delta-sig predicate and verify it includes symbol provenance and IR diff metadata
- [ ] Convert between V1 and V2 predicates via `DeltaSigPredicateConverter` and verify data fidelity
- [ ] Verify `DeltaSigAttestorIntegration` produces valid attestation predicates for the Attestor module
- [ ] Verify `GroundTruthProvenanceResolver` enriches function matches with provenance sources
- [ ] Verify V2 predicates flow into VEX observations via `DeltaSigVexBridge`

@@ -0,0 +1,36 @@
# Disassembly and binary analysis pipeline
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Pluggable disassembly framework with B2R2, Iced, and Ghidra backends (the Ghidra integration includes BSim and version tracking), plus decompiler-based AST comparison.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.Abstractions/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.Iced/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ghidra/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Decompiler/`
- **Key Classes**:
- `DisassemblyService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly/DisassemblyService.cs`) - core disassembly orchestrator
- `HybridDisassemblyService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly/HybridDisassemblyService.cs`) - multi-backend hybrid disassembly with quality-based plugin selection
- `DisassemblyPluginRegistry` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly/DisassemblyPluginRegistry.cs`) - manages registered disassembly plugins
- `BinaryFormatDetector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly/BinaryFormatDetector.cs`) - detects ELF/PE/Mach-O format from binary headers
- `B2R2DisassemblyPlugin` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/B2R2DisassemblyPlugin.cs`) - B2R2 backend with architecture mapping, instruction mapping, operand parsing
- `B2R2LowUirLiftingService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/B2R2LowUirLiftingService.cs`) - lifts machine code to LowUIR intermediate representation with SSA transformation
- `B2R2LifterPool` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/B2R2LifterPool.cs`) - object pool for B2R2 lifter instances with warm preloading
- `IcedDisassemblyPlugin` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.Iced/IcedDisassemblyPlugin.cs`) - Iced x86/x64 disassembler plugin
- `GhidraDisassemblyPlugin` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ghidra/Services/GhidraDisassemblyPlugin.cs`) - Ghidra integration
- `GhidraDecompilerAdapter` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Decompiler/GhidraDecompilerAdapter.cs`) - Ghidra decompilation with AST comparison
- **Abstractions**: `IDisassemblyPlugin`, `IDisassemblyPluginRegistry`, `IDisassemblyService` with models for `BinaryFormat`, `CpuArchitecture`, `DisassembledInstruction`, `InstructionKind`, etc.
- **Decompiler**: Full AST comparison engine with recursive parser, code normalizer, semantic equivalence checking
## E2E Test Plan
- [ ] Load an x86-64 ELF binary via `HybridDisassemblyService` and verify disassembly produces valid instructions
- [ ] Verify `BinaryFormatDetector` correctly identifies ELF, PE, and Mach-O formats
- [ ] Verify B2R2 plugin handles architecture mapping for x86, x64, ARM, AArch64
- [ ] Verify B2R2 LowUIR lifting produces valid IR with SSA form
- [ ] Verify Iced plugin disassembles x86/x64 instructions correctly
- [ ] Verify `B2R2LifterPool` warm preloading and pool size management
- [ ] Verify Ghidra decompiler adapter produces comparable ASTs via `AstComparisonEngine`
- [ ] Verify hybrid disassembly quality scoring selects the best plugin for each binary
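
Header-based format detection can be sketched with the well-known magic numbers for each container. This is an illustrative sketch, not the `BinaryFormatDetector` code; `detect_format` is a hypothetical name, and fat (universal) Mach-O files are deliberately left out.

```python
import struct

# Thin Mach-O magics: 32-bit and 64-bit, in both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
}


def detect_format(data: bytes) -> str:
    """Classify a binary by its leading header bytes."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    # PE: DOS stub starts with "MZ"; the 32-bit little-endian field at 0x3C
    # points to the "PE\0\0" signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "Unknown"
```

Checking the `PE\0\0` signature rather than only the `MZ` prefix avoids classifying plain DOS executables as PE.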

@@ -0,0 +1,30 @@
# Ensemble decision engine for multi-tier matching
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Ensemble decision engine combines multiple matching tiers (range match, Build-ID, fingerprint) with configurable weight tuning for vulnerability classification.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/`
- **Key Classes**:
- `EnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs`) - combines multiple matching signals with configurable weights into a final vulnerability classification decision
- `FunctionAnalysisBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs`) - builds function analysis inputs including optional ML embeddings
- `WeightTuningService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/WeightTuningService.cs`) - tunes ensemble weights based on golden set validation results
- `EnsembleOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/Models.cs`) - configurable weights and thresholds for matching tiers
- `MlEmbeddingMatcherAdapter` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/MlEmbeddingMatcherAdapter.cs`) - adapts ML function embeddings for ensemble use
- **Interfaces**: `IEnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/IEnsembleDecisionEngine.cs`)
- **Registration**: `EnsembleServiceCollectionExtensions.AddBinarySimilarityServices()` for full pipeline setup
- **Benchmarks**: `EnsembleAccuracyBenchmarks`, `EnsembleLatencyBenchmarks` (`src/BinaryIndex/__Tests/StellaOps.BinaryIndex.Benchmarks/`)
## E2E Test Plan
- [ ] Submit a binary with known vulnerability and verify ensemble produces correct classification
- [ ] Verify weight tuning: adjust instruction weight to 0.6 and verify it changes classification outcomes
- [ ] Verify multi-tier integration: Build-ID match, fingerprint match, and ML embedding all contribute to score
- [ ] Verify `FunctionAnalysisBuilder` correctly assembles all matching dimensions
- [ ] Verify `WeightTuningService` optimizes weights based on golden set validation accuracy
- [ ] Run accuracy benchmark and verify F1 score meets minimum threshold
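
The effect of weight tuning on classification can be sketched with a weighted mean over the available signals. This is a minimal illustration, assuming a simple normalized weighted average and a fixed threshold; the actual scoring in `EnsembleDecisionEngine` and the fields of `EnsembleOptions` are not shown here, and the signal names are hypothetical.

```python
def ensemble_score(signals, weights):
    """Weighted mean over the signals that are present (None = unavailable)."""
    present = {k: v for k, v in signals.items() if v is not None}
    total = sum(weights[k] for k in present)
    if total == 0:
        return 0.0
    return sum(weights[k] * v for k, v in present.items()) / total


def classify(signals, weights, threshold=0.75):
    """Turn the combined score into a binary decision."""
    return "match" if ensemble_score(signals, weights) >= threshold else "no_match"
```

With scores of 0.9 for the instruction signal and 0.5 for the fingerprint signal, weights of 0.4 each give a combined score of about 0.7 (no match), while raising the instruction weight to 0.6 and lowering the fingerprint weight to 0.2 gives about 0.8 (match).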

@@ -0,0 +1,29 @@
# Function-Range Hashing and Symbol Mapping
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Multi-backend disassembly (Iced, B2R2) with function-range normalization for symbol-level binary proof.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/`
- **Key Classes**:
- `IFunctionFingerprintExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/IFunctionFingerprintExtractor.cs`) - extracts function-range fingerprints from disassembled binaries
- `FunctionDiffer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionDiffer.cs`) - compares function fingerprints with semantic analysis support; computes call-graph edge diffs
- `FunctionRenameDetector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionRenameDetector.cs`) - detects renamed functions by comparing fingerprint similarity
- `PatchDiffEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/PatchDiffEngine.cs`) - builder-level patch diff engine
- `FingerprintClaimModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/FingerprintClaimModels.cs`) - `FingerprintClaim` and `FingerprintClaimEvidence` records
- **Models**: `FingerprintModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/Models/FingerprintModels.cs`) - `FunctionFingerprint` with hash, size, call edges
- **Disassembly Backends**: `B2R2DisassemblyPlugin` (ARM/x86/x64/AArch64), `IcedDisassemblyPlugin` (x86/x64)
## E2E Test Plan
- [ ] Extract function fingerprints from an ELF binary and verify hash consistency for identical functions
- [ ] Verify function-range normalization produces same hash across compiler optimization levels when function logic is identical
- [ ] Verify `FunctionDiffer` correctly identifies added, removed, and modified functions
- [ ] Verify `FunctionRenameDetector` matches renamed functions based on fingerprint similarity threshold
- [ ] Verify `FingerprintClaim` evidence links correctly to Build-ID and function IDs
- [ ] Verify multi-backend consistency: same binary produces matching fingerprints via B2R2 and Iced
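
Rename detection by fingerprint similarity can be sketched as follows. This is an assumption-laden illustration rather than the `FunctionRenameDetector` algorithm: each function is modeled as a set of feature hashes (for example, basic-block hashes), similarity is Jaccard overlap, and the threshold value is hypothetical.

```python
def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def detect_renames(old, new, threshold=0.8):
    """Pair functions that disappeared with new ones that look the same."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    renames = {}
    for old_name in removed:
        best = max(added, key=lambda n: jaccard(old[old_name], new[n]), default=None)
        if best is not None and jaccard(old[old_name], new[best]) >= threshold:
            renames[old_name] = best
            added.remove(best)  # each new function is matched at most once
    return renames
```

Functions left unmatched after this pass remain plain additions or removals, so the added/removed/modified classification is unaffected for them.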

@@ -0,0 +1,29 @@
# Golden Corpus Bundle Export/Import Service
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Import/export services for golden corpus bundles with standalone verification support, enabling offline corpus distribution and validation. The known features list already contains "Offline Corpus Bundle Export/Import"; this feature differs by providing reproducible bundle management with trust-profile-aware verification specific to the golden corpus.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/`
- **Key Classes**:
- `ServiceCollectionExtensions.AddCorpusBundleExport()` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ServiceCollectionExtensions.cs`) - registers export services
- `ServiceCollectionExtensions.AddCorpusBundleImport()` (same file) - registers import services
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - uses imported bundles for validation runs
- `GroundTruthCorpusBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/GroundTruthCorpusBuilder.cs`) - builds training corpus with export support in JsonLines and Json formats
- **Interfaces**: `ICorpusBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/ICorpusBuilder.cs`) - `ExportAsync()` with format selection
- **Export Formats**: `CorpusExportFormat.JsonLines`, `CorpusExportFormat.Json`
- **Source**: SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli.md
## E2E Test Plan
- [ ] Export a golden corpus bundle and verify the output file contains all function fingerprints and metadata
- [ ] Import the exported bundle and verify all entries are restored correctly
- [ ] Verify round-trip: export then import and verify validation results match
- [ ] Verify JsonLines export format produces one record per line
- [ ] Verify Json export format produces a single valid JSON document
- [ ] Verify offline verification works with imported bundles without network access
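
The difference between the two export formats, and the round-trip check, can be sketched as follows. This is a generic illustration of JSON Lines versus a single JSON document, not the `ICorpusBuilder.ExportAsync()` implementation; the function names and record fields are hypothetical.

```python
import json


def export_corpus(records, fmt):
    """Serialize records as JSON Lines (one object per line) or one JSON array."""
    if fmt == "jsonlines":
        return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
    if fmt == "json":
        return json.dumps(records, sort_keys=True, indent=2)
    raise ValueError(f"unsupported format: {fmt}")


def import_corpus(text, fmt):
    """Inverse of export_corpus for the same format."""
    if fmt == "jsonlines":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")
```

JSON Lines suits streaming large corpora because each line parses independently, whereas the single-document form must be read in full before parsing.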

@@ -0,0 +1,27 @@
# Golden Corpus KPI Regression Service
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
KPI regression tracking service for golden corpus validation, including SBOM hash stability validation, regression detection across corpus runs, and automated KPI reporting. The known features list has "Golden Corpus" and "Golden Set" entries but no dedicated KPI regression service for tracking validation quality over time.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/Services/`
- **Key Classes**:
- `KpiRegressionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/Services/KpiRegressionService.cs`) - detects accuracy regressions across validation runs by comparing KPI metrics over time; uses `TimeProvider` for testable timestamps
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - produces validation run results consumed by KPI regression tracking
- **Interfaces**: `IKpiRegressionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/Services/IKpiRegressionService.cs`)
- **Validation Metrics**: precision, recall, F1 score, false positive rate tracked per validation run
- **Source**: SPRINT_20260121_034_BinaryIndex_golden_corpus_foundation.md
## E2E Test Plan
- [ ] Run two validation passes with different accuracy and verify `KpiRegressionService` detects the regression
- [ ] Verify KPI metrics (precision, recall, F1) are computed correctly from validation run results
- [ ] Verify no regression is reported when accuracy improves between runs
- [ ] Verify SBOM hash stability check flags unstable hash generation
- [ ] Verify regression alerts include the specific metrics that degraded
- [ ] Verify `TimeProvider` injection allows deterministic testing of time-based regression windows
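
The tracked metrics and the regression check can be sketched from confusion-matrix counts. The formulas for precision, recall, F1, and false positive rate are standard; the comparison logic and the tolerance value are assumptions for illustration and do not describe `KpiRegressionService` itself.

```python
def kpis(tp, fp, fn, tn):
    """Compute validation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "false_positive_rate": fpr}


def regressions(baseline, current, tolerance=0.01):
    """Return the metrics that degraded by more than `tolerance`."""
    degraded = {}
    for metric in ("precision", "recall", "f1"):  # higher is better
        if baseline[metric] - current[metric] > tolerance:
            degraded[metric] = (baseline[metric], current[metric])
    metric = "false_positive_rate"  # lower is better
    if current[metric] - baseline[metric] > tolerance:
        degraded[metric] = (baseline[metric], current[metric])
    return degraded
```

Returning the degraded metrics with their before and after values matches the test-plan requirement that alerts name the specific metrics that regressed; an improving run yields an empty result.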

@@ -0,0 +1,29 @@
# Golden Corpus Validation Harness
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Validation harness infrastructure for running golden corpus tests against binary index results, comparing expected vs actual outcomes. While "Validation Harness and Reproducibility Verification" is in the known list, this is a distinct BinaryIndex-specific validation harness with its own abstraction layer.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation.Abstractions/`
- **Key Classes**:
- `ValidationHarness` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/ValidationHarness.cs`) - main harness with `IMatcherAdapterFactory` integration for pluggable matching strategies
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - orchestrates reproducible-build validation runs with `ValidationRunContext`
- `CallGraphMatcherAdapter` and other `MatcherAdapters` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/Matchers/MatcherAdapters.cs`) - adapters for different matching strategies
- **Interfaces**: `IValidationHarness` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation.Abstractions/IValidationHarness.cs`)
- **Models**: `ValidationRun` with `CorpusSnapshotId` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation.Abstractions/ValidationRun.cs`)
- **Registration**: `ValidationServiceCollectionExtensions.AddValidationHarness()` and `AddCorpusBundleExport/Import`
- **Source**: SPRINT_20260121_034_BinaryIndex_golden_corpus_foundation.md
## E2E Test Plan
- [ ] Run validation harness against a golden set and verify expected vs actual outcomes are compared
- [ ] Verify pluggable matcher adapters: run with call-graph matcher and verify correct results
- [ ] Verify validation run produces a `ValidationRun` with correct `CorpusSnapshotId`
- [ ] Verify validation attestor generates valid attestation predicates from validation run results
- [ ] Verify report generator produces deterministic reports from validation runs
- [ ] Verify validation results feed into KPI regression service for tracking

@@ -0,0 +1,31 @@
# Golden Set for Patch Validation (in BinaryIndex)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Golden set analysis pipeline and API controller for curated binary patch validation test cases.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`, `src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/`
- **Key Classes**:
- `GoldenSetAnalysisPipeline` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/GoldenSetAnalysisPipeline.cs`) - runs validation analysis against golden set definitions
- `GoldenSetController` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/GoldenSetController.cs`) - REST API for golden set CRUD operations with filtering, pagination, and ordering
- `POST /api/v1/golden-sets` - create golden set definitions
- `GET /api/v1/golden-sets` - list with status/component/tag filters
- `GET /api/v1/golden-sets/{id}` - get by ID
- `GoldenSetValidator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Validation/GoldenSetValidator.cs`) - validates golden set definitions
- **Interfaces**: `IGoldenSetStore`, `IGoldenSetValidator`
- **Models**: `GoldenSetDefinition`, `GoldenSetListQuery`, `GoldenSetListResponse`, `GoldenSetCreateRequest/Response`
- **Enums**: `GoldenSetStatus` (Draft, Active, etc.), `GoldenSetOrderBy`
## E2E Test Plan
- [ ] Create a golden set via `POST /api/v1/golden-sets` and verify it is stored with Draft status
- [ ] List golden sets with component filter and verify only matching sets are returned
- [ ] Get golden set by ID and verify all fields including metadata are returned
- [ ] Run golden set analysis pipeline against a known binary pair and verify patch validation result
- [ ] Verify golden set validation rejects definitions with invalid CVE references
- [ ] Verify pagination and ordering work correctly with multiple golden sets

@@ -0,0 +1,33 @@
# Golden Set Schema and Management
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Full golden set management library with authoring, configuration, serialization, storage, validation, and migration support.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/`
- **Key Classes**:
- **Authoring**: `GoldenSetExtractor`, `GoldenSetEnrichmentService`, `GoldenSetReviewService`, `UpstreamCommitAnalyzer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Authoring/`)
- **Source Extractors**: `NvdGoldenSetExtractor`, `FunctionHintExtractor`, `CweToSinkMapper` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Authoring/Extractors/`)
- **Configuration**: `GoldenSetOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Configuration/`)
- **Models**: `GoldenSetDefinition` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Models/`)
- **Serialization**: `GoldenSetYamlSerializer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Serialization/`)
- **Storage**: `PostgresGoldenSetStore` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Storage/`), `IGoldenSetStore`
- **Validation**: `GoldenSetValidator`, `ICveValidator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Validation/`)
- **Services**: `SinkRegistry`, `ISinkRegistry` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Services/`)
- **Registration**: `GoldenSetServiceCollectionExtensions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GoldenSet/Extensions/`)
## E2E Test Plan
- [ ] Author a golden set from NVD data via `NvdGoldenSetExtractor` and verify extracted CVE entries
- [ ] Enrich golden set with function hints via `FunctionHintExtractor` and verify hint annotations
- [ ] Map CWEs to sink functions via `CweToSinkMapper` and verify correct mappings
- [ ] Serialize golden set to YAML via `GoldenSetYamlSerializer` and verify round-trip fidelity
- [ ] Store golden set in PostgreSQL via `PostgresGoldenSetStore` and verify retrieval
- [ ] Validate golden set definition via `GoldenSetValidator` and verify errors for invalid entries
- [ ] Verify `SinkRegistry` maintains the sink function catalog
- [ ] Verify review workflow via `GoldenSetReviewService` transitions (Draft -> Review -> Approved)

@@ -0,0 +1,30 @@
# Ground-Truth Corpus Infrastructure (Symbol Source Abstractions)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Abstraction layer for symbol source connectors, validation harness, KPI computation, and security pair tracking for the ground-truth corpus infrastructure.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/`
- **Key Classes**:
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - orchestrates ground truth validation with `ValidationRunContext`
- `KpiRegressionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/Services/KpiRegressionService.cs`) - KPI computation and regression tracking
- `GroundTruthProvenanceResolver` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Provenance/GroundTruthProvenanceResolver.cs`) - resolves symbol provenance from ground truth data
- `GroundTruthCorpusBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/GroundTruthCorpusBuilder.cs`) - builds training corpus from ground truth pairs
- Corpus connectors: `AlpineCorpusConnector`, `DebianCorpusConnector`, `RpmCorpusConnector` - distro-specific symbol sources
- Library connectors: `CurlCorpusConnector`, `GlibcCorpusConnector`, `OpenSslCorpusConnector`, `ZlibCorpusConnector`
- **Interfaces**: `IBinaryCorpusConnector`, `ILibraryCorpusConnector`, `ICorpusSnapshotRepository`, `ISymbolProvenanceResolver`, `IKpiRegressionService`
- **Registration**: `ServiceCollectionExtensions` with `AddCorpusBundleExport/Import` methods
## E2E Test Plan
- [ ] Connect to a corpus source via library connector and verify binary extraction works
- [ ] Resolve symbol provenance for a known function via `GroundTruthProvenanceResolver`
- [ ] Build a ground truth corpus for ML training via `GroundTruthCorpusBuilder`
- [ ] Track KPI metrics across multiple validation runs and verify regression detection
- [ ] Verify corpus snapshot repository persists and retrieves snapshots with correct IDs
- [ ] Verify security pair tracking (vulnerable/fixed binary pairs) across corpus connectors

@@ -0,0 +1,28 @@
# Known-build binary catalog (Build-ID + hash-based binary identity)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
BinaryIdentity model and vulnerability assertion repository implement the binary-key-based catalog using Build-ID and file SHA256 as primary keys.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/`
- **Key Classes**:
- `BinaryIdentity` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Models/BinaryIdentity.cs`) - core model with Build-ID, file SHA256, symbol tables as primary keys
- `BinaryIdentityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/BinaryIdentityService.cs`) - manages binary identity lifecycle
- `BinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs`) - vulnerability assertion repository with Build-ID catalog lookups and match method mapping (buildid_catalog, delta_signature, etc.)
- `CachedBinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/CachedBinaryVulnerabilityService.cs`) - cached decorator with `LookupByDeltaSignatureAsync`
- **Interfaces**: `IBinaryVulnerabilityService`, `IBinaryVulnAssertionRepository` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/`)
- **Models**: `FixModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Models/`) - `FixState`, `FixStatusResult`, `MatchMethod`, `MatchEvidence`
## E2E Test Plan
- [ ] Register a binary identity with known Build-ID and verify it is stored in the catalog
- [ ] Query the catalog by Build-ID and verify the correct binary identity is returned
- [ ] Query by file SHA256 hash and verify the correct binary identity is returned
- [ ] Assert a vulnerability against a binary identity and verify the assertion is persisted
- [ ] Verify `CachedBinaryVulnerabilityService` caches lookups and returns cached results on repeat queries
- [ ] Verify match method mapping: `buildid_catalog` maps to `MatchMethod.BuildIdCatalog`
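
Catalog lookup by two keys and the string-to-enum match method mapping can be sketched as follows. This is a simplified in-memory illustration, not the `BinaryIdentityService` or `BinaryVulnerabilityService` code; the class, the field names, and the preference for Build-ID over file hash are assumptions, and only the two match method strings named above are modeled.

```python
from enum import Enum


class MatchMethod(Enum):
    BUILD_ID_CATALOG = "buildid_catalog"
    DELTA_SIGNATURE = "delta_signature"


class BinaryCatalog:
    """Hypothetical in-memory catalog indexed by Build-ID and file SHA-256."""

    def __init__(self):
        self._by_build_id = {}
        self._by_sha256 = {}

    def register(self, identity):
        # Build-ID is optional (stripped binaries may lack one); the hash is not.
        if identity.get("build_id"):
            self._by_build_id[identity["build_id"]] = identity
        self._by_sha256[identity["file_sha256"]] = identity

    def lookup(self, build_id=None, file_sha256=None):
        # Prefer Build-ID, then fall back to the file hash.
        if build_id and build_id in self._by_build_id:
            return self._by_build_id[build_id]
        return self._by_sha256.get(file_sha256)
```

Parsing a stored method string with `MatchMethod("buildid_catalog")` yields `MatchMethod.BUILD_ID_CATALOG`, and an unrecognized string raises `ValueError` instead of silently mapping to a default.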

@@ -0,0 +1,28 @@
# Local Mirror Layer for Corpus Sources
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Local mirror service for caching and serving corpus data from remote sources, supporting offline operation.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Debian/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Alpine/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Rpm/`
- **Key Classes**:
- `DebianMirrorPackageSource` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Debian/DebianMirrorPackageSource.cs`) - mirrors Debian package repositories for offline access
- `DebianCorpusConnector` with `ICorpusSnapshotRepository` - creates snapshots of remote corpus state for local use
- `AlpineCorpusConnector` with snapshot support - caches Alpine APK package data locally
- `RpmCorpusConnector` - caches RPM package data for offline operation
- `ICorpusSnapshotRepository` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/ICorpusSnapshotRepository.cs`) - persists corpus snapshots for offline retrieval
- **Interfaces**: `IDebianPackageSource`, `IAlpinePackageSource`, `IRpmPackageSource` - distro-specific package source abstractions
## E2E Test Plan
- [ ] Fetch packages from Debian mirror source and verify local cache is populated
- [ ] Disconnect network and verify cached corpus data is still accessible
- [ ] Create a corpus snapshot and verify it captures the complete state of remote data
- [ ] Verify Alpine APK packages are cached locally via `AlpineCorpusConnector`
- [ ] Verify RPM packages are cached locally via `RpmCorpusConnector`
- [ ] Verify snapshot-based queries return consistent results when the remote source changes

@@ -0,0 +1,30 @@
# ML Function Embedding Service (CodeBERT/ONNX Inference)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
ONNX-based function embedding inference service for binary function matching using CodeBERT-derived models. Includes training corpus schema, embedding generation pipeline, and ensemble integration with existing matchers. It has no direct match in the known features list.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/`
- **Key Classes**:
- `IEmbeddingService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/IEmbeddingService.cs`) - generates `FunctionEmbedding` from binary functions; supports batch generation, similarity computation, and nearest-neighbor search
- `InMemoryEmbeddingIndex` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/InMemoryEmbeddingIndex.cs`) - in-memory vector index for fast embedding similarity search with cosine similarity
- `MlEmbeddingMatcherAdapter` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/MlEmbeddingMatcherAdapter.cs`) - adapts ML embeddings for ensemble decision engine
- `GroundTruthCorpusBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/GroundTruthCorpusBuilder.cs`) - builds training corpus from ground truth data with JsonLines/Json export
- `ICorpusBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/ICorpusBuilder.cs`) - training corpus building interface with `CorpusExportFormat` enum
- `FunctionEmbedding` - vector embedding record for binary functions
- **Integration**: `FunctionAnalysisBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs`) passes ML embeddings into ensemble scoring
- **Registration**: `TrainingServiceCollectionExtensions` for DI setup
## E2E Test Plan
- [ ] Generate a function embedding from a known binary function and verify vector dimensions are correct
- [ ] Compute similarity between embeddings of the same function compiled with different flags and verify high similarity
- [ ] Add embeddings to `InMemoryEmbeddingIndex` and verify nearest-neighbor search returns correct matches
- [ ] Build a training corpus from ground truth pairs via `GroundTruthCorpusBuilder`
- [ ] Verify `MlEmbeddingMatcherAdapter` integrates with ensemble decision engine
- [ ] Verify batch embedding generation processes multiple functions efficiently
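
Nearest-neighbor search with cosine similarity can be sketched with a brute-force index. This is an illustration of the technique, not the `InMemoryEmbeddingIndex` implementation; the class and method names are hypothetical, and a linear scan is assumed for simplicity.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EmbeddingIndex:
    """Hypothetical brute-force index: scores every stored vector per query."""

    def __init__(self):
        self._items = {}

    def add(self, function_id, vector):
        self._items[function_id] = list(vector)

    def nearest(self, query, k=1):
        scored = [(cosine(query, v), fid) for fid, v in self._items.items()]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(fid, score) for score, fid in scored[:k]]
```

A linear scan costs one similarity computation per stored embedding, which is acceptable for small in-memory indexes but would need an approximate index for large corpora.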

@@ -0,0 +1,25 @@
# Patch Coverage Tracking
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Dedicated patch coverage API endpoint for tracking which CVE patches are covered in binary analysis.
## Implementation Details
- **Modules**: `src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/`
- **Key Classes**:
- `PatchCoverageController` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/PatchCoverageController.cs`) - REST API controller for patch coverage queries using `IDeltaSignatureRepository`
- `DeltaSignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureMatcher.cs`) - matches delta signatures to assess patch coverage
- `DeltaSigService` / `DeltaSigServiceV2` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`) - service layer for delta-sig operations
- **Interfaces**: `IDeltaSignatureRepository` - repository for persisted delta signatures used by patch coverage queries
## E2E Test Plan
- [ ] Query patch coverage API for a known CVE and verify coverage status (covered/not covered)
- [ ] Verify patch coverage percentage calculation: submit binaries with partial patch coverage
- [ ] Verify that delta signatures for the CVE fix are used to determine coverage
- [ ] Verify API returns correct coverage for batch queries across multiple CVEs
- [ ] Verify coverage tracking updates when new delta signatures are added
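
For the partial-coverage case, the expected percentage can be computed independently of the API. This is a minimal sketch, assuming coverage is the share of a CVE's expected delta signatures that were matched; the function name and the zero-signature convention are assumptions, not the controller's documented behavior.

```python
def coverage_percent(expected_signatures, matched_signatures):
    """Percentage of expected delta signatures that were matched.

    Assumption: a CVE with no expected signatures reports 0.0 rather than
    raising, so callers can treat it as "no coverage data".
    """
    expected = set(expected_signatures)
    if not expected:
        return 0.0
    covered = expected & set(matched_signatures)
    return 100.0 * len(covered) / len(expected)


# Four patched functions are expected, two were found in the binary.
partial = coverage_percent(["f1", "f2", "f3", "f4"], ["f1", "f3", "unrelated"])
```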

@@ -0,0 +1,31 @@
# PatchDiffEngine (Binary Pre/Post Patch Comparison for Fix Verification)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Compares pre-patch and post-patch binaries at multiple levels (BasicBlock, CFG, StringRefs, Semantic/KSG fingerprints) to determine if a vulnerability has been remediated. Produces structured verification results with confidence scores based on match depth. Core verification logic for the Golden Set Diff Layer.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/`
- **Key Classes**:
- `PatchDiffEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/PatchDiffEngine.cs`) - core engine comparing pre/post binaries using `ISignatureMatcher`, `IFunctionFingerprintExtractor`, and `IFunctionDiffer`; produces `PatchDiffResult` with confidence scores
- `PatchDiffEngine` (builders) (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/PatchDiffEngine.cs`) - builder-level diff engine
- `FunctionDiffer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionDiffer.cs`) - function-level comparison with semantic analysis, call-graph edge diffing, and string reference comparison
- `FunctionRenameDetector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionRenameDetector.cs`) - detects renamed functions between versions
- `VerdictCalculator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/VerdictCalculator.cs`) - computes fix verification verdict from diff results
- **Models**: `PatchDiffResult`, `PatchDiffModels`, `DiffEvidenceModels`, `DiffOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/Models/`)
- **Storage**: `IDiffResultStore`, `InMemoryDiffResultStore` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/Storage/`)
- **Source**: SPRINT_20260110_012_004_BINDEX_golden_set_diff_verify.md
## E2E Test Plan
- [ ] Submit pre-patch and post-patch binaries for a known CVE fix and verify the diff result shows patch applied
- [ ] Verify multi-level comparison: BasicBlock, CFG, StringRefs, and semantic fingerprints all contribute to confidence
- [ ] Verify `FunctionDiffer` with `IncludeSemanticAnalysis=true` computes semantic similarity
- [ ] Verify `FunctionRenameDetector` handles renamed functions between versions
- [ ] Verify `VerdictCalculator` produces correct verdict (Fixed, Vulnerable, Unknown) based on diff evidence
- [ ] Verify `NoPatchDetected` result is returned when binaries are identical
- [ ] Verify diff results are persistable via `IDiffResultStore` with content-addressed IDs
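
Content-addressed IDs are usually derived by hashing a canonical serialization of the stored object. The sketch below assumes SHA-256 over sorted-key JSON and a `sha256:` prefix; the actual ID scheme used by `IDiffResultStore` is not specified here, so treat both choices as illustrative.

```python
import hashlib
import json


def content_address(result: dict) -> str:
    """Derive an ID from the content alone (assumed scheme: SHA-256 of canonical JSON)."""
    # Sorted keys and fixed separators make the byte stream independent of
    # dictionary insertion order and whitespace.
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = content_address({"verdict": "Fixed", "confidence": 0.92})
second = content_address({"confidence": 0.92, "verdict": "Fixed"})
other = content_address({"verdict": "Vulnerable", "confidence": 0.92})
```

Under this scheme, storing the same result twice yields the same key, which is the property the persistence test would check.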

@@ -0,0 +1,28 @@
# Reproducible Build Verification
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Reproducible build backend that performs local rebuilds, including from air-gap bundles, to verify binary provenance.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/`, `src/BinaryIndex/StellaOps.BinaryIndex.Worker/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/`
- **Key Classes**:
- `ReproducibleBuildJob` (`src/BinaryIndex/StellaOps.BinaryIndex.Worker/Jobs/ReproducibleBuildJob.cs`) - worker job that executes reproducible builds using `IFunctionFingerprintExtractor`, `IPatchDiffEngine`, and `IFingerprintClaimRepository`
- `ReproducibleBuildJob` (builders) (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/ReproducibleBuildJobTypes.cs`) - builder-level reproducible build job with options
- `ReproducibleBuildOptions` - configuration for build verification parameters
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - validates reproducible build outputs
- `FingerprintClaim` / `FingerprintClaimEvidence` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/FingerprintClaimModels.cs`) - claims produced from build verification
- **Interfaces**: `IReproducibleBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/IReproducibleBuilder.cs`), `IReproducibleBuildJob`
## E2E Test Plan
- [ ] Submit a source package and verify reproducible build produces matching binary fingerprints
- [ ] Verify `FingerprintClaim` is generated with correct `FingerprintClaimEvidence` linking to Build-ID
- [ ] Verify build verification with non-matching binaries produces a failed verification result
- [ ] Verify air-gap bundle support: import build inputs from bundle and verify build completes offline
- [ ] Verify `ReproducibleBuildOptions` configuration controls build behavior
- [ ] Verify build job integrates with `IPatchDiffEngine` for post-build comparison

@@ -0,0 +1,29 @@
# Reproducible Distro Build Pipeline (Container-Based Builders)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Container-based reproducible build pipeline for Alpine, Debian, and RHEL packages. Rebuilds upstream source packages in isolated containers to produce reference binaries for function-level fingerprint comparison, enabling backport detection by comparing distro-patched binaries against unpatched originals.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/`, `src/BinaryIndex/StellaOps.BinaryIndex.Worker/`
- **Key Classes**:
- `ReproducibleBuildJob` (`src/BinaryIndex/StellaOps.BinaryIndex.Worker/Jobs/ReproducibleBuildJob.cs`) - background worker job using `IFunctionFingerprintExtractor` and `IPatchDiffEngine` to rebuild packages and compare fingerprints
- `ReproducibleBuildOptions` - build configuration (timeout, container images, source package locations)
- `IReproducibleBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/IReproducibleBuilder.cs`) - abstraction for container-based builds
- `BuilderOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/BuilderOptions.cs`) - builder configuration
- `GuidProvider` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/GuidProvider.cs`) - deterministic GUID generation for reproducibility
- **Integration**: Uses `IFingerprintClaimRepository` to store build verification claims; integrates with `IPatchDiffEngine` for post-build binary comparison
- **Source**: SPRINT_1227_0002_0001_LB_reproducible_builders.md
## E2E Test Plan
- [ ] Trigger a reproducible build for a Debian package and verify reference binaries are produced
- [ ] Compare distro-patched binary against unpatched original and verify fingerprint differences
- [ ] Verify container isolation: build runs in isolated container with controlled environment
- [ ] Verify `FingerprintClaim` records are generated with build provenance evidence
- [ ] Verify `GuidProvider` produces deterministic GUIDs for identical build inputs
- [ ] Verify backport detection: distro-patched binary with backported fix is correctly identified
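
Deterministic GUIDs are commonly produced with name-based UUIDs, where the same namespace and name always hash to the same value. The sketch below uses version 5 UUIDs from the Python standard library; the namespace URL and the way build inputs are joined are assumptions and do not describe how `GuidProvider` works internally.

```python
import uuid

# Hypothetical namespace for build identifiers; any fixed UUID would do.
BUILD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/reproducible-builds")


def deterministic_guid(*build_inputs: str) -> uuid.UUID:
    """Map an ordered tuple of build inputs to a stable version 5 UUID."""
    # Assumption: inputs contain no newlines, so joining on "\n" is unambiguous.
    name = "\n".join(build_inputs)
    return uuid.uuid5(BUILD_NAMESPACE, name)


a = deterministic_guid("debian", "openssl", "3.0.11-1")
b = deterministic_guid("debian", "openssl", "3.0.11-1")
c = deterministic_guid("debian", "openssl", "3.0.11-2")
```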

@@ -0,0 +1,27 @@
# SBOM Bom-Ref Linkage in Binary Function Identity
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Extended function identity model (`SymbolSignatureV2`) with SBOM bom-ref linkage following the format `module:bom-ref:offset:canonical-IR-hash`. Includes the `IBomRefResolver` interface for resolving binary artifacts to SBOM component references, with graceful fallback when no reference can be resolved.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`
- **Key Classes**:
- `DeltaSigPredicateV2` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Attestation/DeltaSigPredicateV2.cs`) - V2 predicate including SBOM bom-ref linkage in function identity records
- `DeltaSigVexBridge` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/VexIntegration/DeltaSigVexBridge.cs`) - VEX bridge uses symbol provenance (which includes SBOM refs) to enrich VEX observations
- `GroundTruthProvenanceResolver` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Provenance/GroundTruthProvenanceResolver.cs`) - enriches function matches with `SymbolProvenance` including source references
- `Models.cs` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Models.cs`) - `SymbolMatchResult` with `SymbolProvenance` property for bom-ref linkage
- **Interfaces**: `ISymbolProvenanceResolver` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Provenance/ISymbolProvenanceResolver.cs`) - resolves `SymbolProvenanceV2` with batch lookup support
- **Source**: SPRINT_20260118_026_BinaryIndex_deltasig_enhancements.md
## E2E Test Plan
- [ ] Resolve a binary function to its SBOM bom-ref via `ISymbolProvenanceResolver` and verify the linkage format
- [ ] Verify `DeltaSigPredicateV2` includes bom-ref linkage in function identity records
- [ ] Verify `DeltaSigVexBridge` includes provenance source from SBOM in VEX observations
- [ ] Verify batch lookup via `BatchLookupAsync` resolves multiple symbols efficiently
- [ ] Verify graceful fallback when SBOM bom-ref is not available (function identity still works without it)
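
The linkage format and the fallback case can be illustrated with a small formatter. This is a sketch under stated assumptions: the hexadecimal offset rendering and the `-` placeholder for an unresolved bom-ref are hypothetical choices, and `resolve_bom_ref` is a toy lookup rather than the `IBomRefResolver` contract.

```python
from typing import Dict, Optional


def resolve_bom_ref(artifact_digest: str, sbom_components: Dict[str, str]) -> Optional[str]:
    """Toy resolver: returns None instead of raising when the artifact is unknown."""
    return sbom_components.get(artifact_digest)


def format_function_identity(module: str, bom_ref: Optional[str], offset: int, ir_hash: str) -> str:
    """Render `module:bom-ref:offset:canonical-IR-hash`.

    Assumptions: the offset is printed as lowercase hex, and a missing
    bom-ref is written as "-" so the identity keeps four fields.
    """
    ref = bom_ref if bom_ref else "-"
    return f"{module}:{ref}:0x{offset:x}:{ir_hash}"


components = {"digest-aaa": "pkg-openssl-3.0.11"}
linked = format_function_identity(
    "libssl.so.3", resolve_bom_ref("digest-aaa", components), 0x1A2B, "abc123"
)
unlinked = format_function_identity(
    "libssl.so.3", resolve_bom_ref("digest-zzz", components), 0x1A2B, "abc123"
)
```

Note that real bom-ref values may themselves contain `:` (for example when a package URL is used as the reference), so a parser for this format would need an escaping or field-counting rule that this sketch does not attempt.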

@@ -0,0 +1,28 @@
# Scanner Integration for Binary Analysis
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Binary vulnerability analysis integrated into the scanner worker pipeline, with patch verification and reproducible-build provenance verification.
## Implementation Details
- **Modules**: `src/BinaryIndex/`, `src/Scanner/`
- **Key Classes**:
- `BinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs`) - core binary vulnerability detection service used by scanner pipeline; queries `ICorpusQueryService` for function matches
- `CachedBinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/CachedBinaryVulnerabilityService.cs`) - cached decorator with `LookupByDeltaSignatureAsync` for scanner integration
- `ResolutionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Resolution/ResolutionService.cs`) - resolves whether a CVE is fixed based on binary-level evidence
- `ReproducibleBuildJob` (`src/BinaryIndex/StellaOps.BinaryIndex.Worker/Jobs/ReproducibleBuildJob.cs`) - worker job for build provenance verification
- `EnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs`) - multi-tier matching for scanner-detected vulnerabilities
- **Integration Points**: Scanner pipeline calls `IBinaryVulnerabilityService` to enrich findings with binary-level patch verification
## E2E Test Plan
- [ ] Trigger a scanner scan on a container with known binaries and verify binary analysis runs automatically
- [ ] Verify scanner findings are enriched with binary-level patch status (Fixed, Vulnerable, Unknown)
- [ ] Verify `CachedBinaryVulnerabilityService` caches scanner lookups for performance
- [ ] Verify build provenance verification runs as a background worker job
- [ ] Verify ensemble decision engine produces consistent results when called from scanner pipeline
- [ ] Verify binary analysis results are included in scanner output findings

@@ -0,0 +1,31 @@
# Semantic Analysis Library (IR Lifting and Function Fingerprinting)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Semantic binary analysis with IR lifting, function fingerprint generation, semantic matching, graph extraction, and call n-gram generation for function-level binary comparison.
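
As an illustration of call n-gram generation, the sketch below slides a window over a function's ordered call targets and compares two functions by the overlap of their n-gram sets. Jaccard similarity is used here as a simple stand-in metric; the window size, the comparison metric, and the function names are assumptions, not the `CallNgramGenerator` behavior.

```python
def call_ngrams(call_sequence, n=3):
    """Set of length-n windows over an ordered sequence of call targets."""
    calls = list(call_sequence)
    # Sequences shorter than n produce an empty set.
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def jaccard(a, b):
    """Overlap of two sets; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


old_calls = ["malloc", "memcpy", "strlen", "free"]
new_calls = ["malloc", "memcpy", "check_bounds", "strlen", "free"]
similarity = jaccard(call_ngrams(old_calls, 2), call_ngrams(new_calls, 2))
```

In this example the inserted `check_bounds` call breaks one bigram and adds two, so the score drops below 1.0 while still reflecting the shared call order.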
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`
- **Key Classes**:
- `IrLiftingService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/IrLiftingService.cs`) - lifts machine code to intermediate representation using B2R2
- `SemanticFingerprintGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs`) - generates `SemanticFingerprint` using Weisfeiler-Lehman graph hashing (KsgWeisfeilerLehmanV1 algorithm)
- `SemanticGraphExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticGraphExtractor.cs`) - extracts key-semantics graphs (KSG) from lifted IR
- `SemanticMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticMatcher.cs`) - matches semantic fingerprints for similarity scoring
- `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - call-sequence n-gram fingerprinting
- `WeisfeilerLehmanHasher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/WeisfeilerLehmanHasher.cs`) - WL graph hash implementation
- `GraphCanonicalizer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/GraphCanonicalizer.cs`) - graph canonicalization for deterministic hashing
- **Models**: `FingerprintModels` (SemanticFingerprint, SemanticFingerprintOptions, SemanticFingerprintAlgorithm), `GraphModels` (KeySemanticsGraph), `IrModels` (LiftedFunction, IrStatement)
- **Interfaces**: `IIrLiftingService`, `ISemanticFingerprintGenerator`, `ISemanticGraphExtractor`, `ISemanticMatcher`
## E2E Test Plan
- [ ] Lift a binary function to IR via `IrLiftingService` and verify IR structure contains valid statements
- [ ] Generate a semantic fingerprint via `SemanticFingerprintGenerator` and verify hash is deterministic
- [ ] Extract a key-semantics graph via `SemanticGraphExtractor` and verify node/edge structure
- [ ] Match two fingerprints of the same function (different compilers) via `SemanticMatcher` and verify high similarity
- [ ] Verify Weisfeiler-Lehman graph hash produces different hashes for structurally different functions
- [ ] Verify `GraphCanonicalizer` produces consistent canonical forms for isomorphic graphs
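
The two graph-hash properties in this plan (isomorphic graphs agree, structurally different graphs differ) can be demonstrated with a textbook Weisfeiler-Lehman refinement. This is a minimal sketch, assuming directed edges, successor-only neighborhoods, and truncated SHA-256 labels; it is not the `KsgWeisfeilerLehmanV1` algorithm, whose details are not given here.

```python
import hashlib
from collections import Counter


def _digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def wl_graph_hash(node_labels, edges, iterations=3):
    """Weisfeiler-Lehman style hash of a labeled directed graph.

    node_labels: {node: label}; edges: iterable of (src, dst).
    Node names never enter the hash, only labels and structure.
    """
    successors = {node: [] for node in node_labels}
    for src, dst in edges:
        successors[src].append(dst)

    labels = {node: _digest(str(label)) for node, label in node_labels.items()}
    histogram = Counter(labels.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of successor labels.
        labels = {
            node: _digest(labels[node] + "|" + ",".join(sorted(labels[m] for m in successors[node])))
            for node in node_labels
        }
        histogram.update(labels.values())

    # Hash the sorted label histogram collected across all rounds.
    return _digest(";".join(f"{label}:{count}" for label, count in sorted(histogram.items())))


chain = wl_graph_hash({1: "op", 2: "op", 3: "op"}, [(1, 2), (2, 3)])
renamed_chain = wl_graph_hash({"a": "op", "b": "op", "c": "op"}, [("b", "c"), ("a", "b")])
fork = wl_graph_hash({1: "op", 2: "op", 3: "op"}, [(1, 2), (1, 3)])
```

Weisfeiler-Lehman refinement is a heuristic: equal hashes do not prove isomorphism, so a test suite should treat hash equality as necessary evidence rather than proof.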

@@ -0,0 +1,29 @@
# Static-to-Binary Braid (Build-Time Function Proof)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Full binary analysis pipeline with function fingerprinting, delta signatures, multi-backend disassembly (Iced, B2R2), normalization, and semantic analysis for build-time function proof.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly/`
- **Key Classes**:
- `PatchDiffEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/PatchDiffEngine.cs`) - orchestrates build-time function proof by comparing pre/post binaries
- `DeltaSigServiceV2` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSigServiceV2.cs`) - V2 delta-sig with IR diff support
- `SemanticFingerprintGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs`) - semantic function fingerprinting
- `HybridDisassemblyService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly/HybridDisassemblyService.cs`) - multi-backend disassembly
- `CodeNormalizer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Decompiler/CodeNormalizer.cs`) - normalizes decompiled code for comparison
- `SemanticEquivalence` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Decompiler/SemanticEquivalence.cs`) - semantic equivalence checking between code versions
- `EnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs`) - combines all matching tiers for final proof verdict
## E2E Test Plan
- [ ] Submit source-to-binary pair and verify function-level proof linking source functions to binary symbols
- [ ] Verify multi-backend disassembly: same binary analyzed by Iced and B2R2 produces compatible fingerprints
- [ ] Verify delta-sig generation creates build-time proof of which functions changed
- [ ] Verify semantic analysis identifies equivalent functions across different compiler outputs
- [ ] Verify code normalization strips compiler-specific artifacts for fair comparison
- [ ] Verify ensemble decision produces final proof verdict combining all evidence tiers

@@ -0,0 +1,30 @@
# Symbol Change Tracking in Binary Diffs (SymbolChangeTracer)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Extends the BinaryIndex DeltaSignature module to track which specific symbols changed between binary versions (not just whether they match). Adds change metadata to `SymbolMatchResult` and provides detailed CFG hash and instruction hash comparison for symbol-level binary change forensics.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`
- **Key Classes**:
- `SymbolChangeTracer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/SymbolChangeTracer.cs`) - traces symbol-level changes between binary versions with detailed CFG hash and instruction hash comparison
- `DeltaSignatureGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureGenerator.cs`) - generates delta signatures capturing symbol change metadata
- `DeltaSignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureMatcher.cs`) - matches signatures with change tracking awareness
- `CfgExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/CfgExtractor.cs`) - extracts CFG for hash comparison
- `IrDiffGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/IrDiff/IrDiffGenerator.cs`) - generates IR-level diffs for detailed change analysis
- **Interfaces**: `ISymbolChangeTracer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/ISymbolChangeTracer.cs`)
- **Models**: `SymbolMatchResult` with change metadata in `Models.cs`
- **Source**: SPRINT_20260112_200_003_BINDEX_symbol_tracking.md
## E2E Test Plan
- [ ] Compare two binary versions with known symbol changes and verify `SymbolChangeTracer` identifies which symbols changed
- [ ] Verify CFG hash comparison detects control flow changes in modified functions
- [ ] Verify instruction hash comparison detects instruction-level changes
- [ ] Verify `SymbolMatchResult` includes change metadata (added, removed, modified symbols)
- [ ] Verify IR-level diff captures semantic changes beyond byte-level differences
- [ ] Verify unchanged symbols are correctly identified as stable between versions
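
The classification these checks exercise can be sketched as a comparison of two symbol tables keyed by name. The example below assumes each symbol carries a CFG hash and an instruction hash; the change labels and the rule that a CFG difference takes precedence are illustrative assumptions, not the `SymbolChangeTracer` output schema.

```python
def classify_symbol_changes(old, new):
    """Compare {symbol: (cfg_hash, instruction_hash)} tables from two versions."""
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            changes[name] = "added"
        elif name not in new:
            changes[name] = "removed"
        else:
            old_cfg, old_insn = old[name]
            new_cfg, new_insn = new[name]
            if old_cfg != new_cfg:
                # Control flow changed; instruction bytes necessarily differ too.
                changes[name] = "modified-cfg"
            elif old_insn != new_insn:
                # Same control flow shape, different instructions.
                changes[name] = "modified-instructions"
            else:
                changes[name] = "unchanged"
    return changes


before = {"parse": ("c1", "i1"), "copy": ("c2", "i2"), "log": ("c3", "i3"), "old_api": ("c4", "i4")}
after = {"parse": ("c1", "i1"), "copy": ("c9", "i9"), "log": ("c3", "i7"), "new_api": ("c5", "i5")}
report = classify_symbol_changes(before, after)
```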

@@ -0,0 +1,28 @@
# Symbol Source Connectors (Debuginfod, Buildinfo, Ddeb, SecDb)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Four symbol source connector implementations (Debuginfod, Debian Buildinfo, Ubuntu Ddeb, Alpine SecDb), each with plugin registration and configuration support.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Alpine/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Debian/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Rpm/`
- **Key Classes**:
- **Alpine SecDb**: `AlpineCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Alpine/AlpineCorpusConnector.cs`) - connects to Alpine security database; `ApkBuildSecfixesExtractor` - extracts secfixes from APK build files
- **Debian Buildinfo**: `DebianCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Debian/DebianCorpusConnector.cs`) - connects to Debian buildinfo sources; `DebianMirrorPackageSource` - mirrors Debian repositories
- **RPM**: `RpmCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Rpm/RpmCorpusConnector.cs`) - connects to RPM repositories; `SrpmChangelogExtractor` - extracts changelogs from source RPMs
- **Library-specific**: `CurlCorpusConnector`, `GlibcCorpusConnector`, `OpenSslCorpusConnector`, `ZlibCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/Connectors/`)
- **Interfaces**: `IBinaryCorpusConnector`, `ILibraryCorpusConnector`, `IAlpinePackageSource`, `IDebianPackageSource`, `IRpmPackageSource`
- **Package Extractors**: `AlpinePackageExtractor`, `DebianPackageExtractor`, `RpmPackageExtractor` - extract binaries from packages using `IBinaryFeatureExtractor`
## E2E Test Plan
- [ ] Connect via `AlpineCorpusConnector` and verify secfixes data is extracted from APK builds
- [ ] Connect via `DebianCorpusConnector` and verify buildinfo data is retrieved from Debian mirrors
- [ ] Connect via `RpmCorpusConnector` and verify RPM changelog extraction works
- [ ] Verify library-specific connectors (OpenSSL, glibc, curl, zlib) retrieve correct binary versions
- [ ] Verify all connectors produce `CorpusSnapshot` with consistent snapshot IDs
- [ ] Verify package extractors use `IBinaryFeatureExtractor` to extract identity features from packages

@@ -0,0 +1,29 @@
# Validation Harness and Reproducibility Verification
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Validation harness with determinism validation, SBOM stability checking, and reproducible build verification. Includes a local rebuild backend and bundle export/import.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/`
- **Key Classes**:
- `ValidationHarness` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/ValidationHarness.cs`) - main validation harness with `IMatcherAdapterFactory` for pluggable matching
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - reproducible-build validation with `ValidationRunContext`
- `ReproducibleBuildJob` (`src/BinaryIndex/StellaOps.BinaryIndex.Worker/Jobs/ReproducibleBuildJob.cs`) - local rebuild backend
- `KpiRegressionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/Services/KpiRegressionService.cs`) - SBOM stability and KPI regression tracking
- **Bundle Export/Import**: `ServiceCollectionExtensions.AddCorpusBundleExport/Import` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ServiceCollectionExtensions.cs`)
- **Interfaces**: `IValidationHarness`, `IKpiRegressionService`, `IReproducibleBuildJob`
- **Registration**: `ValidationServiceCollectionExtensions.AddValidationHarness()`
## E2E Test Plan
- [ ] Run validation harness and verify deterministic results for identical inputs
- [ ] Verify SBOM stability checking detects unstable hash generation
- [ ] Verify reproducible build verification: rebuild from source and compare against original binary
- [ ] Verify bundle export produces a self-contained archive importable on air-gapped systems
- [ ] Verify bundle import restores corpus data and enables offline validation
- [ ] Verify KPI regression tracking across multiple validation harness runs

@@ -0,0 +1,29 @@
# Vulnerable Binaries Database (BinaryIndex Module)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Dedicated BinaryIndex module with web service, worker, and library structure for binary vulnerability detection independent of package metadata.
## Implementation Details
- **Modules**: `src/BinaryIndex/StellaOps.BinaryIndex.WebService/`, `src/BinaryIndex/StellaOps.BinaryIndex.Worker/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/`
- **Key Classes**:
- **Web Service**: `ResolutionController` (`Controllers/ResolutionController.cs`) - vulnerability resolution API; `GoldenSetController` - golden set management API; `PatchCoverageController` - patch coverage API; `BinaryIndexOpsController` - ops health/bench/cache endpoints
- **Worker**: `ReproducibleBuildJob` (`Jobs/ReproducibleBuildJob.cs`) - background worker for build verification
- **Persistence**: `BinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs`) - vulnerability detection service with match method mapping and corpus query integration
- **Cache**: `CachedBinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/CachedBinaryVulnerabilityService.cs`) - Valkey-backed caching layer
- **Analysis**: `SignatureMatcher`, `TaintGateExtractor`, `ReachGraphBinaryReachabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`)
- **Ensemble**: `EnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/`) - multi-tier vulnerability classification
- **Program Entry**: `Program.cs` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Program.cs`) - configures services, resolution caching, rate limiting
## E2E Test Plan
- [ ] Query the database for a known vulnerable binary (by Build-ID) and verify vulnerability is detected
- [ ] Submit a binary for analysis and verify detection works independent of package metadata
- [ ] Verify web service endpoints are accessible: resolution, golden set, patch coverage, ops
- [ ] Verify worker job processes reproducible build verification in the background
- [ ] Verify cached lookups improve performance on repeated queries
- [ ] Verify ensemble decision engine combines all matching signals for final vulnerability classification

@@ -0,0 +1,30 @@
# Vulnerable Code Fingerprint Matching (CFG + Basic Block + String Refs Ensemble)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Function-level vulnerability detection independent of package metadata using an ensemble of fingerprint algorithms: basic block hashing, control flow graph fingerprinting, and string reference fingerprinting. Combined generator provides multi-algorithm similarity matching with configurable thresholds. Includes pre-seeded fingerprints for high-impact CVEs in OpenSSL, glibc, zlib, and curl.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/`
- **Key Classes**:
- `SignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/SignatureMatcher.cs`) - matches vulnerability signatures using fingerprint index
- `EnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs`) - combines CFG, basic block, string ref, and ML embedding fingerprints with configurable weights
- `FunctionAnalysisBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs`) - assembles multi-algorithm fingerprint inputs
- `SemanticFingerprintGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs`) - KSG-based semantic fingerprinting
- `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - call-sequence fingerprinting
- `BinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs`) - vulnerability lookup with pre-seeded fingerprints
- **Models**: `SignatureIndexModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/Models/`) - fingerprint index models
- **Source**: SPRINT_20251226_013_BINIDX_fingerprint_factory.md
## E2E Test Plan
- [ ] Match a known vulnerable function (e.g., OpenSSL Heartbleed) against pre-seeded fingerprints and verify detection
- [ ] Verify multi-algorithm ensemble: CFG fingerprint + basic block hash + string refs all contribute to match score
- [ ] Verify configurable threshold: adjust threshold to 0.8 and verify borderline matches are excluded
- [ ] Verify pre-seeded fingerprints exist for high-impact CVEs (OpenSSL, glibc, zlib, curl)
- [ ] Verify false positive rate: submit clean binary functions and verify no false matches
- [ ] Verify `EnsembleDecisionEngine` weight tuning affects match outcomes
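
The threshold and weight-tuning checks can be reasoned about with a simple weighted mean of per-algorithm similarities. This is a minimal sketch, assuming the ensemble score is a weight-normalized average over the signals that are present; the signal names, weights, and combination rule are hypothetical and do not describe the `EnsembleDecisionEngine` internals.

```python
def ensemble_score(similarities, weights):
    """Weighted mean of the similarity signals that have a positive weight."""
    used = {name: w for name, w in weights.items() if name in similarities and w > 0}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(similarities[name] * w for name, w in used.items()) / total


def is_match(similarities, weights, threshold=0.8):
    return ensemble_score(similarities, weights) >= threshold


signals = {"cfg": 0.9, "basic_block": 0.6, "string_refs": 0.8}
balanced = {"cfg": 2, "basic_block": 2, "string_refs": 1}
cfg_heavy = {"cfg": 8, "basic_block": 1, "string_refs": 1}

borderline = ensemble_score(signals, balanced)
excluded_at_default = is_match(signals, balanced, threshold=0.8)
accepted_after_tuning = is_match(signals, cfg_heavy, threshold=0.8)
```

With the balanced weights the candidate scores about 0.76 and is excluded at 0.8; shifting weight toward the CFG signal raises the score to about 0.86, which is the kind of outcome change the weight-tuning test is meant to observe.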