save checkpoint: save features

This commit is contained in:
master
2026-02-12 10:27:23 +02:00
parent dca86e1248
commit 5bca406787
8837 changed files with 1796879 additions and 5294 deletions

@@ -0,0 +1,37 @@
# Binary Call-Graph Extraction and Reachability Analysis
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Binary call-graph extraction with BinaryCallGraphExtractor, reachability lifting via BinaryReachabilityLifter, dedicated BinaryIndex analysis module, and CLI binary commands.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`
- **Key Classes**:
- `ReachGraphBinaryReachabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/ReachGraphBinaryReachabilityService.cs`) - binary-level reachability integration with ReachGraph
- `TaintGateExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/TaintGateExtractor.cs`) - extracts taint gates (bounds checks, null checks, auth checks, permission checks, type checks) from binary call paths
- `CfgExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/CfgExtractor.cs`) - control flow graph extraction from disassembled binaries
- `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - generates call-sequence n-grams from lifted IR for call graph analysis
- `CallGraphMatcherAdapter` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/Matchers/MatcherAdapters.cs`) - adapter for call graph matching in validation harness
- **Interfaces**: `ICallNgramGenerator`, `IBinaryFeatureExtractor`
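Reachability here means finding a call chain from an entry point to a vulnerable function. The Python below is a minimal sketch of that path tracing, not the C# `FindPathsAsync` implementation. It runs a breadth-first search over an extracted call graph held as an adjacency map. The map layout and the `find_call_path` name are hypothetical.

```python
from collections import deque

def find_call_path(call_graph, entry_points, target):
    """Return the shortest call chain from any entry point to `target`, or None.

    call_graph: hypothetical in-memory call graph, {caller: [callee, ...]}.
    """
    parents = {}
    queue = deque()
    for entry in entry_points:
        if entry not in parents:
            parents[entry] = None
            queue.append(entry)
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the entry point, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for callee in call_graph.get(node, ()):
            if callee not in parents:
                parents[callee] = node
                queue.append(callee)
    return None

graph = {"main": ["parse_header", "log"], "parse_header": ["memcpy_wrapper"]}
print(find_call_path(graph, ["main"], "memcpy_wrapper"))
# ['main', 'parse_header', 'memcpy_wrapper']
```

Breadth-first search returns the shortest chain. A full analysis would usually enumerate several paths so that taint gates on each one can be evaluated.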
## E2E Test Plan
- [ ] Submit an ELF binary and verify call-graph extraction produces a valid set of function nodes and edges
- [ ] Verify `TaintGateExtractor` classifies conditions correctly (bounds check, null check, auth check, permission check, type check)
- [ ] Verify `CfgExtractor` produces control flow graphs from disassembled functions
- [ ] Verify `CallNgramGenerator` generates n-grams (n=2,3,4) from lifted function IR and computes Jaccard similarity
- [ ] Verify `ReachGraphBinaryReachabilityService` integrates with the ReachGraph module for function-level exploitability assessment
- [ ] Verify call-graph-based reachability results feed into the ensemble decision engine
## Verification Outcome (run-001)
- Tier 0/1/2 artifacts: docs/qa/feature-checks/runs/binaryindex/binary-call-graph-extraction-and-reachability-analysis/run-001/
- Result: not implemented at claim parity.
- Missing behavior:
- TaintGateExtractor.ExtractAsync returns empty output and does not perform binary/disassembly path extraction.
- CallGraphMatcherAdapter uses placeholder logic with a fixed score and TODO comments.
- ReachGraphBinaryReachabilityService.FindPathsAsync currently constructs simplified placeholder paths.
- No focused behavioral tests prove call-graph matcher/reachability adapter semantics end-to-end.

@@ -0,0 +1,37 @@
# Binary Identity Extraction (Build-ID Based)
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Binary identity extraction using Build-IDs and symbol observations for ELF binary identification, with ground-truth validation and SBOM stability verification.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/`
- **Key Classes**:
- `BinaryIdentityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/BinaryIdentityService.cs`) - main service for extracting binary identity from ELF/PE/Mach-O binaries
- `ElfFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/ElfFeatureExtractor.cs`) - extracts Build-ID, symbol tables, and section info from ELF binaries
- `PeFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/PeFeatureExtractor.cs`) - extracts CodeView GUID from Windows PE binaries
- `MachoFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/MachoFeatureExtractor.cs`) - extracts LC_UUID from Mach-O binaries
- `StreamGuard` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/StreamGuard.cs`) - safe stream handling for non-seekable streams
- **Interfaces**: `IBinaryFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/IBinaryFeatureExtractor.cs`)
- **Models**: `BinaryIdentity` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Models/BinaryIdentity.cs`)
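On ELF, the Build-ID is stored in a note, usually the `.note.gnu.build-id` section, with owner name `GNU` and type `NT_GNU_BUILD_ID` (3). The Python sketch below parses a raw note buffer that has already been sliced out of the file and returns the Build-ID as lowercase hex. It assumes 4-byte note alignment and is an illustration, not the `ElfFeatureExtractor` code. PE CodeView GUID and Mach-O `LC_UUID` extraction are not covered.

```python
import struct

NT_GNU_BUILD_ID = 3  # value of NT_GNU_BUILD_ID in <elf.h>

def _align4(n: int) -> int:
    return (n + 3) & ~3

def parse_build_id(notes: bytes, little_endian: bool = True):
    """Scan a raw ELF note buffer and return the GNU Build-ID as lowercase hex, or None."""
    header = struct.Struct("<III" if little_endian else ">III")  # namesz, descsz, type
    offset = 0
    while offset + header.size <= len(notes):
        namesz, descsz, ntype = header.unpack_from(notes, offset)
        offset += header.size
        name = notes[offset:offset + namesz]
        offset += _align4(namesz)  # the name is padded to a 4-byte boundary
        desc = notes[offset:offset + descsz]
        offset += _align4(descsz)  # so is the descriptor
        if len(desc) < descsz:
            break  # truncated note: stop rather than return a partial ID
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None
```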
## E2E Test Plan
- [ ] Submit an ELF binary with a known Build-ID and verify the extracted identity matches
- [ ] Submit a Windows PE binary and verify CodeView GUID extraction via `PeFeatureExtractor`
- [ ] Submit a Mach-O binary and verify LC_UUID extraction via `MachoFeatureExtractor`
- [ ] Verify that non-seekable streams are handled correctly via `StreamGuard`
- [ ] Verify that binaries without Build-IDs fall back to symbol-based identification
- [ ] Verify extracted identities are persisted and queryable through `BinaryVulnerabilityService`
## Verification Outcome (run-001)
- Tier 0/1/2 artifacts: docs/qa/feature-checks/runs/binaryindex/binary-identity-extraction/run-001/
- Result: not implemented at claim parity.
- Missing behavior:
- When no Build-ID is present, the fallback path uses a file hash rather than the claimed symbol-observation-based identity.
- Ground-truth validation and SBOM stability verification are not implemented in the documented extraction flow.
- Existing behavioral tests do not explicitly prove PE CodeView GUID / Mach-O LC_UUID extraction semantics.

@@ -0,0 +1,38 @@
# Binary Intelligence Graph / Binary Identity Indexing
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Complete BinaryIndex module with binary identity indexing, ELF feature extraction, vulnerability fingerprint matching, and reachability status tracking. Advisory marked as SUPERSEDED by this implementation.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`
- **Key Classes**:
- `BinaryIdentityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/BinaryIdentityService.cs`) - binary identity management
- `ElfFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/ElfFeatureExtractor.cs`) - ELF feature extraction
- `BinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs`) - vulnerability matching with Build-ID catalog lookups
- `SignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/SignatureMatcher.cs`) - signature-based vulnerability fingerprint matching
- `ReachGraphBinaryReachabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/ReachGraphBinaryReachabilityService.cs`) - reachability status tracking
- **Models**: `BinaryIdentity`, `FixModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Models/`)
- **Persistence**: `IBinaryVulnAssertionRepository`, `IBinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/`)
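In a multi-tenant deployment, querying by Build-ID means every lookup is scoped by tenant as well as by Build-ID. The Python sketch below illustrates that scoping with an in-memory index. `BuildIdIndex`, its normalization rules, and the identity payload are hypothetical and do not reflect the actual persistence schema.

```python
from collections import defaultdict

class BuildIdIndex:
    """Hypothetical tenant-scoped index from Build-ID to binary identity records."""

    def __init__(self):
        self._by_tenant = defaultdict(dict)

    @staticmethod
    def _normalize(build_id: str) -> str:
        # Accept "ABCD..." or "ab:cd:..." input and store lowercase hex only.
        cleaned = build_id.strip().lower().replace(":", "")
        if not cleaned:
            raise ValueError("empty Build-ID")
        bytes.fromhex(cleaned)  # raises ValueError for non-hex or odd-length input
        return cleaned

    def add(self, tenant_id: str, build_id: str, identity: dict) -> None:
        self._by_tenant[tenant_id][self._normalize(build_id)] = identity

    def lookup(self, tenant_id: str, build_id: str):
        # Lookups never cross tenants; an unknown tenant simply has no entries.
        return self._by_tenant.get(tenant_id, {}).get(self._normalize(build_id))
```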
## E2E Test Plan
- [ ] Verify end-to-end flow: submit binary, extract identity, index in the graph, and query by Build-ID
- [ ] Verify vulnerability fingerprint matching via `SignatureMatcher` returns correct match scores
- [ ] Verify reachability status tracking integrates with ReachGraph
- [ ] Verify `BinaryVulnerabilityService` correctly maps match methods (buildid_catalog, delta_signature, etc.)
- [ ] Verify binary identity indexing supports multi-tenant contexts via `ITenantContext`
## Verification
- Run: `docs/qa/feature-checks/runs/binaryindex/binary-intelligence-graph-binary-identity-indexing/run-001/`
- Date (UTC): 2026-02-11
- Verdict: `not_implemented`
## Missing / Mismatched Behavior
- Default WebService runtime composition wires `IBinaryVulnerabilityService` to `InMemoryBinaryVulnerabilityService`, so live resolution API behavior does not exercise full persistence-backed vulnerability matching.
- Analysis service registration defaults to `NullBinaryReachabilityService` unless explicitly overridden, so ReachGraph-backed reachability tracking is not active by default.
- `BinaryVulnerabilityService` method mapping does not explicitly include `delta_signature` in `MapMethod`, which mismatches the documented match-method coverage claim.

@@ -0,0 +1,36 @@
# Binary Proof Verification Pipeline
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Full binary proof verification with ground truth sources (buildinfo, debuginfod, reproducible builds), validation, and golden set testing.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation.Abstractions/`
- **Key Classes**:
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - orchestrates reproducible-build-based validation runs
- `ValidationHarness` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/ValidationHarness.cs`) - main validation harness with matcher adapter factory integration
- `KpiRegressionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/Services/KpiRegressionService.cs`) - KPI regression detection across validation runs
- `GroundTruthProvenanceResolver` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Provenance/GroundTruthProvenanceResolver.cs`) - resolves symbol provenance from ground truth sources
- **Interfaces**: `IValidationHarness` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation.Abstractions/IValidationHarness.cs`), `IKpiRegressionService`, `ISymbolProvenanceResolver`
- **Registration**: `ServiceCollectionExtensions.AddCorpusBundleExport/Import` for bundle exchange
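`KpiRegressionService` detects accuracy drops between validation runs. One simple way to express that check is to compare each higher-is-better KPI in the current run against the baseline and flag any drop larger than its allowed tolerance. The Python sketch below does this; the KPI names, tolerances, and the `detect_regressions` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KpiRegression:
    kpi: str
    baseline: float
    current: float
    drop: float

def detect_regressions(baseline, current, tolerances, default_tolerance=0.01):
    """Compare higher-is-better KPIs and report those that dropped by more than tolerance."""
    regressions = []
    for kpi, base_value in sorted(baseline.items()):
        if kpi not in current:
            continue  # assumption: a missing KPI means "not measured", not a regression
        drop = base_value - current[kpi]
        if drop > tolerances.get(kpi, default_tolerance):
            regressions.append(KpiRegression(kpi, base_value, current[kpi], drop))
    return regressions
```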
## E2E Test Plan
- [ ] Run a validation harness against a known binary pair and verify proof correctness
- [ ] Verify ground truth resolution from buildinfo sources produces correct provenance data
- [ ] Verify KPI regression service detects accuracy drops between validation runs
- [ ] Verify golden set validation produces deterministic, reproducible results
- [ ] Verify corpus bundle export/import round-trips correctly
- [ ] Verify validation run attestor generates valid attestation predicates with corpus snapshot IDs
## Verification Outcome (run-001)
- Tier 0/1/2 artifacts: docs/qa/feature-checks/runs/binaryindex/binary-proof-verification-pipeline/run-001/
- Result: not implemented at claim parity.
- Missing behavior:
- ValidationHarnessService still uses placeholder stubs for symbol recovery, IR lifting, fingerprint generation, function matching, and SBOM hash calculation.
- Validation matcher adapters (SemanticDiff, InstructionHash, CallGraph) are TODO-backed placeholders with synthetic scores instead of production matching logic.
- Current tests explicitly validate scaffold behavior (skeleton contract), so passing suites do not prove the full proof-verification contract described in this dossier.

@@ -0,0 +1,34 @@
# Binary Reachability Analysis
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Binary-level reachability analysis integrating with the ReachGraph and taint gate extraction for function-level exploitability assessment.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`
- **Key Classes**:
- `ReachGraphBinaryReachabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/ReachGraphBinaryReachabilityService.cs`) - connects binary analysis to the ReachGraph module for function-level reachability
- `TaintGateExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/TaintGateExtractor.cs`) - identifies taint gate types (BoundsCheck, NullCheck, AuthCheck, PermissionCheck, TypeCheck) from condition strings
- `SignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/SignatureMatcher.cs`) - matches vulnerability signatures at the binary level
- **Models**: `AnalysisResultModels`, `FingerprintModels`, `SignatureIndexModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/Models/`)
- **Interfaces**: defined in `Interfaces.cs`, implementations in `Implementations.cs`
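`TaintGateExtractor` is described as classifying condition strings into `BoundsCheck`, `NullCheck`, `AuthCheck`, `PermissionCheck`, and `TypeCheck` gates. The Python below is a keyword-heuristic sketch of that classification. The regex patterns and their ordering are assumptions chosen for illustration, not the extractor's actual rules. An implementation working on disassembly would inspect operands rather than text.

```python
import re
from enum import Enum

class TaintGate(Enum):
    BOUNDS_CHECK = "BoundsCheck"
    NULL_CHECK = "NullCheck"
    AUTH_CHECK = "AuthCheck"
    PERMISSION_CHECK = "PermissionCheck"
    TYPE_CHECK = "TypeCheck"

# Illustrative rules, checked in order; the first match wins.
_RULES = [
    (TaintGate.AUTH_CHECK, r"auth|login|session|token|passw|credential"),
    (TaintGate.PERMISSION_CHECK, r"perm|privilege|capab|is_admin|has_role|access_mask"),
    (TaintGate.TYPE_CHECK, r"typeid|instanceof|dynamic_cast|->type\b|\.type\b"),
    # A comparison operator (ignoring the "->" member arrow) or a length-like name.
    (TaintGate.BOUNDS_CHECK, r"(?<!-)[<>]|\b(?:len|size|count|length)\b"),
    (TaintGate.NULL_CHECK, r"\bnull(?:ptr)?\b|[!=]=\s*0\b"),
]
_COMPILED = [(gate, re.compile(p, re.IGNORECASE)) for gate, p in _RULES]

def classify_condition(condition: str):
    """Return the first matching TaintGate for a condition string, or None."""
    for gate, pattern in _COMPILED:
        if pattern.search(condition):
            return gate
    return None
```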
## E2E Test Plan
- [ ] Submit a binary with a known vulnerable function and verify reachability analysis identifies it as reachable from entry points
- [ ] Verify `TaintGateExtractor` correctly classifies all gate types (bounds, null, auth, permission, type checks)
- [ ] Verify that unreachable vulnerable functions reduce the exploitability score
- [ ] Verify integration between `ReachGraphBinaryReachabilityService` and the ReachGraph module
- [ ] Verify that taint gate presence between entry point and vulnerable function is reflected in the analysis result
## Verification Outcome (run-001)
- Tier 0/1/2 artifacts: docs/qa/feature-checks/runs/binaryindex/binary-reachability-analysis/run-001/
- Result: not implemented at claim parity.
- Missing behavior:
- Implementations.cs still contains NotImplementedException stubs for fingerprint extraction and related reachability pipeline contracts.
- Service registration defaults to stub/null analysis components (FingerprintExtractor, ReachabilityAnalyzer, NullBinaryReachabilityService) rather than full production reachability wiring.
- ReachGraphBinaryReachabilityService.FindPathsAsync uses simplified two-node path construction, not full graph-path tracing semantics claimed by the feature.

@@ -0,0 +1,38 @@
# Binary Resolution API with Cache Layer
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
REST API endpoints (`POST /api/v1/resolve/vuln` and `POST /api/v1/resolve/vuln/batch`) for querying whether a CVE is resolved through binary-level backport detection. Includes Valkey-backed response caching, rate-limiting middleware, and telemetry instrumentation.
## Implementation Details
- **Modules**: `src/BinaryIndex/StellaOps.BinaryIndex.WebService/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/`
- **Key Classes**:
- `ResolutionController` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/ResolutionController.cs`) - REST API controller with `POST /api/v1/resolve/vuln` and `POST /api/v1/resolve/vuln/batch` endpoints
- `ResolutionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Resolution/ResolutionService.cs`) - core resolution logic
- `CachedResolutionService` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Services/CachedResolutionService.cs`) - decorator adding Valkey-backed caching around ResolutionService
- `ResolutionCacheService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/ResolutionCacheService.cs`) - Valkey cache operations for resolution results
- `RateLimitingMiddleware` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Middleware/RateLimitingMiddleware.cs`) - per-tenant rate limiting with X-RateLimit headers
- `ResolutionTelemetry` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Telemetry/ResolutionTelemetry.cs`) - OpenTelemetry metrics for resolution requests, cache hits, rate limits
- **Contracts**: `VulnResolutionRequest/Response`, `ResolutionMatchTypes` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Contracts/Resolution/VulnResolutionContracts.cs`)
- **Cache Options**: `BinaryCacheOptions`, `CacheOptionsValidation` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/`)
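`CachedResolutionService` is described as a decorator that adds caching around `ResolutionService`. The Python sketch below shows that cache-aside pattern with a deterministic key and a TTL. A plain dict stands in for Valkey, and the key prefix, default TTL, and `CachedResolver` name are assumptions.

```python
import hashlib
import json
import time

class CachedResolver:
    """Cache-aside decorator around any object exposing resolve(request: dict) -> dict."""

    def __init__(self, inner, ttl_seconds=300.0, clock=time.monotonic):
        self._inner = inner
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # stand-in for Valkey: key -> (expires_at, response)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def cache_key(request: dict) -> str:
        # Canonical JSON, so field order in the request does not change the key.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return "resolve:vuln:" + hashlib.sha256(canonical.encode()).hexdigest()

    def resolve(self, request: dict) -> dict:
        key = self.cache_key(request)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        response = self._inner.resolve(request)
        self._store[key] = (now + self._ttl, response)
        return response
```

In a real deployment, the hit and miss counters would feed the telemetry counters that the test plan checks.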
## E2E Test Plan
- [ ] Send `POST /api/v1/resolve/vuln` with a known CVE and package PURL, and verify the resolution response contains a match type (BuildId, DeltaSignature, etc.)
- [ ] Send a batch request to `/api/v1/resolve/vuln/batch` with multiple packages and verify all are resolved
- [ ] Verify cache hit: send the same request twice and confirm the second response comes from cache (check telemetry counters)
- [ ] Verify rate limiting: exceed the configured request limit and confirm a 429 response with X-RateLimit headers
- [ ] Verify telemetry: confirm resolution metrics are emitted (request count, cache hit ratio, latency histogram)
- [ ] Verify that, with rate limiting disabled, requests pass through without X-RateLimit headers
## Verification Outcome
- Tier 0/1/2 artifacts: `docs/qa/feature-checks/runs/binaryindex/binary-resolution-api-with-cache-layer/run-002/`.
- Result: not implemented at claim parity.
- Missing behavior:
- Default runtime wiring uses `InMemoryBinaryVulnerabilityService`, so real BuildId/DeltaSignature vulnerability matching claims are not realized.
- Resolution telemetry counters are not invoked end-to-end from controller/service request flow.
- Tier 2 endpoint responses validate HTTP status behavior but do not establish production-grade CVE resolution semantics.

@@ -0,0 +1,40 @@
# BinaryIndex User Configuration System
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Comprehensive user configuration covering B2R2 lifter pooling, LowUIR enablement, Valkey function-cache behavior, and PostgreSQL persistence, plus ops endpoints for health/bench/cache/config and redaction rules that control what operators can see.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/`
- **Key Classes**:
- `BinaryIndexOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/BinaryIndexOptions.cs`) - top-level config with sections for B2R2Pool, SemanticLifting, cache, persistence
- `B2R2PoolOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/B2R2LifterPoolOptions.cs`) - MaxPoolSizePerIsa (1-64), EnableWarmPreload, AcquireTimeout, EnableMetrics, WarmPreloadIsas
- `SemanticLiftingOptions` - B2R2Version, Enabled flag, function limits
- `BinaryCacheOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/BinaryCacheOptions.cs`) - Valkey cache configuration
- `CacheOptionsValidation` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Cache/CacheOptionsValidation.cs`) - validates cache config at startup
- `FunctionIrCacheOptions` - function IR cache TTL and size limits
- `BinaryIndexOpsOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Configuration/BinaryIndexOpsModels.cs`) - redacted keys list for operator visibility, bench rate limits
- **Source**: SPRINT_20260112_007_BINIDX_binaryindex_user_config.md
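Settings under `StellaOps:BinaryIndex:*` can be overridden through environment variables such as `StellaOps__BinaryIndex__*`. In .NET configuration, `__` in an environment variable name stands in for the `:` section separator. The Python sketch below mirrors that convention to build a flat effective configuration, checks the documented `MaxPoolSizePerIsa` range (1-64), and masks redacted keys before display. The exact key paths, the `***` mask, and the function names are illustrative assumptions.

```python
import os

ENV_PREFIX = "StellaOps__BinaryIndex__"
POOL_SIZE_KEY = "StellaOps:BinaryIndex:B2R2Pool:MaxPoolSizePerIsa"  # assumed key path

def effective_config(defaults, environ=None):
    """Merge StellaOps__BinaryIndex__* environment overrides over flat defaults."""
    environ = os.environ if environ is None else environ
    config = dict(defaults)
    for name, value in environ.items():
        # Prefix match is case-sensitive here; .NET configuration keys are not.
        if name.startswith(ENV_PREFIX):
            # Mirror the .NET convention: "__" in a variable name means ":".
            config[name.replace("__", ":")] = value
    return config

def validate(config):
    """Reject a pool size outside the documented 1-64 range."""
    raw = config.get(POOL_SIZE_KEY)
    if raw is not None and not 1 <= int(raw) <= 64:
        raise ValueError(f"MaxPoolSizePerIsa must be in [1, 64], got {raw}")

def redacted_view(config, redacted_keys, mask="***"):
    """Copy of config for an ops endpoint, with redacted leaf keys masked."""
    hidden = {key.lower() for key in redacted_keys}
    return {
        key: mask if key.rsplit(":", 1)[-1].lower() in hidden else value
        for key, value in config.items()
    }
```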
## E2E Test Plan
- [ ] Configure B2R2 pool with custom MaxPoolSizePerIsa and verify pool initializes with correct size
- [ ] Configure SemanticLifting as disabled and verify LowUIR lifting is skipped
- [ ] Configure Valkey cache options and verify function IR cache respects TTL settings
- [ ] Verify configuration binding from `StellaOps:BinaryIndex:*` config sections
- [ ] Verify redacted keys do not appear in ops config endpoint responses
- [ ] Verify CacheOptionsValidation rejects invalid configuration at startup
## Verification
- Run: `docs/qa/feature-checks/runs/binaryindex/binaryindex-user-configuration-system/run-001/`
- Date (UTC): 2026-02-11
- Verdict: `not_implemented`
## Missing / Mismatched Behavior
- Live Tier 2 probe with overridden `StellaOps__BinaryIndex__*` values did not affect `/api/v1/ops/binaryindex/config` output (values remained defaults).
- Runtime WebService composition does not bind the full `BinaryIndexOptions` (`StellaOps:BinaryIndex:*`) contract into the active ops endpoint path.
- Ops config response surface is narrower than the documented comprehensive user-configuration model (notably persistence/operator-oriented sectioning and redaction-oriented view semantics).

@@ -0,0 +1,35 @@
# Byte-Level Binary Diffing with Rolling Hash Windows
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Byte-level binary comparison using rolling hash windows that identifies exactly which byte ranges changed between binary versions. Produces binary proof snippets with section analysis and privacy controls to strip raw bytes. Supports stream and file-based comparison.
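The following Python sketch shows one way rolling-hash windows can locate changed byte ranges; it is not the `PatchDiffEngine` algorithm. It computes a Rabin-Karp polynomial hash over every `w`-byte window of the old binary. A sliding pass over the new binary then marks each byte covered by any window that also appears in the old one. Unmarked runs are reported as changed `[start, end)` offsets in the new binary. The window size, base, and modulus are illustrative choices, and a hash collision, though very unlikely with a 61-bit modulus, could hide a change.

```python
BASE = 257
MOD = (1 << 61) - 1  # large Mersenne prime keeps accidental collisions rare

def window_hashes(data: bytes, w: int):
    """Rabin-Karp hashes of every w-byte window; result[i] covers data[i:i + w]."""
    if len(data) < w:
        return []
    top = pow(BASE, w - 1, MOD)  # weight of the byte that leaves the window
    h = 0
    for b in data[:w]:
        h = (h * BASE + b) % MOD
    hashes = [h]
    for i in range(w, len(data)):
        h = (h - data[i - w] * top) % MOD  # drop the oldest byte
        h = (h * BASE + data[i]) % MOD     # append the newest byte
        hashes.append(h)
    return hashes

def changed_ranges(old: bytes, new: bytes, w: int = 8):
    """Return [start, end) offsets in `new` not covered by any window seen in `old`."""
    seen = set(window_hashes(old, w))
    covered = [False] * len(new)
    for i, h in enumerate(window_hashes(new, w)):
        if h in seen:
            for j in range(i, i + w):
                covered[j] = True
    ranges, start = [], None
    for j, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = j
        elif is_covered and start is not None:
            ranges.append((start, j))
            start = None
    if start is not None:
        ranges.append((start, len(new)))
    return ranges
```

Windows match at any offset, so inserted bytes that shift later code do not mark the shifted code as changed. Bytes deleted from the old version would need a second pass with the arguments swapped.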
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/`
- **Key Classes**:
- `PatchDiffEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/PatchDiffEngine.cs`) - core diffing engine that compares binary versions using function fingerprints
- `FunctionDiffer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionDiffer.cs`) - function-level comparison with semantic analysis option and call-graph edge diffing
- `FunctionRenameDetector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/FunctionRenameDetector.cs`) - detects function renames between versions using fingerprint similarity
- `VerdictCalculator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/VerdictCalculator.cs`) - computes patch verification verdicts from diff results
- `InMemoryDiffResultStore` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/Storage/InMemoryDiffResultStore.cs`) - stores diff results with content-addressed IDs
- **Models**: `PatchDiffModels`, `DiffEvidenceModels`, `BinaryReference` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/Models/`)
- **Interfaces**: `IPatchDiffEngine`, `IDiffResultStore` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/`)
- **Source**: SPRINT_20260112_200_004_CHGTRC_byte_diffing.md
## E2E Test Plan
- [ ] Submit two binary versions and verify byte-range differences are identified with correct offsets
- [ ] Verify section analysis identifies which ELF sections changed (.text, .data, .rodata)
- [ ] Verify privacy controls strip raw bytes from proof snippets when configured
- [ ] Verify `FunctionRenameDetector` correctly identifies renamed functions between versions
- [ ] Verify `VerdictCalculator` produces correct patch verification verdict (patched vs unpatched)
- [ ] Verify diff results are stored with deterministic content-addressed IDs
## Implementation Gaps (QA 2026-02-11)
- Current diff engine is function/CFG-level (`PatchDiffEngine` + `FunctionDiffer`) and does not implement byte-range rolling-window outputs with exact offsets.
- Section-aware diff outputs (`.text/.data/.rodata`) and privacy controls to strip raw proof bytes are not present in exposed models/engine behavior.
- `InMemoryDiffResultStore` stores results using `Guid.NewGuid()` rather than deterministic content-addressed IDs.
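Deterministic, content-addressed IDs are typically derived by hashing a canonical serialization of the result rather than generating a random GUID. The Python sketch below assumes the diff result is JSON-serializable and that a `sha256:` prefix is acceptable. The `content_id` name is hypothetical, and the real store may choose a different canonical form. Cross-language determinism would also require agreeing on details such as float formatting.

```python
import hashlib
import json

def content_id(result: dict) -> str:
    """Stable ID: SHA-256 over canonical JSON (sorted keys, no insignificant whitespace)."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```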

@@ -0,0 +1,41 @@
# Call-Ngram Fingerprinting for Binary Similarity Analysis
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Call-sequence n-gram extraction from lifted IR for improved cross-compiler binary similarity matching. Generates n-grams (n=2,3,4) from function call sequences and integrates into the semantic fingerprint pipeline with configurable dimension weights (instruction 0.4, CFG 0.3, call-ngram 0.2, semantic 0.1).
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`
- **Key Classes**:
- `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - generates `CallNgramFingerprint` from `LiftedFunction` call sequences; computes Jaccard similarity between fingerprints
- `CallNgramFingerprint` (record in same file) - contains n-gram hash sets and metadata; has `Empty` sentinel for functions without calls
- **Interfaces**: `ICallNgramGenerator` (defined in `CallNgramGenerator.cs`) - `Generate(LiftedFunction)` and `ComputeSimilarity(CallNgramFingerprint, CallNgramFingerprint)`
- **Integration**: Used by `EnsembleDecisionEngine` and `FunctionAnalysisBuilder` as one of the matching dimensions with 0.2 default weight
- **Source**: SPRINT_20260118_026_BinaryIndex_deltasig_enhancements.md
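The n-gram extraction and Jaccard similarity steps can be sketched in Python as follows. This is not `CallNgramGenerator`, and three choices are assumptions: n-grams are kept as callee-name tuples rather than hashes, `min_call_count` stands in for `CallNgramOptions.MinCallCount` with a default of 2, and two empty fingerprints score 0.0.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallNgramFingerprint:
    ngrams: frozenset  # callee-name tuples of length 2, 3, or 4

EMPTY = CallNgramFingerprint(frozenset())

def generate(call_sequence, sizes=(2, 3, 4), min_call_count=2):
    """Build call n-grams from an ordered list of callee names."""
    if len(call_sequence) < min_call_count:
        return EMPTY  # too few calls to produce meaningful n-grams
    grams = set()
    for n in sizes:
        for i in range(len(call_sequence) - n + 1):
            grams.add(tuple(call_sequence[i:i + n]))
    return CallNgramFingerprint(frozenset(grams))

def jaccard(a: CallNgramFingerprint, b: CallNgramFingerprint) -> float:
    """|A ∩ B| / |A ∪ B|; two empty fingerprints score 0.0 (no call evidence)."""
    union = a.ngrams | b.ngrams
    if not union:
        return 0.0
    return len(a.ngrams & b.ngrams) / len(union)
```

For example, `["a", "b", "c"]` and `["a", "b", "d"]` share only the 2-gram `("a", "b")` out of five distinct n-grams, so their similarity is 0.2.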
## E2E Test Plan
- [ ] Generate call-ngram fingerprint from a function with known call sequences and verify correct n-gram extraction (n=2,3,4)
- [ ] Compute similarity between identical call sequences and verify similarity = 1.0
- [ ] Compute similarity between disjoint call sequences and verify similarity = 0.0
- [ ] Verify `CallNgramFingerprint.Empty` is returned for functions without call instructions
- [ ] Verify call-ngram dimension integrates into ensemble scoring with configurable weight (default 0.2)
- [ ] Verify cross-compiler similarity: same source compiled with GCC vs Clang should produce similar call n-grams
## Implementation Gaps (QA 2026-02-11)
- `CallNgramGenerator` is present, but no dedicated call-ngram behavioral tests exist in `src/BinaryIndex/__Tests/StellaOps.BinaryIndex.Semantic.Tests/`.
- `EnsembleDecisionEngine` currently combines syntactic/semantic/embedding signals only and does not expose a call-ngram signal or the claimed default 0.2 call-ngram weight path.
- `FunctionAnalysisBuilder` does not compute or attach call-ngram fingerprints into the ensemble analysis pipeline.
- `CallNgramOptions.MinCallCount` is not enforced in generator output flow.
## Verification Outcome
- Tier 0/1/2 artifacts: `docs/qa/feature-checks/runs/binaryindex/call-ngram-fingerprinting-for-binary-similarity-analysis/run-001/`.
- Result: not implemented at claim parity.
- Missing behavior:
- `CallNgramGenerator` exists, but no first-class integration path wires call-ngram fingerprints into `FunctionAnalysisBuilder` outputs.
- `EnsembleDecisionEngine` and `EnsembleOptions` expose only syntactic/semantic/embedding dimensions; no call-ngram dimension with claimed default weight.
- No dedicated call-ngram generator behavioral tests verify n=2/3/4 extraction and similarity semantics as described by the dossier.

@@ -0,0 +1,36 @@
# Delta signature matching and patch coverage analysis
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Delta signature matching traces symbol-level changes between vulnerable and fixed builds. PatchCoverageController exposes an API for patch coverage assessment.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`, `src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/`
- **Key Classes**:
- `DeltaSignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureMatcher.cs`) - matches delta signatures against target binaries
- `DeltaSignatureGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureGenerator.cs`) - generates delta signatures from binary pairs
- `DeltaSigService` / `DeltaSigServiceV2` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`) - service layer for delta signature operations (V2 adds IR diffs)
- `PatchCoverageController` (`src/BinaryIndex/StellaOps.BinaryIndex.WebService/Controllers/PatchCoverageController.cs`) - REST API for patch coverage queries using `IDeltaSignatureRepository`
- `SymbolChangeTracer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/SymbolChangeTracer.cs`) - traces symbol-level changes between builds
- `DeltaScopePolicyGate` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Policy/DeltaScopePolicyGate.cs`) - policy gate for delta scope enforcement
- **Interfaces**: `IDeltaSigService`, `IDeltaSignatureGenerator`, `IDeltaSignatureMatcher`, `ISymbolChangeTracer`
- **IR Diff**: `IrDiffGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/IrDiff/`) - generates IR-level diffs between function versions
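`SymbolChangeTracer` is described as identifying added, removed, and modified symbols between builds. The Python below is a minimal sketch of that classification. It assumes each build is summarized as a map from symbol name to a hash of that symbol's normalized body; the `trace_symbol_changes` function and this input shape are hypothetical.

```python
def trace_symbol_changes(vulnerable: dict, fixed: dict) -> dict:
    """Classify symbols between two builds given hypothetical {symbol: body_hash} maps."""
    old_names, new_names = set(vulnerable), set(fixed)
    common = old_names & new_names
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "modified": sorted(s for s in common if vulnerable[s] != fixed[s]),
        "unchanged": sorted(s for s in common if vulnerable[s] == fixed[s]),
    }
```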
## E2E Test Plan
- [ ] Generate a delta signature from known vulnerable/fixed binary pair and verify signature captures changed functions
- [ ] Match the generated delta signature against a target binary and verify correct patch status detection
- [ ] Query `PatchCoverageController` API for patch coverage and verify coverage percentage
- [ ] Verify `SymbolChangeTracer` identifies added, removed, and modified symbols
- [ ] Verify `DeltaScopePolicyGate` enforces delta scope policies
- [ ] Verify IR-level diff generation captures semantic function changes beyond byte-level diffs
## Verification
- Run: `run-002` (2026-02-11 UTC).
- Tier 1 build/test projects passed after remediation, including new `PatchCoverageController` behavior tests and deterministic `IDeltaSignatureRepository` fallback wiring in WebService.
- Tier 2 API checks now pass for positive and negative flows on `/api/v1/stats/patch-coverage*` endpoints.
- Claim parity remains incomplete for this feature because `IrDiffGenerator` still uses placeholder diff payload generation (`GenerateSingleDiffAsync`) instead of real lifted-IR semantic diff extraction, so the full advertised IR-diff capability is not implemented.

@@ -0,0 +1,50 @@
# ELF Normalization and Delta Hashing
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Low-entropy delta signatures over ELF segments with normalization (relocation zeroing, NOP canonicalization, jump table rewriting). The segment-level normalization and hashing are not yet implemented.
## What's Implemented
- **Delta Signature Infrastructure**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/` - function-level delta signatures with V1 and V2 predicates exist
- `DeltaSignatureGenerator` - generates delta signatures (function-level, not ELF-segment-level)
- `DeltaSignatureMatcher` - matches delta signatures
- `CfgExtractor` - extracts control flow graphs
- `IrDiffGenerator` - IR-level diff generation
- **Binary Diff Engine**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Diff/PatchDiffEngine.cs` - function-level (function/CFG) diffing
- **ELF Feature Extraction**: `ElfFeatureExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/`) - extracts Build-ID and section info from ELF binaries
- **Disassembly**: `B2R2DisassemblyPlugin`, `HybridDisassemblyService` - multi-backend disassembly infrastructure
## What's Missing
- ELF segment-level normalization (relocation zeroing to eliminate position-dependent bytes)
- NOP canonicalization (normalizing NOP sled variations across compilers)
- Jump table rewriting (normalizing indirect jump table entries)
- Low-entropy delta hashing over normalized ELF segments (currently delta-sig operates at function level, not segment level)
- Segment-aware normalization that handles .text, .rodata, .data sections separately
## Implementation Plan
- Add ELF segment normalization pass to `ElfFeatureExtractor` or new `ElfNormalizer` class
- Implement relocation zeroing: identify and zero out position-dependent bytes (GOT/PLT entries, absolute addresses)
- Implement NOP canonicalization: normalize all NOP variants to canonical form
- Implement jump table rewriting: normalize indirect jump table entries
- Add segment-level delta hashing on normalized output
- Integrate with existing `DeltaSignatureGenerator` for hybrid function+segment signatures
- Add tests using known ELF binaries with position-dependent variations
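The relocation-zeroing, NOP-canonicalization, and segment-hashing steps of this plan can be sketched in Python as follows, with jump-table rewriting omitted. Everything here is an illustrative assumption, not `ElfNormalizer` code:

- Relocation sites arrive as `(offset, size)` pairs relative to the segment; a real pass would read them from the `.rela.*` sections.
- NOPs are canonicalized on instruction byte strings from a disassembler, because scanning raw bytes would misfire inside immediates.
- The NOP table covers only common x86-64 encodings.

```python
import hashlib

# Common x86-64 NOP encodings, 1 to 9 bytes (the widely used multi-byte forms).
X86_NOPS = frozenset(bytes.fromhex(h) for h in (
    "90", "6690", "0f1f00", "0f1f4000", "0f1f440000", "660f1f440000",
    "0f1f8000000000", "0f1f840000000000", "660f1f840000000000",
))

def zero_relocations(segment: bytes, relocations) -> bytes:
    """Zero each (offset, size) relocation site so load addresses do not leak into hashes."""
    out = bytearray(segment)
    for offset, size in relocations:
        if offset < 0 or offset + size > len(out):
            raise ValueError(f"relocation {offset}+{size} outside segment")
        out[offset:offset + size] = bytes(size)
    return bytes(out)

def canonicalize_nops(instructions):
    """Rewrite any recognized NOP as 0x90 bytes of the same length, keeping offsets stable."""
    return [b"\x90" * len(insn) if insn in X86_NOPS else insn for insn in instructions]

def normalized_segment_hash(instructions, relocations) -> str:
    """Hypothetical segment hash: canonicalize NOPs, zero relocations, then SHA-256."""
    segment = b"".join(canonicalize_nops(instructions))
    return hashlib.sha256(zero_relocations(segment, relocations)).hexdigest()
```

Canonicalizing to same-length runs of `0x90` preserves every later offset, so the relocation offsets remain valid after NOP rewriting.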
## Related Documentation
- Current delta-sig: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`
- ELF extraction: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/ElfFeatureExtractor.cs`
- Disassembly: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Disassembly.B2R2/`
## Verification
- Tier 0/1/2 artifacts: `docs/qa/feature-checks/runs/binaryindex/elf-normalization-and-delta-hashing/run-001/`.
- Result: not implemented at claim parity.
- Confirmed missing behavior:
- `ElfNormalizer`/segment normalization pipeline is absent (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Core/Services/ElfNormalizer.cs` missing).
- No relocation-zeroing, NOP-canonicalization, or jump-table rewriting implementation was found in Core/DeltaSig/Diff libraries.
- Existing behavior remains function-level delta signatures plus ELF metadata extraction, not segment-level low-entropy delta hashing.

@@ -0,0 +1,37 @@
# Ensemble decision engine for multi-tier matching
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Ensemble decision engine combines multiple matching tiers (range match, Build-ID, fingerprint) with configurable weight tuning for vulnerability classification.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/`
- **Key Classes**:
- `EnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs`) - combines multiple matching signals with configurable weights into a final vulnerability classification decision
- `FunctionAnalysisBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs`) - builds function analysis inputs including optional ML embeddings
- `WeightTuningService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/WeightTuningService.cs`) - tunes ensemble weights based on golden set validation results
- `EnsembleOptions` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/Models.cs`) - configurable weights and thresholds for matching tiers
- `MlEmbeddingMatcherAdapter` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/MlEmbeddingMatcherAdapter.cs`) - adapts ML function embeddings for ensemble use
- **Interfaces**: `IEnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/IEnsembleDecisionEngine.cs`)
- **Registration**: `EnsembleServiceCollectionExtensions.AddBinarySimilarityServices()` for full pipeline setup
- **Benchmarks**: `EnsembleAccuracyBenchmarks`, `EnsembleLatencyBenchmarks` (`src/BinaryIndex/__Tests/StellaOps.BinaryIndex.Benchmarks/`)
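To show how a weighted ensemble decision of this kind can work, here is a small Python sketch. It uses the signal set that the verification notes say the model currently exposes (syntactic, semantic, embedding, exact hash), not the range-match/Build-ID/fingerprint tiers in the description. The field names, default weights, threshold, and short-circuit rule are hypothetical and not taken from `EnsembleOptions`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class EnsembleWeights:
    # Hypothetical field names and defaults; the real EnsembleOptions may differ.
    syntactic: float = 0.25
    semantic: float = 0.35
    embedding: float = 0.40
    match_threshold: float = 0.85


def ensemble_score(signals: Dict[str, Optional[float]], w: EnsembleWeights) -> float:
    """Weighted mean of the available signal scores (each in [0, 1]).
    Signals that are None (e.g. no embedding computed) are skipped and the
    remaining weights are renormalized so they still sum to 1."""
    weights = {"syntactic": w.syntactic, "semantic": w.semantic, "embedding": w.embedding}
    present = {k: v for k, v in signals.items() if k in weights and v is not None}
    total = sum(weights[k] for k in present)
    if total == 0:
        return 0.0
    return sum(weights[k] * v for k, v in present.items()) / total


def decide(signals: Dict[str, Optional[float]], exact_hash_match: bool,
           w: EnsembleWeights) -> Tuple[bool, float]:
    """Return (is_match, score). An exact-hash hit short-circuits the weighted vote."""
    if exact_hash_match:
        return True, 1.0
    score = ensemble_score(signals, w)
    return score >= w.match_threshold, score
```

Renormalizing over the signals that are present keeps a missing embedding from silently dragging every score toward zero. Whether the real engine behaves this way is not stated in this dossier.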
## E2E Test Plan
- [ ] Submit a binary with known vulnerability and verify ensemble produces correct classification
- [ ] Verify weight tuning: adjust instruction weight to 0.6 and verify it changes classification outcomes
- [ ] Verify multi-tier integration: Build-ID match, fingerprint match, and ML embedding all contribute to the final score
- [ ] Verify `FunctionAnalysisBuilder` correctly assembles all matching dimensions
- [ ] Verify `WeightTuningService` optimizes weights based on golden set validation accuracy
- [ ] Run accuracy benchmark and verify F1 score meets minimum threshold
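One simple way to tune weights against a golden set, in the spirit of the `WeightTuningService` and F1 checks above, is an exhaustive grid search over weight triples that sum to 1. The Python sketch below keeps the F1 score with the best result. It is an illustrative technique only. The dossier does not say which optimization `WeightTuningService` actually uses, and the sample layout, threshold, and grid resolution are assumptions.

```python
from itertools import product
from typing import List, Optional, Tuple

# One golden-set sample: ((syntactic, semantic, embedding) scores, ground-truth match?)
Sample = Tuple[Tuple[float, float, float], bool]


def f1_score(samples: List[Sample], weights: Tuple[float, float, float],
             threshold: float) -> float:
    tp = fp = fn = 0
    for scores, is_match in samples:
        predicted = sum(w * s for w, s in zip(weights, scores)) >= threshold
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_weights(samples: List[Sample], threshold: float = 0.8,
                 steps: int = 10) -> Tuple[float, Optional[Tuple[float, float, float]]]:
    """Try every weight triple summing to 1 in 1/steps increments and return
    (best F1, best weights). On ties, the first candidate found is kept."""
    best_f1, best_weights = -1.0, None
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        weights = (i / steps, j / steps, (steps - i - j) / steps)
        score = f1_score(samples, weights, threshold)
        if score > best_f1:
            best_f1, best_weights = score, weights
    return best_f1, best_weights
```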
## Verification
- Run: `run-001` (2026-02-11 UTC).
- Tier 1/2 builds and tests passed (`37/37`), but a parity review found a contract mismatch and missing test coverage for key claims.
- Ensemble signal model currently exposes syntactic, semantic, embedding, and exact-hash signals, but the feature contract claims range-match, Build-ID, and fingerprint tiers (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/Models.cs:232`).
- `FunctionAnalysisBuilder` explicitly retains a simplified semantic-graph path when binary data is unavailable (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs:87`).
- No direct tests were found for `FunctionAnalysisBuilder` or `MlEmbeddingMatcherAdapter` in `src/BinaryIndex/__Tests/StellaOps.BinaryIndex.Ensemble.Tests`.

@@ -0,0 +1,40 @@
# Symbol Change Tracking in Binary Diffs (SymbolChangeTracer)
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Extends the BinaryIndex DeltaSignature module to track which specific symbols changed between binary versions, not just whether they match. Adds change metadata to `SymbolMatchResult` and provides detailed CFG hash and instruction hash comparison for symbol-level binary change forensics.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`
- **Key Classes**:
- `SymbolChangeTracer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/SymbolChangeTracer.cs`) - traces symbol-level changes between binary versions with detailed CFG hash and instruction hash comparison
- `DeltaSignatureGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureGenerator.cs`) - generates delta signatures capturing symbol change metadata
- `DeltaSignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/DeltaSignatureMatcher.cs`) - matches signatures with change tracking awareness
- `CfgExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/Cfg/CfgExtractor.cs`) - extracts CFG for hash comparison
- `IrDiffGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/IrDiff/IrDiffGenerator.cs`) - generates IR-level diffs for detailed change analysis
- **Interfaces**: `ISymbolChangeTracer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/ISymbolChangeTracer.cs`)
- **Models**: `SymbolMatchResult` with change metadata in `Models.cs`
- **Source**: SPRINT_20260112_200_003_BINDEX_symbol_tracking.md
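The added/removed/modified/unchanged classification that `SymbolChangeTracer` is described as performing can be sketched in Python from two per-symbol hash tables. This sketch assumes each symbol carries one CFG hash and one instruction hash. It reports whether a modification changed control flow or only instructions. The type and function names are hypothetical and do not mirror the C# `SymbolMatchResult` model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SymbolHashes:
    cfg_hash: str          # hash of the function's control-flow graph shape
    instruction_hash: str  # hash of the (normalized) instruction stream


def trace_symbol_changes(old: Dict[str, SymbolHashes],
                         new: Dict[str, SymbolHashes]) -> Dict[str, str]:
    """Classify every symbol that appears in either binary version."""
    changes: Dict[str, str] = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            changes[name] = "added"
        elif name not in new:
            changes[name] = "removed"
        elif old[name] == new[name]:
            changes[name] = "unchanged"
        elif old[name].cfg_hash != new[name].cfg_hash:
            changes[name] = "modified:cfg"           # control flow itself changed
        else:
            changes[name] = "modified:instructions"  # same CFG, different instructions
    return changes
```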
## E2E Test Plan
- [ ] Compare two binary versions with known symbol changes and verify `SymbolChangeTracer` identifies which symbols changed
- [ ] Verify CFG hash comparison detects control flow changes in modified functions
- [ ] Verify instruction hash comparison detects instruction-level changes
- [ ] Verify `SymbolMatchResult` includes change metadata (added, removed, modified symbols)
- [ ] Verify IR-level diff captures semantic changes beyond byte-level differences
- [ ] Verify unchanged symbols are correctly identified as stable between versions
## Verification Outcome (run-001)
- Tier 0/1/2 artifacts: `docs/qa/feature-checks/runs/binaryindex/symbol-change-tracking-in-binary-diffs/run-001/`
- Result: not implemented at full claim parity.
- Verified behavior:
- `SymbolChangeTracer` and related tests cover added/removed/modified/unchanged classification, patched heuristics, and chunk-index metadata.
- `SymbolMatchResult` metadata fields are populated and exercised by the DeltaSig test suite.
- Missing behavior:
- `IrDiffGenerator` remains placeholder-backed (its `In a real implementation` code path returns a zeroed diff summary and a placeholder digest), so the dossier's semantic IR-diff-forensics claim is not fully implemented.
- No dedicated `IrDiffGenerator` behavioral tests were found in `StellaOps.BinaryIndex.DeltaSig.Tests`, so IR-level diff semantics are not verified by tests.

@@ -0,0 +1,39 @@
# Symbol Source Connectors (Debuginfod, Buildinfo, Ddeb, SecDb)
## Module
BinaryIndex
## Status
PARTIALLY_IMPLEMENTED
## Description
Four symbol source connector implementations (Debuginfod, Debian Buildinfo, Ubuntu Ddeb, Alpine SecDb), each with plugin registration and configuration support.
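For the Debuginfod family named above, lookups are keyed by ELF Build-ID. The debuginfod HTTP protocol serves artifacts under `/buildid/<BUILDID>/debuginfo` and `/buildid/<BUILDID>/executable`, and clients conventionally read a space-separated server list from the `DEBUGINFOD_URLS` environment variable. The Python sketch below only builds candidate URLs and does no HTTP. `debuginfod_urls` is a hypothetical helper, not an existing StellaOps API.

```python
import re
from typing import List

_HEX_BUILD_ID = re.compile(r"[0-9a-f]+")


def debuginfod_urls(servers: str, build_id: str, artifact: str = "debuginfo") -> List[str]:
    """Candidate URLs for one artifact, one per server in a space-separated,
    DEBUGINFOD_URLS-style server list."""
    if artifact not in ("debuginfo", "executable"):
        raise ValueError(f"unsupported artifact: {artifact}")
    bid = build_id.lower()
    if not _HEX_BUILD_ID.fullmatch(bid) or len(bid) % 2 != 0:
        raise ValueError("build-id must be a non-empty, even-length hex string")
    return [f"{server.rstrip('/')}/buildid/{bid}/{artifact}" for server in servers.split()]
```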
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Alpine/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Debian/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Rpm/`
- **Key Classes**:
- **Alpine SecDb**: `AlpineCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Alpine/AlpineCorpusConnector.cs`) - connects to Alpine security database; `ApkBuildSecfixesExtractor` - extracts secfixes from APK build files
- **Debian Buildinfo**: `DebianCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Debian/DebianCorpusConnector.cs`) - connects to Debian buildinfo sources; `DebianMirrorPackageSource` - mirrors Debian repositories
- **RPM**: `RpmCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus.Rpm/RpmCorpusConnector.cs`) - connects to RPM repositories; `SrpmChangelogExtractor` - extracts changelogs from source RPMs
- **Library-specific**: `CurlCorpusConnector`, `GlibcCorpusConnector`, `OpenSslCorpusConnector`, `ZlibCorpusConnector` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Corpus/Connectors/`)
- **Interfaces**: `IBinaryCorpusConnector`, `ILibraryCorpusConnector`, `IAlpinePackageSource`, `IDebianPackageSource`, `IRpmPackageSource`
- **Package Extractors**: `AlpinePackageExtractor`, `DebianPackageExtractor`, `RpmPackageExtractor` - extract binaries from packages using `IBinaryFeatureExtractor`
## E2E Test Plan
- [ ] Connect via `AlpineCorpusConnector` and verify secfixes data is extracted from APK builds
- [ ] Connect via `DebianCorpusConnector` and verify buildinfo data is retrieved from Debian mirrors
- [ ] Connect via `RpmCorpusConnector` and verify RPM changelog extraction works
- [ ] Verify library-specific connectors (OpenSSL, glibc, curl, zlib) retrieve correct binary versions
- [ ] Verify all connectors produce `CorpusSnapshot` with consistent snapshot IDs
- [ ] Verify package extractors use `IBinaryFeatureExtractor` to extract identity features from packages
## Verification Outcome (run-001)
- Tier 0/1/2 artifacts: `docs/qa/feature-checks/runs/binaryindex/symbol-source-connectors/run-001/`
- Result: not implemented at full claim parity.
- Verified behavior:
  - The Alpine, Debian, and RPM mirror package-source/parsing and package-extractor test suites pass, with deterministic cache-fallback behavior.
- Corpus contracts and package extractors pass local behavioral tests.
- Missing behavior:
- No Debuginfod/Buildinfo/Ddeb connector implementation classes were found in the declared module paths, despite the feature claim naming those connector families.
- Library-specific connector extraction flows remain placeholder-backed (`CurlCorpusConnector`, `GlibcCorpusConnector`, `OpenSslCorpusConnector`, `ZlibCorpusConnector`), with deb/apk extraction paths returning `null`.
- No dedicated tests were found for these library-specific connector classes, so end-to-end retrieval behavior for those connectors is not verified.

@@ -0,0 +1,29 @@
# Validation Harness and Reproducibility Verification
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Validation harness with determinism validation, SBOM stability checking, and reproducible build verification. Includes a local rebuild backend and bundle export/import.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Builders/`
- **Key Classes**:
- `ValidationHarness` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Validation/ValidationHarness.cs`) - main validation harness with `IMatcherAdapterFactory` for pluggable matching
- `ValidationHarnessService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ValidationHarnessService.cs`) - reproducible-build validation with `ValidationRunContext`
- `ReproducibleBuildJob` (`src/BinaryIndex/StellaOps.BinaryIndex.Worker/Jobs/ReproducibleBuildJob.cs`) - local rebuild backend
- `KpiRegressionService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/Services/KpiRegressionService.cs`) - SBOM stability and KPI regression tracking
- **Bundle Export/Import**: `ServiceCollectionExtensions.AddCorpusBundleExport/Import` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.GroundTruth.Reproducible/ServiceCollectionExtensions.cs`)
- **Interfaces**: `IValidationHarness`, `IKpiRegressionService`, `IReproducibleBuildJob`
- **Registration**: `ValidationServiceCollectionExtensions.AddValidationHarness()`
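A determinism check like the one this harness performs can be reduced to running the same validation several times and comparing digests of a canonical serialization of each result. In the Python sketch below, canonical means JSON with sorted keys and fixed separators. Key ordering therefore does not count as instability, but any change in values does. The helper names and the JSON canonicalization are assumptions, not the harness's actual mechanism.

```python
import hashlib
import json
from typing import Any, Callable


def canonical_digest(result: Any) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, no extra whitespace)."""
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_determinism(run: Callable[[], Any], runs: int = 3) -> bool:
    """True if every run produces a byte-identical canonical result."""
    digests = {canonical_digest(run()) for _ in range(runs)}
    return len(digests) == 1
```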
## E2E Test Plan
- [ ] Run validation harness and verify deterministic results for identical inputs
- [ ] Verify SBOM stability checking detects unstable hash generation
- [ ] Verify reproducible build verification: rebuild from source and compare against original binary
- [ ] Verify bundle export produces a self-contained archive importable on air-gapped systems
- [ ] Verify bundle import restores corpus data and enables offline validation
- [ ] Verify KPI regression tracking across multiple validation harness runs