new advisories work and features gaps work

This commit is contained in:
master
2026-01-14 18:39:19 +02:00
parent 95d5898650
commit 15aeac8e8b
148 changed files with 16731 additions and 554 deletions

@@ -409,6 +409,143 @@ public SemanticFingerprint? SemanticFingerprint { get; init; }
| False positive rate | <10% | <5% |
| P95 fingerprint latency | <100ms | <50ms |
##### 2.2.5.7 B2R2 LowUIR Adapter
The `B2R2LowUirLiftingService` implements `IIrLiftingService` using B2R2's native lifting capabilities. This provides a cross-platform IR representation for semantic analysis.
**Key Components:**
```csharp
public sealed class B2R2LowUirLiftingService : IIrLiftingService
{
// Lifts to B2R2 LowUIR and maps to Stella IR model
public Task<LiftedFunction> LiftToIrAsync(
IReadOnlyList<DisassembledInstruction> instructions,
string functionName,
LiftOptions? options = null,
CancellationToken ct = default);
}
```
**Supported ISAs:**
- Intel (x86-32, x86-64)
- ARM (ARMv7, ARMv8/ARM64)
- MIPS (32/64)
- RISC-V (64)
- PowerPC, SPARC, SH4, AVR, EVM
**IR Statement Mapping:**
| B2R2 LowUIR | Stella IR Kind |
|-------------|----------------|
| Put | IrStatementKind.Store |
| Store | IrStatementKind.Store |
| Get | IrStatementKind.Load |
| Load | IrStatementKind.Load |
| BinOp | IrStatementKind.BinaryOp |
| UnOp | IrStatementKind.UnaryOp |
| Jmp | IrStatementKind.Jump |
| CJmp | IrStatementKind.ConditionalJump |
| InterJmp | IrStatementKind.IndirectJump |
| Call | IrStatementKind.Call |
| SideEffect | IrStatementKind.SideEffect |
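
The table above can be read as a total mapping with an explicit failure path for anything unlisted. The following is a minimal sketch, not the adapter's actual code: `LowUirKindMapper` is a hypothetical name, the enum is a local stand-in for the real `IrStatementKind`, and LowUIR statements are identified by name here rather than by B2R2's own types.

```csharp
using System;

// Local stand-in for the real enum; only the member names come from the table.
public enum IrStatementKind
{
    Store, Load, BinaryOp, UnaryOp, Jump, ConditionalJump, IndirectJump, Call, SideEffect
}

// Hypothetical helper: maps a LowUIR statement name to a Stella IR kind.
public static class LowUirKindMapper
{
    public static IrStatementKind Map(string lowUirName) => lowUirName switch
    {
        // Register writes and memory writes both collapse to Store.
        "Put" or "Store" => IrStatementKind.Store,
        // Register reads and memory reads both collapse to Load.
        "Get" or "Load" => IrStatementKind.Load,
        "BinOp" => IrStatementKind.BinaryOp,
        "UnOp" => IrStatementKind.UnaryOp,
        "Jmp" => IrStatementKind.Jump,
        "CJmp" => IrStatementKind.ConditionalJump,
        "InterJmp" => IrStatementKind.IndirectJump,
        "Call" => IrStatementKind.Call,
        "SideEffect" => IrStatementKind.SideEffect,
        // Fail loudly instead of silently dropping an unknown statement.
        _ => throw new ArgumentOutOfRangeException(
            nameof(lowUirName), lowUirName, "Unmapped LowUIR statement"),
    };
}
```

Throwing on unknown input is a design choice of this sketch; the real adapter may handle unmapped statements differently.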
**Determinism Guarantees:**
- Statements ordered by block address (ascending)
- Blocks sorted by entry address (ascending)
- Consistent IR IDs across identical inputs
- InvariantCulture used for all string formatting
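
One way to obtain these guarantees is to sort blocks by entry address before assigning any identifiers, and to derive every ID from position alone. The sketch below assumes a hypothetical `IrBlock` record and an illustrative ID format (`b{block}.s{statement}`); neither is taken from the real Stella IR model.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical block model: an entry address plus statements in program order.
public sealed record IrBlock(ulong EntryAddress, IReadOnlyList<string> Statements);

public static class DeterministicIr
{
    // Renders statements so that the same set of blocks always yields the
    // same lines, regardless of the order in which the blocks were discovered.
    public static IReadOnlyList<string> Render(IEnumerable<IrBlock> blocks)
    {
        var lines = new List<string>();
        var blockIndex = 0;

        // Ascending entry address fixes the block order.
        foreach (var block in blocks.OrderBy(b => b.EntryAddress))
        {
            var statementIndex = 0;
            foreach (var statement in block.Statements)
            {
                // InvariantCulture keeps number formatting locale-independent.
                lines.Add(string.Format(
                    CultureInfo.InvariantCulture,
                    "b{0}.s{1}@0x{2:x}: {3}",
                    blockIndex, statementIndex, block.EntryAddress, statement));
                statementIndex++;
            }
            blockIndex++;
        }

        return lines;
    }
}
```

Because IDs depend only on sorted position, feeding the same blocks in a different discovery order produces identical output.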
##### 2.2.5.8 B2R2 Lifter Pool
The `B2R2LifterPool` provides bounded pooling and warm preload for B2R2 lifting units to reduce per-call allocation overhead.
**Configuration (`B2R2LifterPoolOptions`):**
| Option | Default | Description |
|--------|---------|-------------|
| `MaxPoolSizePerIsa` | 4 | Maximum pooled lifters per ISA |
| `EnableWarmPreload` | true | Preload lifters at startup |
| `WarmPreloadIsas` | ["intel-64", "intel-32", "armv8-64", "armv7-32"] | ISAs to warm |
| `AcquireTimeout` | 5s | Timeout for acquiring a lifter |
**Pool Statistics:**
- `TotalPooledLifters`: Lifters currently in pool
- `TotalActiveLifters`: Lifters currently in use
- `IsWarm`: Whether pool has been warmed
- `IsaStats`: Per-ISA pool and active counts
**Usage:**
```csharp
using var lifter = _lifterPool.Acquire(isa);
var stmts = lifter.LiftingUnit.LiftInstruction(address);
// Lifter automatically returned to pool on dispose
```
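
The acquire/dispose pattern above can be sketched with a semaphore that bounds concurrent leases and a bag that holds idle instances. This is an illustration of the technique under assumptions, not the `B2R2LifterPool` source: `BoundedPool<T>` and `Lease` are hypothetical names, and `T` stands in for whatever wraps a B2R2 lifting unit. One such pool per ISA would correspond to `MaxPoolSizePerIsa`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical bounded pool: at most maxSize items are leased at any time.
public sealed class BoundedPool<T> where T : class
{
    private readonly ConcurrentBag<T> _idle = new();
    private readonly SemaphoreSlim _slots;
    private readonly Func<T> _factory;
    private readonly TimeSpan _acquireTimeout;
    private readonly int _maxSize;

    public BoundedPool(int maxSize, TimeSpan acquireTimeout, Func<T> factory)
    {
        _maxSize = maxSize;
        _slots = new SemaphoreSlim(maxSize, maxSize);
        _acquireTimeout = acquireTimeout;
        _factory = factory;
    }

    // Number of items currently leased out.
    public int ActiveCount => _maxSize - _slots.CurrentCount;

    // Blocks until a slot is free or the timeout elapses.
    public Lease Acquire()
    {
        if (!_slots.Wait(_acquireTimeout))
            throw new TimeoutException("No lifter became available within the acquire timeout.");

        try
        {
            // Reuse an idle instance when one exists; otherwise create one.
            var item = _idle.TryTake(out var pooled) ? pooled : _factory();
            return new Lease(this, item);
        }
        catch
        {
            _slots.Release(); // do not leak the slot if the factory throws
            throw;
        }
    }

    private void Return(T item)
    {
        _idle.Add(item);
        _slots.Release();
    }

    // Returns the item to the pool exactly once, on first Dispose.
    public sealed class Lease : IDisposable
    {
        private readonly BoundedPool<T> _owner;
        private int _disposed;

        public T Value { get; }

        internal Lease(BoundedPool<T> owner, T value)
        {
            _owner = owner;
            Value = value;
        }

        public void Dispose()
        {
            if (Interlocked.Exchange(ref _disposed, 1) == 0)
                _owner.Return(Value);
        }
    }
}
```

Guarding `Dispose` with `Interlocked.Exchange` keeps a double dispose from releasing the semaphore twice, which would otherwise let the pool exceed its bound.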
##### 2.2.5.9 Function IR Cache
The `FunctionIrCacheService` provides Valkey-backed caching of computed semantic fingerprints, avoiding redundant IR lifting and graph hashing.
**Cache Key Structure:**
```
(isa, b2r2_version, normalization_recipe, canonical_ir_hash)
```
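
To make the tuple concrete, the sketch below flattens it into a single Valkey key string behind a configurable prefix. The `:` separator, the component order in the string, and the `FunctionCacheKey` helper are assumptions for illustration; the document specifies only which four components participate.

```csharp
using System;

// Hypothetical helper that flattens the cache key tuple into one string.
public static class FunctionCacheKey
{
    public static string Build(
        string keyPrefix,
        string isa,
        string b2r2Version,
        string recipeVersion,
        string canonicalIrHash)
    {
        // Reject empty parts and parts containing the separator, so two
        // different tuples can never flatten to the same key.
        foreach (var part in new[] { isa, b2r2Version, recipeVersion, canonicalIrHash })
        {
            if (string.IsNullOrWhiteSpace(part) || part.Contains(':'))
                throw new ArgumentException("Key components must be non-empty and must not contain ':'.");
        }

        return keyPrefix + string.Join(":", isa, b2r2Version, recipeVersion, canonicalIrHash);
    }
}
```

Because the B2R2 version and recipe version are part of the key, bumping either one produces a different key for the same function, so stale entries are simply never looked up again.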
**Configuration (`FunctionIrCacheOptions`):**
| Option | Default | Description |
|--------|---------|-------------|
| `KeyPrefix` | "stellaops:binidx:funccache:" | Valkey key prefix |
| `CacheTtl` | 4h | TTL for cached entries |
| `MaxTtl` | 24h | Maximum TTL |
| `Enabled` | true | Whether caching is enabled |
| `B2R2Version` | "0.9.1" | B2R2 version for cache key |
| `NormalizationRecipeVersion` | "v1" | Recipe version for cache key |
**Cache Entry (`CachedFunctionFingerprint`):**
- `FunctionAddress`, `FunctionName`
- `SemanticFingerprint`: The computed fingerprint
- `IrStatementCount`, `BasicBlockCount`
- `ComputedAtUtc`: ISO-8601 timestamp
- `B2R2Version`, `NormalizationRecipe`
**Invalidation Rules:**
- Cache entries expire after `CacheTtl` (default 4h)
- Changing B2R2 version or normalization recipe results in cache misses
- Manual invalidation via `RemoveAsync()`
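
The document does not spell out how `CacheTtl` and `MaxTtl` interact. One plausible reading, shown here purely as an assumption, is that `CacheTtl` is the default for entries without an explicit TTL and `MaxTtl` is an upper bound on any requested TTL. `CacheTtlPolicy` is a hypothetical helper, not part of the service.

```csharp
using System;

// Hypothetical policy: default to cacheTtl, never exceed maxTtl.
public static class CacheTtlPolicy
{
    public static TimeSpan Effective(TimeSpan? requested, TimeSpan cacheTtl, TimeSpan maxTtl)
    {
        // Fall back to the configured default when the caller gives no TTL.
        var ttl = requested ?? cacheTtl;

        if (ttl <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(requested), "TTL must be positive.");

        // Clamp to the configured maximum (assumed semantics of MaxTtl).
        return ttl > maxTtl ? maxTtl : ttl;
    }
}
```

With the defaults from the options table, an entry stored without an explicit TTL would live for 4 hours, and a request for 48 hours would be capped at 24.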
**Statistics:**
- Hits, Misses, Evictions
- Hit Rate
- Enabled status
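
Hit rate is conventionally hits divided by total lookups (hits plus misses); evictions do not enter the ratio. A minimal thread-safe counter sketch follows, with `CacheCounters` as a hypothetical name and the zero-lookup case defined as a rate of 0, which is an assumption of this sketch.

```csharp
using System.Threading;

// Hypothetical counters for cache statistics, safe for concurrent updates.
public sealed class CacheCounters
{
    private long _hits;
    private long _misses;
    private long _evictions;

    public void RecordHit() => Interlocked.Increment(ref _hits);
    public void RecordMiss() => Interlocked.Increment(ref _misses);
    public void RecordEviction() => Interlocked.Increment(ref _evictions);

    public long Hits => Interlocked.Read(ref _hits);
    public long Misses => Interlocked.Read(ref _misses);
    public long Evictions => Interlocked.Read(ref _evictions);

    // Fraction of lookups served from cache; 0 when nothing has been looked up.
    public double HitRate
    {
        get
        {
            var hits = Hits;
            var total = hits + Misses;
            return total == 0 ? 0.0 : (double)hits / total;
        }
    }
}
```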
##### 2.2.5.10 Ops Endpoints
BinaryIndex exposes operational endpoints for health, benchmarking, cache monitoring, and configuration visibility.
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/ops/binaryindex/health` | GET | Health status with lifter warmness, cache availability |
| `/api/v1/ops/binaryindex/bench/run` | POST | Run benchmark, return latency stats |
| `/api/v1/ops/binaryindex/cache` | GET | Function IR cache hit/miss statistics |
| `/api/v1/ops/binaryindex/config` | GET | Effective configuration (secrets redacted) |
**Health Response:**
```json
{
"status": "healthy",
"timestamp": "2026-01-14T12:00:00Z",
"lifterStatus": "warm",
"lifterWarm": true,
"lifterPoolStats": { "intel-64": 4, "armv8-64": 2 },
"cacheStatus": "enabled",
"cacheEnabled": true
}
```
**Determinism Constraints:**
- All timestamps in ISO-8601 UTC format
- ASCII-only output
- Deterministic JSON key ordering
- Secrets/credentials redacted from config endpoint
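
Three of these constraints can be illustrated with `System.Text.Json`: ordinal-sorted keys give a stable order, a fixed format string gives ISO-8601 UTC timestamps, and the serializer's default encoder already escapes non-ASCII characters. The sketch below is one way to meet the constraints, not the controller's actual serialization path; `DeterministicJson` is a hypothetical helper, and it sorts top-level keys only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Hypothetical helper for deterministic, ASCII-only JSON responses.
public static class DeterministicJson
{
    // Converts to UTC and formats with second precision and a literal 'Z'.
    public static string FormatUtc(DateTimeOffset value) =>
        value.UtcDateTime.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

    // Serializes fields with keys in ordinal order, independent of insertion order.
    public static string Serialize(IEnumerable<KeyValuePair<string, object>> fields)
    {
        var ordered = new SortedDictionary<string, object>(StringComparer.Ordinal);
        foreach (var (key, value) in fields)
            ordered[key] = value;

        // The default encoder escapes non-ASCII characters as \uXXXX.
        return JsonSerializer.Serialize(ordered);
    }
}
```

Nested objects would need the same treatment (for example, nested `SortedDictionary` values) to be ordered deterministically as well.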
#### 2.2.6 Binary Vulnerability Service
Main query interface for consumers.

@@ -113,19 +113,51 @@ Semantic diffing is an advanced binary analysis capability that detects function
### Phase 1: IR-Level Semantic Analysis (Foundation)
**Sprint:** `SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md`
**Sprints:**
- `SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md`
- `SPRINT_20260112_004_BINIDX_b2r2_lowuir_perf_cache.md` (Performance & Ops)
Leverage B2R2's Intermediate Representation (IR) for semantic-level function comparison.
**Key Components:**
- `IrLiftingService` - Lift instructions to LowUIR
- `B2R2LowUirLiftingService` - Lifts instructions to B2R2 LowUIR, maps to Stella IR model
- `B2R2LifterPool` - Bounded pool with warm preload for lifter reuse
- `FunctionIrCacheService` - Valkey-backed cache for semantic fingerprints
- `SemanticGraphExtractor` - Build Key-Semantics Graph (KSG)
- `WeisfeilerLehmanHasher` - Graph fingerprinting
- `SemanticMatcher` - Semantic similarity scoring
**B2R2LowUirLiftingService Implementation:**
- Supports Intel, ARM, MIPS, RISC-V, PowerPC, SPARC, SH4, AVR, EVM
- Maps B2R2 LowUIR statements to `IrStatement` model
- Applies SSA numbering to temporary registers
- Deterministic block ordering (by entry address)
- InvariantCulture formatting throughout
**B2R2LifterPool Implementation:**
- Bounded per-ISA pooling (default 4 lifters/ISA)
- Warm preload at startup for common ISAs
- Per-ISA stats (pooled, active, max)
- Automatic return on dispose
**FunctionIrCacheService Implementation:**
- Cache key: `(isa, b2r2_version, normalization_recipe, canonical_ir_hash)`
- Valkey as hot cache (default 4h TTL)
- PostgreSQL persistence for fingerprint records
- Hit/miss/eviction statistics
**Ops Endpoints:**
- `GET /api/v1/ops/binaryindex/health` - Lifter warmness, cache status
- `POST /api/v1/ops/binaryindex/bench/run` - Benchmark latency
- `GET /api/v1/ops/binaryindex/cache` - Cache statistics
- `GET /api/v1/ops/binaryindex/config` - Effective configuration
**Deliverables:**
- `StellaOps.BinaryIndex.Semantic` library
- 20 tasks, ~3 weeks
- `StellaOps.BinaryIndex.Disassembly.B2R2` (LowUIR adapter, lifter pool)
- `StellaOps.BinaryIndex.Cache` (function IR cache)
- `BinaryIndexOpsController`
- 20+ tasks, ~3 weeks
### Phase 2: Function Behavior Corpus (Scale)