new advisories work and features gaps work
@@ -409,6 +409,143 @@ public SemanticFingerprint? SemanticFingerprint { get; init; }
| False positive rate | <10% | <5% |
| P95 fingerprint latency | <100ms | <50ms |

##### 2.2.5.7 B2R2 LowUIR Adapter

`B2R2LowUirLiftingService` implements `IIrLiftingService` on top of B2R2's native lifter. Instructions are lifted to B2R2 LowUIR and then mapped to the Stella IR model, so semantic analysis works on one IR representation regardless of the source ISA.

**Key Components:**

```csharp
public sealed class B2R2LowUirLiftingService : IIrLiftingService
{
    // Lifts to B2R2 LowUIR and maps to Stella IR model
    public Task<LiftedFunction> LiftToIrAsync(
        IReadOnlyList<DisassembledInstruction> instructions,
        string functionName,
        LiftOptions? options = null,
        CancellationToken ct = default);
}
```

**Supported ISAs:**

- Intel (x86-32, x86-64)
- ARM (ARMv7, ARMv8/ARM64)
- MIPS (32/64)
- RISC-V (64)
- PowerPC, SPARC, SH4, AVR, EVM

**IR Statement Mapping:**

| B2R2 LowUIR | Stella IR Kind |
|-------------|----------------|
| Put | IrStatementKind.Store |
| Store | IrStatementKind.Store |
| Get | IrStatementKind.Load |
| Load | IrStatementKind.Load |
| BinOp | IrStatementKind.BinaryOp |
| UnOp | IrStatementKind.UnaryOp |
| Jmp | IrStatementKind.Jump |
| CJmp | IrStatementKind.ConditionalJump |
| InterJmp | IrStatementKind.IndirectJump |
| Call | IrStatementKind.Call |
| SideEffect | IrStatementKind.SideEffect |

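The mapping table behaves like a total function from LowUIR construct to Stella IR kind. The sketch below is a minimal illustration, not the adapter's code: it assumes a hypothetical `IrStatementKind` enum mirroring the table and takes construct names as strings, whereas the real adapter presumably matches on B2R2's own statement and expression types. Rejecting constructs that are not in the table (for example `LMark`) is also an assumption.

```csharp
using System;

// Hypothetical enum mirroring the Stella IR kinds in the mapping table.
public enum IrStatementKind
{
    Store,
    Load,
    BinaryOp,
    UnaryOp,
    Jump,
    ConditionalJump,
    IndirectJump,
    Call,
    SideEffect,
}

public static class LowUirMapping
{
    // Maps a LowUIR construct name to its Stella IR kind, following the table row by row.
    public static IrStatementKind Map(string lowUirConstruct) => lowUirConstruct switch
    {
        "Put" or "Store" => IrStatementKind.Store,
        "Get" or "Load" => IrStatementKind.Load,
        "BinOp" => IrStatementKind.BinaryOp,
        "UnOp" => IrStatementKind.UnaryOp,
        "Jmp" => IrStatementKind.Jump,
        "CJmp" => IrStatementKind.ConditionalJump,
        "InterJmp" => IrStatementKind.IndirectJump,
        "Call" => IrStatementKind.Call,
        "SideEffect" => IrStatementKind.SideEffect,
        // Assumption: constructs not in the table are rejected rather than silently dropped.
        _ => throw new ArgumentOutOfRangeException(
            nameof(lowUirConstruct), lowUirConstruct, "No Stella IR kind mapped for this construct."),
    };
}
```

Making the fallback arm throw means a LowUIR construct that nobody mapped shows up as an error during lifting instead of vanishing from the fingerprint.
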
**Determinism Guarantees:**

- Statements ordered by block address (ascending)
- Blocks sorted by entry address (ascending)
- Identical inputs always produce identical IR IDs
- `CultureInfo.InvariantCulture` used for all string formatting

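One way to meet these guarantees is to sort before numbering, so IDs depend only on addresses and never on the order in which blocks were discovered. The `StmtSketch`/`BlockSketch` records, the `DeterministicIr.Render` helper, and the `id:address:text` output format are hypothetical simplifications, not the Stella IR model.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical, simplified shapes; real IR statements and blocks carry more data.
public sealed record StmtSketch(ulong Address, string Text);
public sealed record BlockSketch(ulong EntryAddress, IReadOnlyList<StmtSketch> Statements);

public static class DeterministicIr
{
    // Emits blocks by ascending entry address and statements by ascending address,
    // then assigns IDs sequentially in that order.
    public static IReadOnlyList<string> Render(IEnumerable<BlockSketch> blocks)
    {
        var lines = new List<string>();
        var nextId = 0;
        foreach (var block in blocks.OrderBy(b => b.EntryAddress))
        {
            foreach (var stmt in block.Statements.OrderBy(s => s.Address))
            {
                // InvariantCulture keeps formatting independent of the host locale.
                lines.Add(string.Format(
                    CultureInfo.InvariantCulture,
                    "{0}:{1:x8}:{2}",
                    nextId++,
                    stmt.Address,
                    stmt.Text));
            }
        }
        return lines;
    }
}
```

`Enumerable.OrderBy` is a stable sort, so statements that share an address (several LowUIR statements lifted from one instruction) keep their lifted order.
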
##### 2.2.5.8 B2R2 Lifter Pool

`B2R2LifterPool` keeps a bounded, per-ISA pool of reusable B2R2 lifting units and can warm (preload) them at startup, reducing per-call allocation overhead.

**Configuration (`B2R2LifterPoolOptions`):**

| Option | Default | Description |
|--------|---------|-------------|
| `MaxPoolSizePerIsa` | 4 | Maximum pooled lifters per ISA |
| `EnableWarmPreload` | true | Preload lifters at startup |
| `WarmPreloadIsas` | ["intel-64", "intel-32", "armv8-64", "armv7-32"] | ISAs to warm |
| `AcquireTimeout` | 5s | Timeout for acquiring a lifter |

**Pool Statistics:**

- `TotalPooledLifters`: Lifters currently in pool
- `TotalActiveLifters`: Lifters currently in use
- `IsWarm`: Whether pool has been warmed
- `IsaStats`: Per-ISA pool and active counts

**Usage:**

```csharp
using var lifter = _lifterPool.Acquire(isa);
var stmts = lifter.LiftingUnit.LiftInstruction(address);
// Lifter automatically returned to pool on dispose
```

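The acquire/return-on-dispose pattern can be illustrated with a generic bounded pool. This is a sketch under stated assumptions, not `B2R2LifterPool` itself: `BoundedPool<T>`, its `Lease` type, and its `Warm` method are hypothetical, one pool instance stands in for one ISA, and `maxSize`/`acquireTimeout` play the roles of `MaxPoolSizePerIsa` and `AcquireTimeout`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical bounded pool; one instance would correspond to one ISA.
public sealed class BoundedPool<T> where T : class
{
    private readonly ConcurrentBag<T> _idle = new();
    private readonly SemaphoreSlim _slots;
    private readonly Func<T> _factory;
    private readonly int _maxSize;
    private readonly TimeSpan _acquireTimeout;
    private int _active;

    public BoundedPool(Func<T> factory, int maxSize, TimeSpan acquireTimeout)
    {
        _factory = factory;
        _maxSize = maxSize;
        _acquireTimeout = acquireTimeout;
        _slots = new SemaphoreSlim(maxSize, maxSize);
    }

    public int PooledCount => _idle.Count;
    public int ActiveCount => Volatile.Read(ref _active);

    // Warm preload: create idle instances up front, never more than maxSize.
    // Intended to run at startup, before concurrent use.
    public void Warm(int count)
    {
        var target = Math.Min(count, _maxSize);
        while (_idle.Count < target) _idle.Add(_factory());
    }

    public Lease Acquire()
    {
        // At most maxSize leases exist at once; wait for a free slot or time out.
        if (!_slots.Wait(_acquireTimeout))
            throw new TimeoutException("No pooled instance became available in time.");

        T item;
        try
        {
            // Reuse an idle instance if there is one, otherwise create a new one.
            item = _idle.TryTake(out var pooled) ? pooled : _factory();
        }
        catch
        {
            _slots.Release(); // don't leak the slot if the factory throws
            throw;
        }

        Interlocked.Increment(ref _active);
        return new Lease(this, item);
    }

    private void Return(T item)
    {
        _idle.Add(item);
        Interlocked.Decrement(ref _active);
        _slots.Release();
    }

    public sealed class Lease : IDisposable
    {
        private BoundedPool<T>? _owner;

        internal Lease(BoundedPool<T> owner, T value)
        {
            _owner = owner;
            Value = value;
        }

        public T Value { get; }

        // Returns the instance to the pool exactly once, even if Dispose is called twice.
        public void Dispose() => Interlocked.Exchange(ref _owner, null)?.Return(Value);
    }
}
```

A per-ISA pool could then be a dictionary from ISA name to `BoundedPool<T>`, with `Warm` called at startup for each entry in `WarmPreloadIsas`. The semaphore bounds concurrent use, while the bag holds idle instances for reuse.
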
##### 2.2.5.9 Function IR Cache

The `FunctionIrCacheService` provides Valkey-backed caching for computed semantic fingerprints to avoid redundant IR lifting and graph hashing.

**Cache Key Structure:**

```
(isa, b2r2_version, normalization_recipe, canonical_ir_hash)
```

**Configuration (`FunctionIrCacheOptions`):**

| Option | Default | Description |
|--------|---------|-------------|
| `KeyPrefix` | "stellaops:binidx:funccache:" | Valkey key prefix |
| `CacheTtl` | 4h | TTL for cached entries |
| `MaxTtl` | 24h | Maximum TTL |
| `Enabled` | true | Whether caching is enabled |
| `B2R2Version` | "0.9.1" | B2R2 version for cache key |
| `NormalizationRecipeVersion` | "v1" | Recipe version for cache key |

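A concrete key can be derived by joining the prefix with the four key components. This sketch assumes a colon-separated layout and a lowercase hex hash; `FunctionIrCacheKey.Build` is hypothetical, and the service's actual key format may differ. Because the B2R2 version and recipe version are part of the key, bumping either one yields a different key.

```csharp
using System;

// Hypothetical key builder; the real FunctionIrCacheService key layout may differ.
public static class FunctionIrCacheKey
{
    // Joins the configured prefix with (isa, b2r2_version, normalization_recipe, canonical_ir_hash).
    public static string Build(
        string keyPrefix,
        string isa,
        string b2r2Version,
        string normalizationRecipeVersion,
        string canonicalIrHash)
    {
        // Assumption: components never contain ':' and the hash is hex, normalized to lowercase.
        return string.Join(
            ":",
            keyPrefix.TrimEnd(':'),
            isa,
            b2r2Version,
            normalizationRecipeVersion,
            canonicalIrHash.ToLowerInvariant());
    }
}
```

With the defaults above, an `intel-64` function would map to a key of the form `stellaops:binidx:funccache:intel-64:0.9.1:v1:<hash>`.
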
**Cache Entry (`CachedFunctionFingerprint`):**

- `FunctionAddress`, `FunctionName`
- `SemanticFingerprint`: The computed fingerprint
- `IrStatementCount`, `BasicBlockCount`
- `ComputedAtUtc`: ISO-8601 UTC timestamp
- `B2R2Version`, `NormalizationRecipe`

**Invalidation Rules:**

- Cache entries expire after `CacheTtl` (default 4h)
- Changing the B2R2 version or normalization recipe changes the cache key, so previously cached entries are no longer hit
- Manual invalidation via `RemoveAsync()`

**Statistics:**

- Hits, Misses, Evictions
- Hit Rate
- Enabled status

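The read path is a cache-aside lookup: return the cached fingerprint on a hit, otherwise compute it (lift and hash) and store it, updating the hit/miss counters along the way. In this sketch a `ConcurrentDictionary` stands in for Valkey, TTL expiry and eviction counting are omitted, and `FingerprintCacheSketch` with all its members is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical cache-aside wrapper: a ConcurrentDictionary stands in for Valkey,
// and TTL expiry and eviction counting are left out.
public sealed class FingerprintCacheSketch
{
    private readonly ConcurrentDictionary<string, string> _store = new();
    private long _hits;
    private long _misses;

    public long Hits => Interlocked.Read(ref _hits);
    public long Misses => Interlocked.Read(ref _misses);

    // Hit rate = hits / (hits + misses), reported as 0 before the first lookup.
    public double HitRate
    {
        get
        {
            long hits = Hits, total = hits + Misses;
            return total == 0 ? 0.0 : (double)hits / total;
        }
    }

    // Returns the cached fingerprint, or computes it (lift + hash) and stores it on a miss.
    public string GetOrCompute(string key, Func<string> computeFingerprint)
    {
        if (_store.TryGetValue(key, out var cached))
        {
            Interlocked.Increment(ref _hits);
            return cached;
        }

        Interlocked.Increment(ref _misses);
        var fingerprint = computeFingerprint();
        _store[key] = fingerprint;
        return fingerprint;
    }

    // Manual invalidation, analogous to RemoveAsync().
    public bool Remove(string key) => _store.TryRemove(key, out _);
}
```

Two concurrent misses for the same key may both compute the fingerprint; since fingerprints are deterministic for a given key, the second write stores the same value.
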
##### 2.2.5.10 Ops Endpoints

BinaryIndex exposes operational endpoints for health, benchmarking, cache monitoring, and configuration visibility.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/ops/binaryindex/health` | GET | Health status with lifter warmness, cache availability |
| `/api/v1/ops/binaryindex/bench/run` | POST | Run benchmark, return latency stats |
| `/api/v1/ops/binaryindex/cache` | GET | Function IR cache hit/miss statistics |
| `/api/v1/ops/binaryindex/config` | GET | Effective configuration (secrets redacted) |

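For the `/config` endpoint, redaction can be applied by key name before serialization. The sensitive-name fragments, the `***REDACTED***` placeholder, `ConfigRedaction.Redact`, and the sample setting names are assumptions for illustration, not the service's actual rules. Sorting keys ordinally also keeps the output order stable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical redaction for the /config endpoint; the sensitive-name list is an assumption.
public static class ConfigRedaction
{
    private static readonly string[] SensitiveFragments =
        { "password", "secret", "token", "apikey", "connectionstring" };

    public const string Placeholder = "***REDACTED***";

    public static SortedDictionary<string, string> Redact(IReadOnlyDictionary<string, string> settings)
    {
        // Ordinal key order keeps the output stable regardless of insertion order.
        var result = new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (var (key, value) in settings)
        {
            // Normalize "Valkey:Connection_String"-style keys before matching.
            var normalized = key.Replace("_", "").Replace(":", "").ToLowerInvariant();
            var sensitive = SensitiveFragments.Any(f => normalized.Contains(f, StringComparison.Ordinal));
            result[key] = sensitive ? Placeholder : value;
        }
        return result;
    }
}
```
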
**Health Response:**

```json
{
  "status": "healthy",
  "timestamp": "2026-01-14T12:00:00Z",
  "lifterStatus": "warm",
  "lifterWarm": true,
  "lifterPoolStats": { "intel-64": 4, "armv8-64": 2 },
  "cacheStatus": "enabled",
  "cacheEnabled": true
}
```

**Determinism Constraints:**

- All timestamps in ISO-8601 UTC format
- ASCII-only output
- Deterministic JSON key ordering
- Secrets/credentials redacted from config endpoint

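These constraints can be met by writing the health payload with an explicit key order instead of relying on reflection order. This sketch uses `System.Text.Json.Utf8JsonWriter`, whose default encoder escapes non-ASCII characters; the `HealthSnapshot` record and `HealthJsonWriter` are hypothetical. Unlike the sample response, per-ISA pool stats are emitted in ordinal key order (`armv8-64` before `intel-64`), so dictionary insertion order cannot leak into the output.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical snapshot type; field names follow the sample health payload.
public sealed record HealthSnapshot(
    string Status,
    DateTimeOffset Timestamp,
    string LifterStatus,
    bool LifterWarm,
    IReadOnlyDictionary<string, int> LifterPoolStats,
    string CacheStatus,
    bool CacheEnabled);

public static class HealthJsonWriter
{
    public static string Write(HealthSnapshot h)
    {
        using var stream = new MemoryStream();
        // Default writer options: compact output, and the default encoder escapes
        // non-ASCII characters as \uXXXX, so the result is ASCII-only.
        using (var writer = new Utf8JsonWriter(stream))
        {
            // Keys are written in a fixed, explicit order.
            writer.WriteStartObject();
            writer.WriteString("status", h.Status);
            // ISO-8601 UTC with a literal 'Z', independent of the host culture.
            writer.WriteString("timestamp",
                h.Timestamp.UtcDateTime.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture));
            writer.WriteString("lifterStatus", h.LifterStatus);
            writer.WriteBoolean("lifterWarm", h.LifterWarm);
            writer.WriteStartObject("lifterPoolStats");
            // Per-ISA entries sorted ordinally so insertion order never affects output.
            foreach (var entry in h.LifterPoolStats.OrderBy(e => e.Key, StringComparer.Ordinal))
                writer.WriteNumber(entry.Key, entry.Value);
            writer.WriteEndObject();
            writer.WriteString("cacheStatus", h.CacheStatus);
            writer.WriteBoolean("cacheEnabled", h.CacheEnabled);
            writer.WriteEndObject();
        } // disposing the writer flushes it into the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```
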
#### 2.2.6 Binary Vulnerability Service

Main query interface for consumers.

@@ -113,19 +113,51 @@ Semantic diffing is an advanced binary analysis capability that detects function
### Phase 1: IR-Level Semantic Analysis (Foundation)

**Sprints:**

- `SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md`
- `SPRINT_20260112_004_BINIDX_b2r2_lowuir_perf_cache.md` (Performance & Ops)

Leverage B2R2's Intermediate Representation (IR) for semantic-level function comparison.

**Key Components:**

- `B2R2LowUirLiftingService` - Lifts instructions to B2R2 LowUIR, maps to Stella IR model
- `B2R2LifterPool` - Bounded pool with warm preload for lifter reuse
- `FunctionIrCacheService` - Valkey-backed cache for semantic fingerprints
- `SemanticGraphExtractor` - Build Key-Semantics Graph (KSG)
- `WeisfeilerLehmanHasher` - Graph fingerprinting
- `SemanticMatcher` - Semantic similarity scoring

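`WeisfeilerLehmanHasher` is listed for graph fingerprinting. The general Weisfeiler-Lehman technique repeatedly relabels each node from its own label and its neighbors' labels, then hashes the multiset of labels so the result does not depend on node numbering. The sketch below illustrates that technique only and is not the project's hasher: the string labels, successor-only neighborhoods, SHA-256 digests, iteration count, and the `WlHashSketch` name are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical Weisfeiler-Lehman sketch over a labeled directed graph
// (e.g. node labels = IR statement kinds, edges = dependencies).
public static class WlHashSketch
{
    public static string Hash(
        IReadOnlyList<string> nodeLabels,
        IReadOnlyList<(int From, int To)> edges,
        int iterations)
    {
        var successors = Enumerable.Range(0, nodeLabels.Count)
            .Select(_ => new List<int>())
            .ToArray();
        foreach (var (from, to) in edges)
            successors[from].Add(to);

        var labels = nodeLabels.ToArray();
        var allLabels = new List<string>(labels);

        for (var round = 0; round < iterations; round++)
        {
            // New label = digest(own label + sorted successor labels); sorting turns the
            // neighborhood into a multiset, independent of edge order.
            labels = Enumerable.Range(0, labels.Length)
                .Select(n => Digest(labels[n] + "|" + string.Join(",",
                    successors[n].Select(s => labels[s]).OrderBy(l => l, StringComparer.Ordinal))))
                .ToArray();
            allLabels.AddRange(labels);
        }

        // Fingerprint = digest of the sorted labels from every round, so node
        // numbering does not affect the result.
        return Digest(string.Join(";", allLabels.OrderBy(l => l, StringComparer.Ordinal)));
    }

    private static string Digest(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));
}
```

Two graphs that differ only in node numbering hash identically, while reversing the edges of the same graph changes the neighborhoods and therefore the fingerprint.
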
**B2R2LowUirLiftingService Implementation:**

- Supports Intel, ARM, MIPS, RISC-V, PowerPC, SPARC, SH4, AVR, EVM
- Maps B2R2 LowUIR statements to `IrStatement` model
- Applies SSA numbering to temporary registers
- Deterministic block ordering (by entry address)
- InvariantCulture formatting throughout

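Applying SSA numbering to temporaries means each assignment to a temporary gets a fresh version, and every later use names the latest version. The sketch covers a single straight-line block only (no phi nodes across blocks); the `TacStmt` record, `TempSsaRenamer`, the `name.version` spelling, and the temporary-naming predicate are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical three-address statement: Dest = Op(Sources...).
public sealed record TacStmt(string Dest, string Op, IReadOnlyList<string> Sources);

public static class TempSsaRenamer
{
    // Renames temporaries in one straight-line block: every definition gets the next
    // version number and every use refers to the latest version defined so far.
    public static List<TacStmt> Rename(IEnumerable<TacStmt> block, Func<string, bool> isTemp)
    {
        var latest = new Dictionary<string, int>();
        var output = new List<TacStmt>();
        foreach (var stmt in block)
        {
            // Resolve uses first, so "t = t + 1" reads the previous version of t.
            var sources = new List<string>();
            foreach (var src in stmt.Sources)
                sources.Add(isTemp(src) && latest.TryGetValue(src, out var v) ? Versioned(src, v) : src);

            var dest = stmt.Dest;
            if (isTemp(dest))
            {
                var version = latest.TryGetValue(dest, out var prev) ? prev + 1 : 0;
                latest[dest] = version;
                dest = Versioned(dest, version);
            }
            output.Add(stmt with { Dest = dest, Sources = sources });
        }
        return output;
    }

    // InvariantCulture formatting, in line with the determinism guarantees.
    private static string Versioned(string name, int version) =>
        string.Format(CultureInfo.InvariantCulture, "{0}.{1}", name, version);
}
```
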
**B2R2LifterPool Implementation:**

- Bounded per-ISA pooling (default 4 lifters/ISA)
- Warm preload at startup for common ISAs
- Per-ISA stats (pooled, active, max)
- Automatic return on dispose

**FunctionIrCacheService Implementation:**

- Cache key: `(isa, b2r2_version, normalization_recipe, canonical_ir_hash)`
- Valkey as hot cache (default 4h TTL)
- PostgreSQL persistence for fingerprint records
- Hit/miss/eviction statistics

**Ops Endpoints:**

- `GET /api/v1/ops/binaryindex/health` - Lifter warmness, cache status
- `POST /api/v1/ops/binaryindex/bench/run` - Benchmark latency
- `GET /api/v1/ops/binaryindex/cache` - Cache statistics
- `GET /api/v1/ops/binaryindex/config` - Effective configuration

**Deliverables:**

- `StellaOps.BinaryIndex.Semantic` library
- `StellaOps.BinaryIndex.Disassembly.B2R2` (LowUIR adapter, lifter pool)
- `StellaOps.BinaryIndex.Cache` (function IR cache)
- `BinaryIndexOpsController`
- 20+ tasks, ~3 weeks

### Phase 2: Function Behavior Corpus (Scale)