# Valkey Advisory Cache

Per SPRINT_8200_0013_0001.

## Overview

The Valkey Advisory Cache provides sub-20ms read latency for canonical advisory lookups by caching advisory data and maintaining fast-path indexes. The cache integrates with the Concelier `CanonicalAdvisoryService` as a read-through cache with automatic population and invalidation.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             Advisory Cache Flow                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌───────────┐    miss    ┌───────────┐    fetch    ┌───────────────┐      │
│   │  Client   │ ─────────► │  Valkey   │ ──────────► │  PostgreSQL   │      │
│   │  Request  │            │  Cache    │             │  Canonical    │      │
│   └───────────┘            └───────────┘             │  Store        │      │
│        ▲                        │                    └───────────────┘      │
│        │ hit                    │                           │               │
│        │                        ▼                           │               │
│        └──────────────── (< 20ms) ◄─────────────────────────┘               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Configuration

Configure in `concelier.yaml`:

```yaml
ConcelierCache:
  # Valkey connection settings
  ConnectionString: "localhost:6379,abortConnect=false,connectTimeout=5000"
  Database: 0
  InstanceName: "concelier:"

  # TTL settings by interest score tier
  TtlPolicy:
    HighInterestTtlHours: 24    # Interest score >= 0.7
    MediumInterestTtlHours: 4   # Interest score 0.3 - 0.7
    LowInterestTtlHours: 1      # Interest score < 0.3
    DefaultTtlHours: 2

  # Index settings
  HotSetMaxSize: 10000          # Max entries in rank:hot
  EnablePurlIndex: true
  EnableCveIndex: true

  # Connection pool settings
  PoolSize: 20
  ConnectRetryCount: 3
  ReconnectRetryDelayMs: 1000

  # Fallback behavior when Valkey is unavailable
  FallbackToPostgres: true
  HealthCheckIntervalSeconds: 30
```

## Key Schema

### Advisory Entry

**Key:** `advisory:{merge_hash}`
**Value:** JSON-serialized `CanonicalAdvisory`
**TTL:** Based on interest score tier

```json
{
  "id": "uuid",
  "cve": "CVE-2024-1234",
  "affects_key": "pkg:npm/express@4.0.0",
  "merge_hash": "sha256:a1b2c3...",
  "severity": "high",
  "interest_score":
    0.85,
  "title": "...",
  "updated_at": "2025-01-15T10:30:00Z"
}
```

### Hot Set (Sorted Set)

**Key:** `rank:hot`
**Score:** Interest score (0.0 - 1.0)
**Member:** merge_hash

Stores the top advisories by interest score for quick access.

### PURL Index (Set)

**Key:** `by:purl:{normalized_purl}`
**Members:** Set of merge_hash values

Maps package URLs to the advisories that affect them.

Example: `by:purl:pkg:npm/express@4.0.0` → `{sha256:a1b2c3..., sha256:d4e5f6...}`

### CVE Index (Set)

**Key:** `by:cve:{cve_id}`
**Members:** Set of merge_hash values

Maps CVE IDs to canonical advisories.

Example: `by:cve:cve-2024-1234` → `{sha256:a1b2c3...}`

## Operations

### Get Advisory

```csharp
// Service interface
public interface IAdvisoryCacheService
{
    Task<CanonicalAdvisory?> GetAsync(string mergeHash, CancellationToken ct = default);
    Task SetAsync(CanonicalAdvisory advisory, CancellationToken ct = default);
    Task InvalidateAsync(string mergeHash, CancellationToken ct = default);
    Task InvalidateManyAsync(IEnumerable<string> mergeHashes, CancellationToken ct = default);
    Task<IReadOnlyList<CanonicalAdvisory>> GetByPurlAsync(string purl, CancellationToken ct = default);
    Task<IReadOnlyList<CanonicalAdvisory>> GetHotAsync(int count = 100, CancellationToken ct = default);
    Task IndexPurlAsync(string purl, string mergeHash, CancellationToken ct = default);
    Task IndexCveAsync(string cve, string mergeHash, CancellationToken ct = default);
}
```

### Read-Through Cache

```
1. GetAsync(mergeHash) called
2.
   Check Valkey: GET advisory:{mergeHash}
      ├─ Hit: deserialize and return
      └─ Miss: fetch from PostgreSQL, cache result, return
```

### Cache Population

Advisories are cached when:

- First read (read-through)
- Ingested from source connectors
- Imported from federation bundles
- Updated by merge operations

### Cache Invalidation

Invalidation occurs when:

- Advisory is updated (re-merged with new data)
- Advisory is withdrawn
- A manual cache flush is requested

```csharp
// Invalidate a single advisory
await cacheService.InvalidateAsync(mergeHash, ct);

// Invalidate multiple (e.g., after a bulk import)
await cacheService.InvalidateManyAsync(mergeHashes, ct);
```

## TTL Policy

Interest score determines the TTL tier:

| Interest Score | TTL | Rationale |
|----------------|-----|-----------|
| >= 0.7 (High) | 24 hours | Hot advisories are likely to be queried frequently |
| 0.3 - 0.7 (Medium) | 4 hours | Moderate interest: balances freshness against cache hits |
| < 0.3 (Low) | 1 hour | Low interest: evict quickly to save memory |

TTL is set when the advisory is cached:

```csharp
var ttl = ttlPolicy.GetTtl(advisory.InterestScore);
await cache.SetAsync(key, advisory, ttl, ct);
```

## Monitoring

### Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `concelier_cache_hits_total` | Counter | Total cache hits |
| `concelier_cache_misses_total` | Counter | Total cache misses |
| `concelier_cache_hit_rate` | Gauge | Hit rate (hits / total) |
| `concelier_cache_latency_ms` | Histogram | Cache operation latency |
| `concelier_cache_size_bytes` | Gauge | Estimated cache memory usage |
| `concelier_cache_hot_set_size` | Gauge | Entries in rank:hot |

### OpenTelemetry Spans

Cache operations emit spans:

```
concelier.cache.get
├── cache.key: "advisory:sha256:..."
├── cache.hit: true/false
└── cache.latency_ms: 2.5

concelier.cache.set
├── cache.key: "advisory:sha256:..."
└── cache.ttl_hours: 24
```

### Health Check

```
GET /health/cache
```

Response:

```json
{
  "status": "healthy",
  "valkey_connected": true,
  "latency_ms": 1.2,
  "hot_set_size": 8542,
  "hit_rate_1h": 0.87
}
```

## Performance

### Benchmarks

| Operation | p50 | p95 | p99 |
|-----------|-----|-----|-----|
| GetAsync (hit) | 1.2ms | 3.5ms | 8.0ms |
| GetAsync (miss + populate) | 12ms | 25ms | 45ms |
| SetAsync | 1.5ms | 4.0ms | 9.0ms |
| GetByPurlAsync | 2.5ms | 6.0ms | 15ms |
| GetHotAsync(100) | 3.0ms | 8.0ms | 18ms |

### Optimization Tips

1. **Connection pooling:** Use a shared multiplexer with `PoolSize: 20`
2. **Pipeline reads:** For bulk operations, use pipelining:

   ```csharp
   var batch = cache.CreateBatch();
   var tasks = new List<Task<CanonicalAdvisory?>>();
   foreach (var hash in mergeHashes)
       tasks.Add(batch.GetAsync(hash));
   batch.Execute();
   await Task.WhenAll(tasks);
   ```

3. **Hot set preload:** Run a warmup job on startup to preload the hot set
4. **Compression:** Enable LZF compression for large advisories

## Fallback Behavior

When Valkey is unavailable:

1. **FallbackToPostgres: true** (default)
   - All reads go directly to PostgreSQL
   - Performance degrades, but the system remains operational
   - Reconnection attempts continue in the background
2. **FallbackToPostgres: false**
   - Cache misses return null/empty
   - Only cached data is accessible
   - Use for strict latency requirements

## Troubleshooting

### Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| High miss rate | Cold cache / TTLs too short | Run the warmup job; increase TTLs |
| Latency spikes | Connection exhaustion | Increase `PoolSize` |
| Memory pressure | Too many cached advisories | Reduce `HotSetMaxSize`; lower TTLs |
| Stale index | Invalidation not triggered | Check event handlers; verify `IndexPurlAsync` calls |

### Debug Commands

```bash
# Check cache stats
stella cache stats

# View hot set
stella cache list-hot --limit 10

# Check a specific advisory
stella cache get sha256:mergehash...

# Flush cache
stella cache flush --confirm

# Check PURL index
stella cache lookup-purl pkg:npm/express@4.0.0
```

### Valkey CLI

```bash
# Connect to Valkey
redis-cli -h localhost -p 6379

# Check memory usage
INFO memory

# List hot set entries
ZRANGE rank:hot 0 9 WITHSCORES

# Check PURL index
SMEMBERS by:purl:pkg:npm/express@4.0.0

# Get advisory
GET advisory:sha256:a1b2c3...
```
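
## Appendix: Read-Through Sketch

To make the read-through flow and TTL policy above concrete, here is a minimal sketch in C# using StackExchange.Redis. It is an illustration under assumptions, not Concelier's actual implementation: `ICanonicalStore`, `AdvisoryReadThroughCache`, `CacheTtlPolicy`, and the trimmed-down `CanonicalAdvisory` record are hypothetical stand-ins.

```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using StackExchange.Redis;

// Hypothetical, trimmed-down advisory record for illustration only.
public sealed record CanonicalAdvisory(string MergeHash, double InterestScore);

// Hypothetical abstraction over the PostgreSQL canonical store.
public interface ICanonicalStore
{
    Task<CanonicalAdvisory?> FindByMergeHashAsync(string mergeHash, CancellationToken ct);
}

public sealed class CacheTtlPolicy
{
    // Mirrors the TtlPolicy tiers from concelier.yaml.
    public TimeSpan GetTtl(double interestScore) => interestScore switch
    {
        >= 0.7 => TimeSpan.FromHours(24), // high interest
        >= 0.3 => TimeSpan.FromHours(4),  // medium interest
        _      => TimeSpan.FromHours(1),  // low interest
    };
}

public sealed class AdvisoryReadThroughCache
{
    private readonly IDatabase _valkey;
    private readonly ICanonicalStore _store;
    private readonly CacheTtlPolicy _ttl = new();

    public AdvisoryReadThroughCache(IConnectionMultiplexer multiplexer, ICanonicalStore store)
    {
        _valkey = multiplexer.GetDatabase();
        _store = store;
    }

    public async Task<CanonicalAdvisory?> GetAsync(string mergeHash, CancellationToken ct = default)
    {
        var key = $"advisory:{mergeHash}";

        // 1. Fast path: serve from Valkey when the entry is cached.
        var cached = await _valkey.StringGetAsync(key);
        if (cached.HasValue)
            return JsonSerializer.Deserialize<CanonicalAdvisory>(cached.ToString());

        // 2. Miss: fall back to the canonical store.
        var advisory = await _store.FindByMergeHashAsync(mergeHash, ct);
        if (advisory is null)
            return null;

        // 3. Populate the cache with an interest-score-based TTL.
        await _valkey.StringSetAsync(
            key,
            JsonSerializer.Serialize(advisory),
            _ttl.GetTtl(advisory.InterestScore));

        return advisory;
    }
}
```

A production version would additionally emit the hit/miss metrics and spans described above, maintain the `by:purl:` and `by:cve:` indexes on population, and honor `FallbackToPostgres` when the multiplexer reports a lost connection.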