# LLM Inference Response Caching
## Module

AdvisoryAI
## Status

IMPLEMENTED
## Description
An in-memory LLM inference cache that deduplicates identical prompt+model combinations. It reduces API costs and latency by caching deterministic responses keyed by a content hash of the prompt, model, and parameters.
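The keying scheme can be sketched as follows. This is a Python illustration only (the implementation is C#), and the canonical serialization format shown here is an assumption, not the real one:

```python
import hashlib

def cache_key(model: str, temperature: float, prompt: str) -> str:
    """Hypothetical content-hash key: identical prompt+model+parameters
    always map to the same key, so a repeat request hits the cache."""
    # NUL-separated fields in a fixed order give a stable canonical form.
    canonical = "\x00".join([model, repr(temperature), prompt])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Identical inputs collide; changing any parameter yields a different key.
k1 = cache_key("llama-3", 0.0, "Summarize CVE-2024-1234")
k2 = cache_key("llama-3", 0.0, "Summarize CVE-2024-1234")
k3 = cache_key("llama-3", 0.7, "Summarize CVE-2024-1234")
print(k1 == k2)  # True
print(k1 == k3)  # False
```

Hashing the canonical form (rather than using the raw prompt as the key) keeps keys fixed-size regardless of prompt length.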
## Implementation Details
- **Modules**: `src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/`
- **Key Classes**:
  - `LlmInferenceCache` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmInferenceCache.cs`) - in-memory cache keyed by a content hash of prompt+model+parameters
  - `LlmProviderFactory` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmProviderFactory.cs`) - factory that wraps providers with the caching layer
  - `LlmProviderOptions` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmProviderOptions.cs`) - provider options, including cache TTL and size limits
  - `ProviderBasedAdvisoryInferenceClient` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ProviderBasedAdvisoryInferenceClient.cs`) - inference client that routes requests through the caching layer
- **Interfaces**: `ILlmProvider`
- **Source**: SPRINT_20251226_019_AI_offline_inference.md
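The behaviour these classes combine (content-keyed lookup, TTL expiry, oldest-first eviction on size pressure) can be sketched roughly as below. This is a Python illustration of the described behaviour, not the actual C# API; the class and member names are hypothetical:

```python
import time

class InferenceCacheSketch:
    """Hypothetical in-memory cache with a TTL and a max-entry limit."""

    def __init__(self, ttl_seconds: float, max_entries: int):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._entries = {}  # key -> (response, expiry); dicts keep insertion order

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        response, expiry = entry
        if time.monotonic() >= expiry:       # TTL elapsed: treat as a miss
            del self._entries[key]
            return None
        return response

    def set(self, key, response):
        if len(self._entries) >= self._max:  # at capacity: evict the oldest entry
            oldest = next(iter(self._entries))
            del self._entries[oldest]
        self._entries[key] = (response, time.monotonic() + self._ttl)
```

Oldest-first eviction here leans on Python dicts preserving insertion order; the real implementation may well use a different eviction policy or data structure.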
## E2E Test Plan
- [ ] Send identical prompts twice and verify `LlmInferenceCache` returns the cached response on the second call without hitting the LLM
- [ ] Verify cache keys include model ID and parameters: same prompt with different temperature results in cache miss
- [ ] Verify cache TTL: cached responses expire after configured duration
- [ ] Verify cache size limits: when max entries are reached, oldest entries are evicted
- [ ] Verify cache bypass: non-deterministic requests (temperature > 0) are not cached
- [ ] Verify `ProviderBasedAdvisoryInferenceClient` correctly integrates caching with the provider pipeline
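The dedup, parameter-miss, and bypass checks above can be sketched with a counting fake provider (again a Python illustration; the function and variable names are hypothetical, not the real test harness):

```python
calls = 0
cache = {}

def infer(model: str, temperature: float, prompt: str) -> str:
    """Serve deterministic repeats from cache; otherwise call the 'LLM'."""
    global calls
    key = (model, temperature, prompt)
    if temperature == 0.0 and key in cache:    # cache hit: no provider call
        return cache[key]
    calls += 1                                 # simulated LLM provider call
    response = f"response#{calls}"
    if temperature == 0.0:                     # only deterministic requests are cached
        cache[key] = response
    return response

a = infer("llama-3", 0.0, "Summarize CVE-2024-1234")
b = infer("llama-3", 0.0, "Summarize CVE-2024-1234")  # identical: served from cache
c = infer("llama-3", 0.7, "Summarize CVE-2024-1234")  # different temperature: miss
d = infer("llama-3", 0.7, "Summarize CVE-2024-1234")  # temperature > 0: never cached
print(a == b, calls)  # True 3
```

Only the second identical deterministic request skips the provider; the two temperature-0.7 requests each trigger a real call because non-deterministic responses are bypassed, giving three provider calls in total.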