# LLM Inference Response Caching
## Module

AdvisoryAI
## Status

IMPLEMENTED
## Description
An in-memory LLM inference cache that deduplicates identical prompt+model combinations. It reduces API costs and latency by caching deterministic responses keyed by a content hash of the prompt, model, and parameters.
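The keying scheme can be sketched as follows. This is a Python illustration only (the implementation is C#), and the canonical serialization format shown here is an assumption, not the real one:

```python
import hashlib

def cache_key(model: str, temperature: float, prompt: str) -> str:
    """Hypothetical content-hash key: identical prompt+model+parameters
    always map to the same key, so a repeat request hits the cache."""
    # NUL-separated fields in a fixed order give a stable canonical form.
    canonical = "\x00".join([model, repr(temperature), prompt])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Identical inputs collide; changing any parameter yields a different key.
k1 = cache_key("llama-3", 0.0, "Summarize CVE-2024-1234")
k2 = cache_key("llama-3", 0.0, "Summarize CVE-2024-1234")
k3 = cache_key("llama-3", 0.7, "Summarize CVE-2024-1234")
print(k1 == k2)  # True
print(k1 == k3)  # False
```

Hashing the canonical form (rather than using the raw prompt as the key) keeps keys fixed-size regardless of prompt length.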
## Implementation Details
- **Modules**: `src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/`
- **Key Classes**:
  - `LlmInferenceCache` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmInferenceCache.cs`) - in-memory cache keyed by a content hash of prompt+model+parameters
  - `LlmProviderFactory` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmProviderFactory.cs`) - factory that wraps providers with the caching layer
  - `LlmProviderOptions` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmProviderOptions.cs`) - provider options, including cache TTL and size limits
  - `ProviderBasedAdvisoryInferenceClient` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ProviderBasedAdvisoryInferenceClient.cs`) - inference client that routes requests through the caching layer
- **Interfaces**: `ILlmProvider`
- **Source**: SPRINT_20251226_019_AI_offline_inference.md
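The behaviour these classes combine (content-keyed lookup, TTL expiry, oldest-first eviction on size pressure) can be sketched roughly as below. This is a Python illustration of the described behaviour, not the actual C# API; the class and member names are hypothetical:

```python
import time

class InferenceCacheSketch:
    """Hypothetical in-memory cache with a TTL and a max-entry limit."""

    def __init__(self, ttl_seconds: float, max_entries: int):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._entries = {}  # key -> (response, expiry); dicts keep insertion order

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        response, expiry = entry
        if time.monotonic() >= expiry:       # TTL elapsed: treat as a miss
            del self._entries[key]
            return None
        return response

    def set(self, key, response):
        if len(self._entries) >= self._max:  # at capacity: evict the oldest entry
            oldest = next(iter(self._entries))
            del self._entries[oldest]
        self._entries[key] = (response, time.monotonic() + self._ttl)
```

Oldest-first eviction here leans on Python dicts preserving insertion order; the real implementation may well use a different eviction policy or data structure.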
## E2E Test Plan
- [ ] Send identical prompts twice and verify `LlmInferenceCache` returns the cached response on the second call without hitting the LLM
- [ ] Verify cache keys include model ID and parameters: same prompt with different temperature results in cache miss
- [ ] Verify cache TTL: cached responses expire after configured duration
- [ ] Verify cache size limits: when max entries are reached, oldest entries are evicted
- [ ] Verify cache bypass: non-deterministic requests (temperature > 0) are not cached
- [ ] Verify `ProviderBasedAdvisoryInferenceClient` correctly integrates caching with the provider pipeline
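The dedup, parameter-miss, and bypass checks above can be sketched with a counting fake provider (again a Python illustration; the function and variable names are hypothetical, not the real test harness):

```python
calls = 0
cache = {}

def infer(model: str, temperature: float, prompt: str) -> str:
    """Serve deterministic repeats from cache; otherwise call the 'LLM'."""
    global calls
    key = (model, temperature, prompt)
    if temperature == 0.0 and key in cache:    # cache hit: no provider call
        return cache[key]
    calls += 1                                 # simulated LLM provider call
    response = f"response#{calls}"
    if temperature == 0.0:                     # only deterministic requests are cached
        cache[key] = response
    return response

a = infer("llama-3", 0.0, "Summarize CVE-2024-1234")
b = infer("llama-3", 0.0, "Summarize CVE-2024-1234")  # identical: served from cache
c = infer("llama-3", 0.7, "Summarize CVE-2024-1234")  # different temperature: miss
d = infer("llama-3", 0.7, "Summarize CVE-2024-1234")  # temperature > 0: never cached
print(a == b, calls)  # True 3
```

Only the second identical deterministic request skips the provider; the two temperature-0.7 requests each trigger a real call because non-deterministic responses are bypassed, giving three provider calls in total.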