Save checkpoint: semi-implemented and implemented features

Branch: master
Date: 2026-02-08 18:00:49 +02:00
Parent: 04360dff63
Commit: 1bf6bbf395
20895 changed files with 716795 additions and 64 deletions

@@ -0,0 +1,28 @@
# LLM Inference Response Caching
## Module
AdvisoryAI
## Status
IMPLEMENTED
## Description
An in-memory LLM inference cache that deduplicates identical prompt+model+parameter combinations. It reduces API cost and latency by serving deterministic responses from a cache keyed by a content hash of the request.
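For orientation, a content-hash key along these lines would give the behavior described above (a minimal sketch; `InferenceRequest`, `CacheKey`, and the canonical field layout are illustrative assumptions, not the actual StellaOps types):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical request shape; the real parameter set lives in LlmProviderOptions.
public sealed record InferenceRequest(string Model, string Prompt, double Temperature, int MaxTokens);

public static class CacheKey
{
    // Hash prompt + model + parameters so that a change to any one of them
    // yields a different key, and therefore a cache miss.
    public static string Compute(InferenceRequest request)
    {
        var canonical = $"{request.Model}\n{request.Prompt}\n{request.Temperature:R}\n{request.MaxTokens}";
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(hash);
    }
}
```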
## Implementation Details
- **Modules**: `src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/`
- **Key Classes**:
- `LlmInferenceCache` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmInferenceCache.cs`) - in-memory cache keyed by content hash of prompt+model+parameters
  - `LlmProviderFactory` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmProviderFactory.cs`) - factory that wraps providers with the caching layer (see the sketch after this list)
- `LlmProviderOptions` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmProviders/LlmProviderOptions.cs`) - provider options including cache TTL and size limits
- `ProviderBasedAdvisoryInferenceClient` (`src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ProviderBasedAdvisoryInferenceClient.cs`) - inference client that uses the caching layer
- **Interfaces**: `ILlmProvider`
- **Source**: SPRINT_20251226_019_AI_offline_inference.md
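A caching decorator over `ILlmProvider` could be wired roughly as below, reusing the `InferenceRequest` and `CacheKey` types from the sketch above. The interface shape, constructor options, and oldest-first eviction are assumptions inferred from the test plan, not the real `LlmInferenceCache`:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Assumed provider surface; the real ILlmProvider signature may differ.
public interface ILlmProvider
{
    Task<string> CompleteAsync(InferenceRequest request, CancellationToken ct = default);
}

public sealed class CachingLlmProvider : ILlmProvider
{
    private readonly ILlmProvider _inner;
    private readonly TimeSpan _ttl;
    private readonly int _maxEntries;
    // Dictionary for lookups plus a queue recording insertion order,
    // so the oldest entry can be evicted once _maxEntries is reached.
    private readonly Dictionary<string, (string Response, DateTimeOffset Expires)> _entries = new();
    private readonly Queue<string> _insertionOrder = new();
    private readonly object _gate = new();

    public CachingLlmProvider(ILlmProvider inner, TimeSpan ttl, int maxEntries)
        => (_inner, _ttl, _maxEntries) = (inner, ttl, maxEntries);

    public async Task<string> CompleteAsync(InferenceRequest request, CancellationToken ct = default)
    {
        // Bypass: only deterministic requests (temperature == 0) are cached.
        if (request.Temperature > 0)
            return await _inner.CompleteAsync(request, ct);

        var key = CacheKey.Compute(request);
        lock (_gate)
        {
            if (_entries.TryGetValue(key, out var hit) && hit.Expires > DateTimeOffset.UtcNow)
                return hit.Response; // fresh cache hit: no provider call
        }

        var response = await _inner.CompleteAsync(request, ct);
        lock (_gate)
        {
            if (!_entries.ContainsKey(key))
            {
                // At capacity: evict the oldest entry before inserting.
                if (_entries.Count >= _maxEntries && _insertionOrder.TryDequeue(out var oldest))
                    _entries.Remove(oldest);
                _insertionOrder.Enqueue(key);
            }
            _entries[key] = (response, DateTimeOffset.UtcNow.Add(_ttl));
        }
        return response;
    }
}
```

Expired entries are simply overwritten on next use in this sketch; a production cache would also sweep them proactively.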
## E2E Test Plan
- [ ] Send identical prompts twice and verify `LlmInferenceCache` returns the cached response on the second call without hitting the LLM (see the test sketch after this list)
- [ ] Verify cache keys include model ID and parameters: same prompt with different temperature results in cache miss
- [ ] Verify cache TTL: cached responses expire after configured duration
- [ ] Verify cache size limits: when max entries are reached, oldest entries are evicted
- [ ] Verify cache bypass: non-deterministic requests (temperature > 0) are not cached
- [ ] Verify `ProviderBasedAdvisoryInferenceClient` correctly integrates caching with the provider pipeline
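The first two checklist items could be exercised with a test along these lines (an xUnit sketch against the decorator above; the counting fake and names are illustrative, not the project's actual test harness):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Xunit;

// Counts how often the underlying provider is actually invoked.
sealed class CountingProvider : ILlmProvider
{
    public int Calls { get; private set; }

    public Task<string> CompleteAsync(InferenceRequest request, CancellationToken ct = default)
    {
        Calls++;
        return Task.FromResult($"response-{Calls}");
    }
}

public class LlmInferenceCacheTests
{
    [Fact]
    public async Task IdenticalPromptHitsCache_ChangedParameterMisses()
    {
        var inner = new CountingProvider();
        var cached = new CachingLlmProvider(inner, TimeSpan.FromMinutes(5), maxEntries: 100);
        var request = new InferenceRequest("gpt-x", "summarize CVE-2024-0001", Temperature: 0, MaxTokens: 256);

        var first = await cached.CompleteAsync(request);
        var second = await cached.CompleteAsync(request);

        Assert.Equal(first, second);  // cached response returned verbatim
        Assert.Equal(1, inner.Calls); // second call never reached the provider

        // Temperature > 0 is non-deterministic, so it bypasses the cache entirely.
        await cached.CompleteAsync(request with { Temperature = 0.7 });
        Assert.Equal(2, inner.Calls);
    }
}
```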