Sovereign/Offline AI Inference with Signed Model Bundles
Module: AdvisoryAI
Status: IMPLEMENTED
Description
Local LLM inference for air-gapped environments via a pluggable provider architecture supporting llama.cpp server, Ollama, OpenAI, Claude, and Gemini. DSSE-signed model bundle management with regional crypto support (eIDAS/FIPS/GOST/SM), digest verification at load time, deterministic output config (temperature=0, fixed seed), inference caching, benchmarking harness, and offline replay verification.
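The digest check described above (rejecting a bundle whose on-disk bytes no longer match the signed manifest) can be sketched as follows. This is an illustrative Python sketch, not the actual C# `SignedModelBundleManager` API; the function names and the `sha256:` digest prefix are our assumptions.

```python
# Illustrative sketch of digest verification at model load time.
# Not the StellaOps implementation; names here are hypothetical.
import hashlib
import os
import tempfile

def compute_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so multi-GiB model files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()

def verify_bundle(path: str, expected_digest: str) -> bool:
    """Refuse to load unless the recomputed digest matches the signed manifest."""
    return compute_digest(path) == expected_digest

# Demo with a throwaway "model" file standing in for a bundle payload.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"model-weights")
    model_path = f.name
good_digest = compute_digest(model_path)
ok = verify_bundle(model_path, good_digest)
tampered = verify_bundle(model_path, "sha256:" + "0" * 64)
os.unlink(model_path)
```

In a full DSSE flow the expected digest would itself come from a signature-verified envelope, so an attacker cannot swap both the model and the manifest.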
Implementation Details
- Modules: src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/
- Key Classes:
  - SignedModelBundleManager (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/SignedModelBundleManager.cs) - manages DSSE-signed model bundles with digest verification at load time
  - ModelBundle (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ModelBundle.cs) - model bundle metadata including hash, signature, and regional crypto info
  - LlamaCppRuntime (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlamaCppRuntime.cs) - llama.cpp local inference runtime
  - OnnxRuntime (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/OnnxRuntime.cs) - ONNX runtime for local model inference
  - AdvisoryInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/AdvisoryInferenceClient.cs) - main inference client with provider routing
  - ProviderBasedAdvisoryInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ProviderBasedAdvisoryInferenceClient.cs) - provider-based inference with caching
  - LlmBenchmark (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmBenchmark.cs) - benchmarking harness for inference performance
  - LocalInferenceOptions (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LocalInferenceOptions.cs) - configuration for local inference (temperature, seed, context size)
  - LocalLlmConfig (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LocalLlmConfig.cs) - local LLM configuration (model path, quantization, GPU layers)
  - LocalChatInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Chat/Services/LocalChatInferenceClient.cs) - chat-specific local inference client
- Interfaces: ILocalLlmRuntime
- Source: SPRINT_20251226_019_AI_offline_inference.md
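The deterministic output configuration (temperature=0, fixed seed) maps naturally onto a llama.cpp server completion request. The sketch below builds such a request body in Python; the field names follow llama.cpp's HTTP server conventions, but the exact option names used by `LocalInferenceOptions` are not confirmed by this document.

```python
# Hedged sketch: a deterministic completion request for a llama.cpp server.
# Field names reflect common llama.cpp server parameters; StellaOps'
# LocalInferenceOptions may spell them differently.
import json

def deterministic_payload(prompt: str, seed: int = 42, n_predict: int = 256) -> dict:
    return {
        "prompt": prompt,
        "temperature": 0.0,   # greedy decoding: removes sampling randomness
        "seed": seed,         # pins any remaining stochastic choices
        "n_predict": n_predict,
    }

payload = deterministic_payload("Summarize the advisory impact.")
# Stable serialization (sorted keys) helps cache keying and offline replay,
# since byte-identical requests hash to the same cache key.
body = json.dumps(payload, sort_keys=True)
```

With temperature pinned to 0 and a fixed seed, repeated runs against the same model digest should yield byte-identical output, which is what makes offline replay verification and response caching viable.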
E2E Test Plan
- Load a signed model bundle via SignedModelBundleManager and verify the DSSE signature and digest are validated
- Verify SignedModelBundleManager rejects a model bundle with a tampered digest
- Run inference through LlamaCppRuntime with temperature=0 and a fixed seed and verify deterministic output
- Run LlmBenchmark and verify it measures tokens/second and latency metrics
- Verify OnnxRuntime loads and runs inference with an ONNX model
- Configure LocalInferenceOptions with air-gap settings and verify no external network calls are made
- Verify ProviderBasedAdvisoryInferenceClient caches deterministic responses and returns cached results on repeat queries
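The last test above (cached deterministic responses returned on repeat queries) can be sketched as a cache keyed by model digest, prompt, and inference parameters. This is an illustrative design in Python, not the actual ProviderBasedAdvisoryInferenceClient; all names are hypothetical.

```python
# Illustrative sketch of deterministic-response caching. Because output is
# deterministic (temperature=0, fixed seed), identical inputs may safely
# return a cached result instead of re-running the provider.
import hashlib
import json

class InferenceCache:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def key(model_digest: str, prompt: str, params: dict) -> str:
        # Canonical JSON (sorted keys) so logically equal requests collide.
        material = json.dumps(
            {"model": model_digest, "prompt": prompt, "params": params},
            sort_keys=True,
        )
        return hashlib.sha256(material.encode()).hexdigest()

    def get_or_run(self, model_digest: str, prompt: str, params: dict, run) -> str:
        k = self.key(model_digest, prompt, params)
        if k not in self._store:
            self._store[k] = run(prompt)  # cache miss: invoke the provider
        return self._store[k]

# Fake provider that records how often it is actually invoked.
calls = []
def fake_provider(prompt: str) -> str:
    calls.append(prompt)
    return f"answer:{prompt}"

cache = InferenceCache()
params = {"temperature": 0.0, "seed": 42}
first = cache.get_or_run("sha256:abc", "q1", params, fake_provider)
second = cache.get_or_run("sha256:abc", "q1", params, fake_provider)
```

The repeat query hits the cache, so the provider runs exactly once, which is the observable behavior the E2E test checks for.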