git.stella-ops.org/docs/features/unchecked/advisoryai/sovereign-offline-ai-inference-with-signed-model-bundles.md

Sovereign/Offline AI Inference with Signed Model Bundles

Module

AdvisoryAI

Status

IMPLEMENTED

Description

Local LLM inference for air-gapped environments via a pluggable provider architecture supporting llama.cpp server, Ollama, OpenAI, Claude, and Gemini. DSSE-signed model bundle management with regional crypto support (eIDAS/FIPS/GOST/SM), digest verification at load time, deterministic output config (temperature=0, fixed seed), inference caching, benchmarking harness, and offline replay verification.
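The load-time digest check described above can be sketched as follows. This is a minimal illustration in Python (the actual implementation is C# in SignedModelBundleManager); the `digest` manifest field and `sha256:` prefix format are assumptions, not the real bundle schema.

```python
import hashlib

def verify_bundle_digest(model_bytes: bytes, manifest: dict) -> bool:
    """Recompute the model file's SHA-256 and compare it against the
    digest recorded in the DSSE-signed bundle manifest.
    Illustrative sketch only: field names are hypothetical."""
    expected = manifest["digest"]            # assumed form: "sha256:<hex>"
    algo, _, hex_digest = expected.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    actual = hashlib.sha256(model_bytes).hexdigest()
    return actual == hex_digest
```

A bundle whose recomputed digest does not match the signed manifest value is rejected before the model is ever handed to a runtime.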

Implementation Details

  • Modules: src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/
  • Key Classes:
    • SignedModelBundleManager (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/SignedModelBundleManager.cs) - manages DSSE-signed model bundles with digest verification at load time
    • ModelBundle (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ModelBundle.cs) - model bundle metadata including hash, signature, and regional crypto info
    • LlamaCppRuntime (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlamaCppRuntime.cs) - llama.cpp local inference runtime
    • OnnxRuntime (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/OnnxRuntime.cs) - ONNX runtime for local model inference
    • AdvisoryInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/AdvisoryInferenceClient.cs) - main inference client with provider routing
    • ProviderBasedAdvisoryInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ProviderBasedAdvisoryInferenceClient.cs) - provider-based inference with caching
    • LlmBenchmark (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmBenchmark.cs) - benchmarking harness for inference performance
    • LocalInferenceOptions (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LocalInferenceOptions.cs) - configuration for local inference (temperature, seed, context size)
    • LocalLlmConfig (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LocalLlmConfig.cs) - local LLM configuration (model path, quantization, GPU layers)
    • LocalChatInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Chat/Services/LocalChatInferenceClient.cs) - chat-specific local inference client
  • Interfaces: ILocalLlmRuntime
  • Source: SPRINT_20251226_019_AI_offline_inference.md
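Because outputs are deterministic (temperature=0, fixed seed), responses can be cached and replayed safely. The sketch below shows one way a stable cache key could be derived from the model digest, prompt, and canonicalized options; it is a hypothetical Python illustration, and the real key scheme used by ProviderBasedAdvisoryInferenceClient may differ.

```python
import hashlib
import json

def cache_key(model_digest: str, prompt: str, options: dict) -> str:
    """Derive a stable cache key for a deterministic inference request.
    Canonical JSON (sorted keys, no whitespace) makes the key independent
    of dict insertion order. Hypothetical helper, not the shipped code."""
    payload = json.dumps(
        {"model": model_digest, "prompt": prompt, "options": options},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Identical requests map to the same key regardless of option ordering, so a repeat query can be served from cache without re-running the model.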

E2E Test Plan

  • Load a signed model bundle via SignedModelBundleManager and verify that the DSSE signature and digest are validated
  • Verify that SignedModelBundleManager rejects a model bundle whose digest has been tampered with
  • Run inference through LlamaCppRuntime with temperature=0 and a fixed seed, and verify the output is identical across runs
  • Run LlmBenchmark and verify it measures tokens/second and latency metrics
  • Verify OnnxRuntime loads and runs inference with an ONNX model
  • Configure LocalInferenceOptions with air-gap settings and verify no external network calls are made
  • Verify ProviderBasedAdvisoryInferenceClient caches deterministic responses and returns cached results on repeat queries