git.stella-ops.org/docs/features/unchecked/advisoryai/sovereign-offline-ai-inference-with-signed-model-bundles.md

Sovereign/Offline AI Inference with Signed Model Bundles

Module

AdvisoryAI

Status

IMPLEMENTED

Description

Local LLM inference for air-gapped environments via a pluggable provider architecture supporting llama.cpp server, Ollama, OpenAI, Claude, and Gemini. DSSE-signed model bundle management with regional crypto support (eIDAS/FIPS/GOST/SM), digest verification at load time, deterministic output config (temperature=0, fixed seed), inference caching, benchmarking harness, and offline replay verification.
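The load-time digest check described above can be sketched as follows. This is a minimal illustration in Python (the actual implementation is C# in SignedModelBundleManager); the `digest` manifest field and `sha256:` prefix format are assumptions, not the real bundle schema.

```python
import hashlib

def verify_bundle_digest(model_bytes: bytes, manifest: dict) -> bool:
    """Recompute the model file's SHA-256 and compare it against the
    digest recorded in the DSSE-signed bundle manifest.
    Illustrative sketch only: field names are hypothetical."""
    expected = manifest["digest"]            # assumed form: "sha256:<hex>"
    algo, _, hex_digest = expected.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    actual = hashlib.sha256(model_bytes).hexdigest()
    return actual == hex_digest
```

A bundle whose recomputed digest does not match the signed manifest value is rejected before the model is ever handed to a runtime.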

Implementation Details

  • Modules: src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/
  • Key Classes:
    • SignedModelBundleManager (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/SignedModelBundleManager.cs) - manages DSSE-signed model bundles with digest verification at load time
    • ModelBundle (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ModelBundle.cs) - model bundle metadata including hash, signature, and regional crypto info
    • LlamaCppRuntime (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlamaCppRuntime.cs) - llama.cpp local inference runtime
    • OnnxRuntime (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/OnnxRuntime.cs) - ONNX runtime for local model inference
    • AdvisoryInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/AdvisoryInferenceClient.cs) - main inference client with provider routing
    • ProviderBasedAdvisoryInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/ProviderBasedAdvisoryInferenceClient.cs) - provider-based inference with caching
    • LlmBenchmark (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LlmBenchmark.cs) - benchmarking harness for inference performance
    • LocalInferenceOptions (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LocalInferenceOptions.cs) - configuration for local inference (temperature, seed, context size)
    • LocalLlmConfig (src/AdvisoryAi/StellaOps.AdvisoryAI/Inference/LocalLlmConfig.cs) - local LLM configuration (model path, quantization, GPU layers)
    • LocalChatInferenceClient (src/AdvisoryAi/StellaOps.AdvisoryAI/Chat/Services/LocalChatInferenceClient.cs) - chat-specific local inference client
  • Interfaces: ILocalLlmRuntime
  • Source: SPRINT_20251226_019_AI_offline_inference.md
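Because outputs are deterministic (temperature=0, fixed seed), responses can be cached and replayed safely. The sketch below shows one way a stable cache key could be derived from the model digest, prompt, and canonicalized options; it is a hypothetical Python illustration, and the real key scheme used by ProviderBasedAdvisoryInferenceClient may differ.

```python
import hashlib
import json

def cache_key(model_digest: str, prompt: str, options: dict) -> str:
    """Derive a stable cache key for a deterministic inference request.
    Canonical JSON (sorted keys, no whitespace) makes the key independent
    of dict insertion order. Hypothetical helper, not the shipped code."""
    payload = json.dumps(
        {"model": model_digest, "prompt": prompt, "options": options},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Identical requests map to the same key regardless of option ordering, so a repeat query can be served from cache without re-running the model.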

E2E Test Plan

  • Load a signed model bundle via SignedModelBundleManager and verify that the DSSE signature and digest are validated
  • Verify that SignedModelBundleManager rejects a model bundle whose digest has been tampered with
  • Run inference through LlamaCppRuntime with temperature=0 and a fixed seed, and verify the output is identical across runs
  • Run LlmBenchmark and verify it measures tokens/second and latency metrics
  • Verify OnnxRuntime loads and runs inference with an ONNX model
  • Configure LocalInferenceOptions with air-gap settings and verify no external network calls are made
  • Verify ProviderBasedAdvisoryInferenceClient caches deterministic responses and returns cached results on repeat queries