
LLM Provider Plugins

Sprint: SPRINT_20251226_019_AI_offline_inference · Tasks: OFFLINE-07, OFFLINE-08, OFFLINE-09

This guide documents the LLM (Large Language Model) provider plugin architecture for AI-powered advisory analysis, explanations, and remediation planning.

Overview

StellaOps supports multiple LLM backends through a unified plugin architecture:

| Provider | Type | Use Case | Priority |
|---|---|---|---|
| llama-server | Local | Airgap/Offline deployment | 10 (highest) |
| ollama | Local | Development, edge deployment | 20 |
| openai | Cloud | GPT-4o for high-quality output | 100 |
| claude | Cloud | Claude Sonnet for complex reasoning | 100 |

Architecture

Plugin Interface

public interface ILlmProviderPlugin : IAvailabilityPlugin
{
    string ProviderId { get; }              // "openai", "claude", "llama-server", "ollama"
    string DisplayName { get; }              // Human-readable name
    string Description { get; }              // Provider description
    string DefaultConfigFileName { get; }    // "openai.yaml", etc.

    ILlmProvider Create(IServiceProvider services, IConfiguration configuration);
    LlmProviderConfigValidation ValidateConfiguration(IConfiguration configuration);
}

Provider Interface

public interface ILlmProvider : IDisposable
{
    string ProviderId { get; }

    Task<bool> IsAvailableAsync(CancellationToken cancellationToken = default);

    Task<LlmCompletionResult> CompleteAsync(
        LlmCompletionRequest request,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<LlmStreamChunk> CompleteStreamAsync(
        LlmCompletionRequest request,
        CancellationToken cancellationToken = default);
}
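
A caller typically probes availability before dispatching work. The helper below is an illustrative sketch built only from the interface above (TryCompleteAsync is not part of the API):

// Check availability first, then request a completion; return null so the caller
// can fall back to the next provider in priority order.
public static async Task<string?> TryCompleteAsync(
    ILlmProvider provider,
    LlmCompletionRequest request,
    CancellationToken cancellationToken)
{
    if (!await provider.IsAvailableAsync(cancellationToken))
    {
        return null;
    }

    var result = await provider.CompleteAsync(request, cancellationToken);
    return result.Content;
}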

Request and Response

public record LlmCompletionRequest
{
    public string? SystemPrompt { get; init; }
    public required string UserPrompt { get; init; }
    public string? Model { get; init; }
    public double Temperature { get; init; } = 0;      // 0 = deterministic
    public int MaxTokens { get; init; } = 4096;
    public int? Seed { get; init; }                     // For reproducibility
    public IReadOnlyList<string>? StopSequences { get; init; }
    public string? RequestId { get; init; }
}

public record LlmCompletionResult
{
    public required string Content { get; init; }
    public required string ModelId { get; init; }
    public required string ProviderId { get; init; }
    public int? InputTokens { get; init; }
    public int? OutputTokens { get; init; }
    public long? TotalTimeMs { get; init; }
    public string? FinishReason { get; init; }
    public bool Deterministic { get; init; }
}
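
For orientation, a round trip with these records might look like the following; the provider instance comes from the factory described under Usage, and the prompt and ID values are placeholders:

var request = new LlmCompletionRequest
{
    SystemPrompt = "You are a security analyst.",
    UserPrompt = "Summarize this advisory in two sentences.",
    Temperature = 0,   // Deterministic
    Seed = 42,         // Reproducible
    MaxTokens = 1024,
    RequestId = Guid.NewGuid().ToString()
};

var result = await provider.CompleteAsync(request, cancellationToken);

// Token counts and timing are nullable; providers that do not report them leave them null.
Console.WriteLine(
    $"{result.ProviderId}/{result.ModelId}: {result.OutputTokens ?? 0} tokens in {result.TotalTimeMs ?? 0} ms");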

Configuration

Directory Structure

etc/
  llm-providers/
    openai.yaml          # OpenAI configuration
    claude.yaml          # Claude/Anthropic configuration
    llama-server.yaml    # llama.cpp server configuration
    ollama.yaml          # Ollama configuration

Environment Variables

| Variable | Provider | Description |
|---|---|---|
| OPENAI_API_KEY | OpenAI | API key for OpenAI |
| ANTHROPIC_API_KEY | Claude | API key for Anthropic |
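
These variables are referenced from the YAML files via ${...} substitution; in a shell environment they can be set like this (placeholder values shown):

export OPENAI_API_KEY="<your-openai-key>"
export ANTHROPIC_API_KEY="<your-anthropic-key>"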

Priority System

Providers are selected by priority (lower = higher preference):

# llama-server.yaml - highest priority for offline
priority: 10

# ollama.yaml - second priority for local
priority: 20

# openai.yaml / claude.yaml - cloud fallback
priority: 100
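
Conceptually, selection walks the enabled providers in ascending priority order and takes the first one that reports itself available. The sketch below only illustrates that rule; ProviderDescriptor and its Enabled/Priority/Provider members are hypothetical stand-ins, not the factory's real internals:

// Hypothetical illustration of the priority rule: lower number wins, unavailable providers are skipped.
static async Task<ILlmProvider> SelectProviderAsync(
    IEnumerable<ProviderDescriptor> descriptors,
    CancellationToken cancellationToken)
{
    foreach (var candidate in descriptors.Where(d => d.Enabled).OrderBy(d => d.Priority))
    {
        if (await candidate.Provider.IsAvailableAsync(cancellationToken))
        {
            return candidate.Provider;
        }
    }

    throw new InvalidOperationException("No LLM providers are available.");
}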

Provider Details

OpenAI Provider

Supports OpenAI API and Azure OpenAI Service.

# etc/llm-providers/openai.yaml
enabled: true
priority: 100

api:
  apiKey: "${OPENAI_API_KEY}"
  baseUrl: "https://api.openai.com/v1"
  organizationId: ""
  apiVersion: ""  # Required for Azure OpenAI

model:
  name: "gpt-4o"
  fallbacks:
    - "gpt-4o-mini"

inference:
  temperature: 0.0
  maxTokens: 4096
  seed: 42
  topP: 1.0
  frequencyPenalty: 0.0
  presencePenalty: 0.0

request:
  timeout: "00:02:00"
  maxRetries: 3

Azure OpenAI Configuration:

api:
  baseUrl: "https://{resource}.openai.azure.com/openai/deployments/{deployment}"
  apiKey: "${AZURE_OPENAI_KEY}"
  apiVersion: "2024-02-15-preview"

Claude Provider

Supports Anthropic Claude API.

# etc/llm-providers/claude.yaml
enabled: true
priority: 100

api:
  apiKey: "${ANTHROPIC_API_KEY}"
  baseUrl: "https://api.anthropic.com"
  apiVersion: "2023-06-01"

model:
  name: "claude-sonnet-4-20250514"
  fallbacks:
    - "claude-3-5-sonnet-20241022"

inference:
  temperature: 0.0
  maxTokens: 4096
  topP: 1.0
  topK: 0

thinking:
  enabled: false
  budgetTokens: 10000

request:
  timeout: "00:02:00"
  maxRetries: 3

llama.cpp Server Provider

Primary provider for airgap/offline deployments.

# etc/llm-providers/llama-server.yaml
enabled: true
priority: 10  # Highest priority

server:
  baseUrl: "http://localhost:8080"
  apiKey: ""
  healthEndpoint: "/health"

model:
  name: "llama3-8b-q4km"
  modelPath: "/models/llama-3-8b-instruct.Q4_K_M.gguf"
  expectedDigest: "sha256:..."  # For airgap verification

inference:
  temperature: 0.0
  maxTokens: 4096
  seed: 42
  topP: 1.0
  topK: 40
  repeatPenalty: 1.1
  contextLength: 4096

bundle:
  bundlePath: "/bundles/llama3-8b.stellaops-model"
  verifySignature: true
  cryptoScheme: "ed25519"

request:
  timeout: "00:05:00"
  maxRetries: 2

Starting llama.cpp server:

# Basic server
llama-server -m model.gguf --host 0.0.0.0 --port 8080

# With GPU acceleration
llama-server -m model.gguf --host 0.0.0.0 --port 8080 -ngl 35

# With API key authentication
llama-server -m model.gguf --host 0.0.0.0 --port 8080 --api-key "your-key"

Ollama Provider

For local development and edge deployments.

# etc/llm-providers/ollama.yaml
enabled: true
priority: 20

server:
  baseUrl: "http://localhost:11434"
  healthEndpoint: "/api/tags"

model:
  name: "llama3:8b"
  fallbacks:
    - "mistral:7b"
  keepAlive: "5m"

inference:
  temperature: 0.0
  maxTokens: 4096
  seed: 42
  topP: 1.0
  topK: 40
  repeatPenalty: 1.1
  numCtx: 4096

gpu:
  numGpu: 0  # 0 = CPU only, -1 = all layers on GPU

management:
  autoPull: false  # Disable for airgap
  verifyPull: true

request:
  timeout: "00:05:00"
  maxRetries: 2
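
Because autoPull is disabled for airgap use, models must already be present in the local Ollama store; with the standard Ollama CLI that looks like the following (model name taken from the configuration above):

# Pre-pull the configured model while connectivity is available
ollama pull llama3:8b

# Confirm it is present locally
ollama list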

Usage

Dependency Injection

// Program.cs or Startup.cs
services.AddLlmProviderPlugins("etc/llm-providers");

// Or with explicit configuration
services.AddLlmProviderPlugins(catalog =>
{
    catalog.LoadConfigurationsFromDirectory("etc/llm-providers");
    // Optionally register custom plugins
    catalog.RegisterPlugin(new CustomLlmProviderPlugin());
});
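
Outside the generic host, the same registration can be exercised directly. This is a minimal wiring sketch and assumes the AddLlmProviderPlugins extension registers ILlmProviderFactory in the container (consistent with the factory usage below):

// Build a service provider and resolve the LLM provider factory.
var services = new ServiceCollection();
services.AddLogging();
services.AddLlmProviderPlugins("etc/llm-providers");

using var serviceProvider = services.BuildServiceProvider();
var factory = serviceProvider.GetRequiredService<ILlmProviderFactory>();
var provider = factory.GetDefaultProvider();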

Using the Provider Factory

public class AdvisoryExplanationService
{
    private readonly ILlmProviderFactory _providerFactory;

    public AdvisoryExplanationService(ILlmProviderFactory providerFactory)
    {
        _providerFactory = providerFactory;
    }

    public async Task<string> GenerateExplanationAsync(
        string vulnerabilityId,
        CancellationToken cancellationToken)
    {
        // Get the default (highest priority available) provider
        var provider = _providerFactory.GetDefaultProvider();

        var request = new LlmCompletionRequest
        {
            SystemPrompt = "You are a security analyst explaining vulnerabilities.",
            UserPrompt = $"Explain {vulnerabilityId} in plain language.",
            Temperature = 0,  // Deterministic
            Seed = 42,        // Reproducible
            MaxTokens = 2048
        };

        var result = await provider.CompleteAsync(request, cancellationToken);
        return result.Content;
    }
}

Provider Selection

// Get specific provider
var openaiProvider = _providerFactory.GetProvider("openai");
var claudeProvider = _providerFactory.GetProvider("claude");
var llamaProvider = _providerFactory.GetProvider("llama-server");

// List available providers
var available = _providerFactory.AvailableProviders;
// Returns: ["llama-server", "ollama", "openai", "claude"]

Automatic Fallback

// Create a fallback provider that tries providers in order
var fallbackProvider = new FallbackLlmProvider(
    _providerFactory,
    providerOrder: ["llama-server", "ollama", "openai", "claude"],
    _logger);

// Uses first available provider, falls back on failure
var result = await fallbackProvider.CompleteAsync(request, cancellationToken);

Streaming Responses

var provider = _providerFactory.GetDefaultProvider();

await foreach (var chunk in provider.CompleteStreamAsync(request, cancellationToken))
{
    Console.Write(chunk.Content);

    if (chunk.IsFinal)
    {
        Console.WriteLine($"\n[Finished: {chunk.FinishReason}]");
    }
}
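
When the full text is needed in addition to the live stream, chunks can be accumulated as they arrive; this small sketch relies only on the chunk.Content property shown above:

var builder = new StringBuilder();

await foreach (var chunk in provider.CompleteStreamAsync(request, cancellationToken))
{
    builder.Append(chunk.Content);   // Stream to the UI here if desired.
}

var fullText = builder.ToString();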

Determinism Requirements

For reproducible AI outputs (required for attestations):

| Setting | Value | Purpose |
|---|---|---|
| temperature | 0.0 | No randomness in token selection |
| seed | 42 | Fixed random seed |
| topK | 1 | Single token selection (optional) |

Equivalent provider configuration:

inference:
  temperature: 0.0
  seed: 42
  topK: 1  # Most deterministic

Verification:

var result = await provider.CompleteAsync(request, cancellationToken);

if (!result.Deterministic)
{
    _logger.LogWarning("Output may not be reproducible");
}

Offline/Airgap Deployment

For airgap deployments, enable only the local providers; cloud provider configurations should be disabled or absent:

etc/llm-providers/
  llama-server.yaml    # Primary - enabled, priority: 10
  ollama.yaml          # Backup - enabled, priority: 20
  openai.yaml          # Disabled or missing
  claude.yaml          # Disabled or missing

Model Bundle Verification

For airgap environments, use signed model bundles:

# llama-server.yaml
bundle:
  bundlePath: "/bundles/llama3-8b.stellaops-model"
  verifySignature: true
  cryptoScheme: "ed25519"

model:
  expectedDigest: "sha256:abc123..."

Creating a model bundle:

# Create signed bundle
stella model bundle \
  --model /models/llama-3-8b-instruct.Q4_K_M.gguf \
  --sign \
  --output /bundles/llama3-8b.stellaops-model

# Verify bundle
stella model verify /bundles/llama3-8b.stellaops-model

Custom Plugins

To add support for a new LLM provider:

public sealed class CustomLlmProviderPlugin : ILlmProviderPlugin
{
    public string Name => "Custom LLM Provider";
    public string ProviderId => "custom";
    public string DisplayName => "Custom LLM";
    public string Description => "Custom LLM backend";
    public string DefaultConfigFileName => "custom.yaml";

    public bool IsAvailable(IServiceProvider services) => true;

    public ILlmProvider Create(IServiceProvider services, IConfiguration configuration)
    {
        var config = CustomConfig.FromConfiguration(configuration);
        var httpClientFactory = services.GetRequiredService<IHttpClientFactory>();
        var logger = services.GetRequiredService<ILogger<CustomLlmProvider>>();
        return new CustomLlmProvider(httpClientFactory.CreateClient(), config, logger);
    }

    public LlmProviderConfigValidation ValidateConfiguration(IConfiguration configuration)
    {
        // Validate configuration
        return LlmProviderConfigValidation.Success();
    }
}
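
The Create method above returns a CustomLlmProvider, which is not shown here. A minimal skeleton implementing ILlmProvider might look like the following; the CustomConfig members (HealthEndpoint, DefaultModel) and the LlmStreamChunk initializer shape are assumptions based on the interfaces and usage shown earlier, and the backend HTTP call is left as a stub:

public sealed class CustomLlmProvider : ILlmProvider
{
    private readonly HttpClient _httpClient;
    private readonly CustomConfig _config;   // Hypothetical config type for the custom backend.
    private readonly ILogger<CustomLlmProvider> _logger;

    public CustomLlmProvider(HttpClient httpClient, CustomConfig config, ILogger<CustomLlmProvider> logger)
    {
        _httpClient = httpClient;
        _config = config;
        _logger = logger;
    }

    public string ProviderId => "custom";

    public async Task<bool> IsAvailableAsync(CancellationToken cancellationToken = default)
    {
        try
        {
            // Probe a health endpoint; any failure counts as "not available".
            using var response = await _httpClient.GetAsync(_config.HealthEndpoint, cancellationToken);
            return response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            return false;
        }
    }

    public async Task<LlmCompletionResult> CompleteAsync(
        LlmCompletionRequest request,
        CancellationToken cancellationToken = default)
    {
        _logger.LogDebug("Completing request {RequestId}", request.RequestId);

        // Backend-specific HTTP call elided; only the result mapping is sketched.
        var content = await CallBackendAsync(request, cancellationToken);

        return new LlmCompletionResult
        {
            Content = content,
            ModelId = request.Model ?? _config.DefaultModel,
            ProviderId = ProviderId,
            // Example heuristic for the Deterministic flag.
            Deterministic = request.Temperature == 0 && request.Seed is not null
        };
    }

    public async IAsyncEnumerable<LlmStreamChunk> CompleteStreamAsync(
        LlmCompletionRequest request,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Simplest streaming shape: emit the whole completion as a single final chunk.
        var result = await CompleteAsync(request, cancellationToken);
        yield return new LlmStreamChunk { Content = result.Content, IsFinal = true, FinishReason = result.FinishReason };
    }

    public void Dispose() => _httpClient.Dispose();

    private Task<string> CallBackendAsync(LlmCompletionRequest request, CancellationToken cancellationToken)
        => throw new NotImplementedException("Call the custom backend's completion API here.");
}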

Register the custom plugin:

services.AddLlmProviderPlugins(catalog =>
{
    catalog.RegisterPlugin(new CustomLlmProviderPlugin());
    catalog.LoadConfigurationsFromDirectory("etc/llm-providers");
});

Telemetry

LLM operations emit structured logs:

{
  "timestamp": "2025-12-26T10:30:00Z",
  "operation": "llm_completion",
  "providerId": "llama-server",
  "model": "llama3-8b-q4km",
  "inputTokens": 1234,
  "outputTokens": 567,
  "totalTimeMs": 2345,
  "deterministic": true,
  "finishReason": "stop"
}
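
If the application wants to surface the same fields from its own code, a completion result can be logged with a structured template; this is an illustrative sketch, not the providers' built-in logging:

// Illustrative only: mirror the telemetry fields from an LlmCompletionResult.
_logger.LogInformation(
    "llm_completion provider={ProviderId} model={ModelId} inputTokens={InputTokens} " +
    "outputTokens={OutputTokens} totalTimeMs={TotalTimeMs} deterministic={Deterministic} finishReason={FinishReason}",
    result.ProviderId, result.ModelId, result.InputTokens, result.OutputTokens,
    result.TotalTimeMs, result.Deterministic, result.FinishReason);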

Performance Comparison

| Provider | Latency (TTFT) | Throughput | Cost | Offline |
|---|---|---|---|---|
| llama-server | 50-200 ms | 20-50 tok/s | Free | Yes |
| ollama | 100-500 ms | 15-40 tok/s | Free | Yes |
| openai (gpt-4o) | 200-500 ms | 50-100 tok/s | $$$ | No |
| claude (sonnet) | 300-600 ms | 40-80 tok/s | $$$ | No |

Note: Local performance depends heavily on hardware (GPU, RAM, CPU).

Troubleshooting

Provider Not Available

InvalidOperationException: No LLM providers are available.

Solutions:

  1. Check that configuration files exist in etc/llm-providers/
  2. Verify that API keys are set (environment variables or configuration)
  3. For local providers, ensure the server is running:

     # llama-server
     curl http://localhost:8080/health

     # ollama
     curl http://localhost:11434/api/tags

Non-Deterministic Output

Warning: Output may not be reproducible

Solutions:

  1. Set temperature: 0.0 in configuration
  2. Set seed: 42 (or any fixed value)
  3. Use the same model version across environments

Timeout Errors

TaskCanceledException: The request was canceled due to timeout.

Solutions:

  1. Increase request.timeout in configuration
  2. For local inference, ensure sufficient hardware resources
  3. Reduce maxTokens if appropriate