# Offline AI Model Bundles

> **Sprint:** SPRINT_20251226_019_AI_offline_inference
> **Task:** OFFLINE-23, OFFLINE-26

This guide covers transferring and configuring AI model bundles for air-gapped deployments.

## Overview

Local LLM inference in air-gapped environments requires model weight bundles to be transferred via sneakernet (USB, portable media, or internal package servers). The AdvisoryAI module supports deterministic local inference with signed model bundles.

## Model Bundle Format
```
/offline/models/<model-id>/
├── manifest.json            # Bundle metadata + file digests
├── signature.dsse           # DSSE envelope with model signature
├── weights/
│   ├── model.gguf           # Quantized weights (llama.cpp format)
│   └── model.gguf.sha256    # SHA-256 digest
├── tokenizer/
│   ├── tokenizer.json       # Tokenizer config
│   └── special_tokens.json  # Special tokens map
└── config/
    ├── model_config.json    # Model architecture config
    └── inference.json       # Recommended inference settings
```
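A quick way to sanity-check a bundle against this contract is to test for each expected file. The following is a minimal sketch in plain POSIX shell (no stella CLI required); the bundle path is illustrative:

```bash
# Layout check (sketch): confirm the files from the tree above are present
# before attempting an import. The bundle path is an example.
bundle=/offline/models/llama3-8b-q4km
missing=0
for f in manifest.json signature.dsse \
         weights/model.gguf weights/model.gguf.sha256 \
         tokenizer/tokenizer.json tokenizer/special_tokens.json \
         config/model_config.json config/inference.json; do
  [ -f "$bundle/$f" ] || { echo "missing: $f" >&2; missing=1; }
done
[ "$missing" -eq 0 ] && echo "layout OK"
```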
## Manifest Schema

```json
{
  "bundle_id": "llama3-8b-q4km-v1",
  "model_family": "llama3",
  "model_size": "8B",
  "quantization": "Q4_K_M",
  "license": "Apache-2.0",
  "created_at": "2025-12-26T00:00:00Z",
  "files": [
    {
      "path": "weights/model.gguf",
      "digest": "sha256:a1b2c3d4e5f6...",
      "size": 4893456789
    },
    {
      "path": "tokenizer/tokenizer.json",
      "digest": "sha256:1a2b3c4d5e6f...",
      "size": 1842
    }
  ],
  "crypto_scheme": "ed25519",
  "signature_id": "ed25519-20251226-a1b2c3d4"
}
```
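Because the manifest carries per-file digests, a bundle can be re-verified with nothing but jq and coreutils, which is handy on hosts where the stella CLI is not yet installed. A sketch, assuming the `sha256:` digest prefix shown above:

```bash
# Manifest digest re-check (sketch): recompute each file digest listed in
# manifest.json and compare it to the recorded value. Assumes jq and GNU
# coreutils are available.
cd /offline/models/llama3-8b-q4km

jq -r '.files[] | "\(.digest) \(.path)"' manifest.json |
while read -r digest path; do
  expected="${digest#sha256:}"
  actual="$(sha256sum "$path" | awk '{print $1}')"
  if [ "$expected" = "$actual" ]; then
    echo "OK   $path"
  else
    echo "FAIL $path" >&2
  fi
done
```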
## Transfer Workflow

### 1. Export on Connected Machine

```bash
# Pull model from registry and create signed bundle
stella model pull llama3-8b-q4km --offline --output /mnt/usb/models/

# Verify bundle before transfer
stella model verify /mnt/usb/models/llama3-8b-q4km/ --verbose
```

### 2. Transfer Verification

Before physically transferring the media, verify the bundle integrity:

```bash
# Generate transfer manifest with all digests
stella model export-manifest /mnt/usb/models/ --output transfer-manifest.json

# Print weights digest for phone/radio verification
sha256sum /mnt/usb/models/llama3-8b-q4km/weights/model.gguf
# Example output: a1b2c3d4...  model.gguf

# Cross-check against the bundle's manifest
jq '.files[] | select(.path | contains("model.gguf")) | .digest' \
  /mnt/usb/models/llama3-8b-q4km/manifest.json
```
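For bulk verification, a plain checksum file travels well alongside the bundle. A hedged sketch using coreutils only; `SHA256SUMS` is a convention used here, not part of the bundle format:

```bash
# Bulk checksum sketch: write a checksum file next to the bundle on the
# connected machine, then verify it byte-for-byte on the receiving side.
cd /mnt/usb/models/llama3-8b-q4km
find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS

# On the air-gapped host, before import:
cd /mnt/usb/models/llama3-8b-q4km && sha256sum -c --quiet SHA256SUMS \
  && echo "media intact"
```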
### 3. Import on Air-Gapped Host

```bash
# Import with signature verification
stella model import /mnt/usb/models/llama3-8b-q4km/ \
  --verify-signature \
  --destination /var/lib/stellaops/models/

# Verify loaded model matches expected digest
stella model info llama3-8b-q4km --verify

# List all installed models
stella model list
```
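In scripted imports it helps to gate follow-up steps on the verification result. A sketch, assuming the CLI exits non-zero when signature verification fails:

```bash
# Scripted import (sketch): only proceed to the post-import check when the
# import itself verified cleanly. Exit-code behavior is an assumption.
if stella model import /mnt/usb/models/llama3-8b-q4km/ \
     --verify-signature \
     --destination /var/lib/stellaops/models/; then
  stella model info llama3-8b-q4km --verify
else
  echo "import failed: keep the media copy for investigation" >&2
  exit 1
fi
```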
## CLI Model Commands

| Command | Description |
|---------|-------------|
| `stella model list` | List installed model bundles |
| `stella model pull --offline` | Download bundle to local path for transfer |
| `stella model verify <path>` | Verify bundle integrity and signature |
| `stella model import <path>` | Import bundle from external media |
| `stella model info <model-id>` | Display bundle details and verification status |
| `stella model remove <model-id>` | Remove installed model bundle |

### Command Examples

```bash
# List models with details
stella model list --verbose

# Pull specific model variant
stella model pull llama3-8b --quantization Q4_K_M --offline --output ./bundle/

# Verify all installed bundles
stella model verify --all

# Get model info including signature status
stella model info llama3-8b-q4km --show-signature

# Remove model bundle
stella model remove llama3-8b-q4km --force
```
## Configuration

### Local Inference Configuration

Configure in `etc/advisory-ai.yaml`:

```yaml
advisoryAi:
  inference:
    mode: Local              # Local | Remote
    local:
      bundlePath: /var/lib/stellaops/models/llama3-8b-q4km
      requiredDigest: "sha256:a1b2c3d4e5f6..."
      verifySignature: true
      deviceType: CPU        # CPU | GPU | NPU

      # Determinism settings (required for replay)
      contextLength: 4096
      temperature: 0.0
      seed: 42

      # Performance tuning
      threads: 4
      batchSize: 512
      gpuLayers: 0           # 0 = CPU only
```
### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `ADVISORYAI_INFERENCE_MODE` | `Local` or `Remote` | `Local` |
| `ADVISORYAI_MODEL_PATH` | Path to model bundle | `/var/lib/stellaops/models` |
| `ADVISORYAI_MODEL_VERIFY` | Verify signature on load | `true` |
| `ADVISORYAI_INFERENCE_THREADS` | CPU threads for inference | `4` |
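The same settings can be supplied via the environment, e.g. in a container or systemd unit. A sketch; the assumption here is that environment variables take precedence over `etc/advisory-ai.yaml`:

```bash
# Environment-based configuration (sketch). Values mirror the YAML example
# above; precedence over the YAML file is an assumption.
export ADVISORYAI_INFERENCE_MODE=Local
export ADVISORYAI_MODEL_PATH=/var/lib/stellaops/models/llama3-8b-q4km
export ADVISORYAI_MODEL_VERIFY=true
export ADVISORYAI_INFERENCE_THREADS=8
```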
## Hardware Requirements

| Model Size | Quantization | RAM Required | GPU VRAM | Inference Speed |
|------------|--------------|--------------|----------|-----------------|
| 7-8B | Q4_K_M | 8 GB | N/A (CPU) | ~10 tokens/sec |
| 7-8B | FP16 | 16 GB | 8 GB | ~50 tokens/sec |
| 13B | Q4_K_M | 16 GB | N/A (CPU) | ~5 tokens/sec |
| 13B | FP16 | 32 GB | 16 GB | ~30 tokens/sec |
### Recommended Configurations

**Minimal (CPU-only, 8 GB RAM):**
- Model: Llama 3 8B Q4_K_M
- Settings: `threads: 4`, `batchSize: 256`
- Expected: ~10 tokens/sec

**Standard (CPU, 16 GB RAM):**
- Model: Llama 3 8B Q4_K_M or 13B Q4_K_M
- Settings: `threads: 8`, `batchSize: 512`
- Expected: ~15-20 tokens/sec (8B), ~5-8 tokens/sec (13B)

**GPU-Accelerated (8 GB VRAM):**
- Model: Llama 3 8B FP16
- Settings: `gpuLayers: 35`, `batchSize: 512`
- Expected: ~50 tokens/sec
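As a rough sizing aid, the profiles above can be turned into a quick check on the target host. A sketch for Linux (reads `/proc/meminfo`); the thresholds are the ones listed above, so tune them for your workload:

```bash
# Rough sizing helper (sketch): pick a profile from available memory.
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
if   [ "$mem_gb" -ge 16 ]; then echo "standard: 8B or 13B Q4_K_M, threads: 8"
elif [ "$mem_gb" -ge 8 ];  then echo "minimal: 8B Q4_K_M, threads: 4"
else echo "below 8 GB RAM: local inference not recommended" >&2
fi
```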
## Signing and Verification

### Model Bundle Signing

Bundles are signed using the DSSE (Dead Simple Signing Envelope) format:

```json
{
  "payloadType": "application/vnd.stellaops.model-bundle+json",
  "payload": "<base64-encoded-manifest-digest>",
  "signatures": [
    {
      "keyId": "stellaops-model-signer-2025",
      "sig": "<base64-signature>"
    }
  ]
}
```
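For out-of-band verification of the default Ed25519 scheme, the DSSE pre-authentication encoding (PAE) can be rebuilt by hand and checked with OpenSSL 3.x. A sketch; the public key file name is an assumption (export it from your trust store):

```bash
# Out-of-band DSSE check (sketch): the signature covers
# PAE(payloadType, payload), so rebuild that encoding and verify it.
payload_type=$(jq -r '.payloadType' signature.dsse)
jq -r '.payload' signature.dsse | base64 -d > payload.bin
jq -r '.signatures[0].sig' signature.dsse | base64 -d > sig.bin

{
  printf 'DSSEv1 %d %s %d ' \
    "${#payload_type}" "$payload_type" "$(wc -c < payload.bin)"
  cat payload.bin
} > pae.bin

# model-signer.pub.pem is an assumed export of the signer's public key.
openssl pkeyutl -verify -pubin -inkey model-signer.pub.pem \
  -rawin -in pae.bin -sigfile sig.bin
```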
### Regional Crypto Support

| Region | Algorithm | Key Type |
|--------|-----------|----------|
| Default | Ed25519 | Ed25519 |
| FIPS (US) | ECDSA-P256 | NIST P-256 |
| GOST (RU) | GOST 34.10-2012 | GOST R 34.10-2012 |
| SM (CN) | SM2 | SM2 |
### Verification at Load Time

When a model is loaded, the following checks occur:

1. **Signature verification**: DSSE envelope is verified against known keys
2. **Manifest integrity**: All file digests are recalculated and compared
3. **Bundle completeness**: All required files are present
4. **Configuration validation**: Inference settings are within safe bounds
## Deterministic Inference

For reproducible AI outputs (required for attestation replay):

```yaml
advisoryAi:
  inference:
    local:
      # CRITICAL: These settings ensure deterministic output
      temperature: 0.0
      seed: 42
      topK: 1
      topP: 1.0
```

With these settings, the same prompt will produce identical output across runs, enabling:
- AI artifact replay for compliance audits
- Divergence detection between environments
- Attestation verification
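A quick way to confirm the settings took effect is to hash two runs of the same prompt. The command below is hypothetical (`stella model run` is not part of the documented command set); substitute whatever entry point drives local inference in your deployment:

```bash
# Determinism smoke test (sketch): identical hashes => reproducible output.
prompt="Summarize the remediation guidance for CVE-2025-0001."
h1=$(stella model run llama3-8b-q4km --prompt "$prompt" | sha256sum)
h2=$(stella model run llama3-8b-q4km --prompt "$prompt" | sha256sum)
[ "$h1" = "$h2" ] && echo "deterministic" || echo "divergent output" >&2
```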
## Benchmarking

Run local inference benchmarks:

```bash
# Run standard benchmark suite
stella model benchmark llama3-8b-q4km --iterations 10

# Output includes:
# - Latency: mean, median, p95, p99, TTFT (time to first token)
# - Throughput: tokens/sec, requests/min
# - Resource usage: peak memory, CPU utilization
```
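Because throughput on CPU hosts is largely thread-bound, a small sweep helps pick the `threads` value. A sketch combining the documented benchmark command with the `ADVISORYAI_INFERENCE_THREADS` variable; whether the benchmark honors that variable is an assumption:

```bash
# Thread sweep (sketch): rerun the benchmark at several thread counts and
# compare the reported throughput.
for t in 2 4 8; do
  echo "--- threads=$t ---"
  ADVISORYAI_INFERENCE_THREADS=$t stella model benchmark llama3-8b-q4km --iterations 10
done
```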
## Troubleshooting

| Symptom | Cause | Resolution |
|---------|-------|------------|
| `signature verification failed` | Bundle tampered with or wrong key | Re-download the bundle; verify chain of custody |
| `digest mismatch` | Corruption during transfer | Re-copy from source; verify SHA-256 digests |
| `model not found` | Wrong bundle path | Check `bundlePath` in config |
| `out of memory` | Model too large for available RAM | Use a smaller quantization (e.g. Q4_K_M) |
| `inference timeout` | CPU too slow | Increase the timeout or enable GPU offload |
| `non-deterministic output` | Sampling settings not pinned | Set `temperature: 0.0` and a fixed `seed` |
## Related Documentation

- [Advisory AI Architecture](../architecture.md)
- [Offline Kit Overview](../../../24_OFFLINE_KIT.md)
- [AI Attestations](../../../implplan/SPRINT_20251226_018_AI_attestations.md)
- [Replay Semantics](./replay-semantics.md)