Refactor code structure and optimize performance across multiple modules

This commit is contained in:
StellaOps Bot
2025-12-26 20:03:22 +02:00
parent c786faae84
commit f10d83c444
1385 changed files with 69732 additions and 10280 deletions

---
# AI Attestations and Replay Semantics
> **Sprint:** SPRINT_20251226_018_AI_attestations
> **Task:** AIATTEST-23
This guide documents the AI attestation schemas, authority classification, and deterministic replay semantics.
## Overview
AI-generated artifacts in StellaOps are wrapped in cryptographic attestations that:
1. Capture the exact inputs (prompts, context, model parameters)
2. Prove the generation chain (model ID, weights digest, configuration)
3. Enable deterministic replay for compliance verification
4. Support divergence detection across environments
## Attestation Types
### AI Artifact Predicate
```json
{
  "_type": "https://stellaops.org/attestation/ai-artifact/v1",
  "artifactId": "ai-artifact-20251226-001",
  "artifactType": "explanation",
  "authority": "ai-generated",
  "generatedAt": "2025-12-26T10:30:00Z",
  "model": {
    "modelId": "llama3-8b-q4km",
    "weightsDigest": "sha256:a1b2c3...",
    "promptTemplateVersion": "v2.1.0"
  },
  "inputs": {
    "systemPromptHash": "sha256:abc123...",
    "userPromptHash": "sha256:def456...",
    "contextHashes": ["sha256:111...", "sha256:222..."]
  },
  "parameters": {
    "temperature": 0.0,
    "seed": 42,
    "maxTokens": 2048,
    "topK": 1
  },
  "output": {
    "contentHash": "sha256:789xyz...",
    "tokenCount": 847
  },
  "replayManifest": {
    "manifestId": "replay-20251226-001",
    "manifestHash": "sha256:manifest..."
  }
}
```
### Artifact Types
| Type | Description | Authority |
|------|-------------|-----------|
| `explanation` | Vulnerability explanation for humans | `ai-generated` |
| `remediation` | Fix plan with upgrade paths | `ai-generated` |
| `vex_draft` | Draft VEX statement | `ai-draft-requires-review` |
| `policy_draft` | Draft policy rules | `ai-draft-requires-review` |
| `triage_suggestion` | Triage action suggestions | `ai-suggestion` |
### Authority Classification
AI outputs are classified by their authority level:
```
ai-generated             → Informational only, human review optional
ai-draft-requires-review → Draft requires explicit human approval
ai-suggestion            → Suggestion, user decides action
ai-verified              → AI output verified against ground truth
human-approved           → AI output approved by human reviewer
```
## Replay Manifest
The replay manifest captures everything needed to reproduce an AI generation:
```json
{
  "manifestVersion": "1.0",
  "artifactId": "ai-artifact-20251226-001",
  "artifactType": "explanation",
  "model": {
    "modelId": "llama3-8b-q4km",
    "weightsDigest": "sha256:a1b2c3d4e5f6...",
    "promptTemplateVersion": "v2.1.0"
  },
  "prompts": {
    "systemPrompt": "You are a security analyst...",
    "userPrompt": "Explain CVE-2024-1234 affecting lodash@4.17.20...",
    "systemPromptHash": "sha256:abc123...",
    "userPromptHash": "sha256:def456..."
  },
  "context": {
    "contextPack": [...],
    "contextHashes": ["sha256:111...", "sha256:222..."]
  },
  "parameters": {
    "temperature": 0.0,
    "seed": 42,
    "maxTokens": 2048,
    "topK": 1,
    "topP": 1.0
  },
  "output": {
    "content": "CVE-2024-1234 is a critical vulnerability...",
    "contentHash": "sha256:789xyz...",
    "tokenCount": 847
  },
  "metadata": {
    "generatedAt": "2025-12-26T10:30:00Z",
    "replayable": true,
    "deterministicSettings": true
  }
}
```
## Deterministic Requirements
For an AI artifact to be replayable:
1. **Temperature must be 0**: No randomness in token selection
2. **Seed must be fixed**: Same seed across replays (default: 42)
3. **Model weights must match**: Verified by weights digest
4. **Prompts must match**: Verified by prompt hashes
5. **Context must match**: All input hashes must verify
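The five requirements above can be collapsed into a small pre-replay check. A hedged sketch — `ReplayManifest`, `ReplayParameters`, and `DeterminismCheck` are illustrative names, not the actual StellaOps types; only the determinism rules themselves come from this guide:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical projection of the manifest fields needed for validation.
public sealed record ReplayParameters(double Temperature, int? Seed);
public sealed record ReplayManifest(string WeightsDigest, ReplayParameters Parameters);

public static class DeterminismCheck
{
    // Returns the violated requirements; an empty list means the artifact is replayable.
    public static IReadOnlyList<string> Validate(ReplayManifest manifest, string actualWeightsDigest)
    {
        var violations = new List<string>();
        if (manifest.Parameters.Temperature != 0.0)
            violations.Add("temperature must be 0");
        if (manifest.Parameters.Seed is null)
            violations.Add("seed must be fixed");
        if (!string.Equals(manifest.WeightsDigest, actualWeightsDigest, StringComparison.Ordinal))
            violations.Add("model weights digest mismatch");
        return violations;
    }
}
```

Prompt and context checks follow the same pattern: recompute SHA-256 over the current inputs and compare against the hashes recorded in the manifest.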
### Configuration for Determinism
```yaml
advisoryAi:
  attestations:
    requireDeterminism: true
    defaultSeed: 42
  inference:
    local:
      temperature: 0.0
      seed: 42
      topK: 1
      topP: 1.0
```
## Replay Workflow
### Replay Execution
```csharp
// Load replay manifest
var manifest = await LoadManifestAsync("replay-20251226-001.json");

// Create replayer with same model
var replayer = replayerFactory.Create(manifest.Model.ModelId);

// Execute replay
var result = await replayer.ReplayAsync(manifest, cancellationToken);

// Check if output is identical
if (result.Identical)
{
    Console.WriteLine("Replay successful: output matches original");
}
else
{
    Console.WriteLine($"Divergence detected: similarity = {result.SimilarityScore:P2}");
}
```
### Divergence Detection
When replay produces different output:
```json
{
  "diverged": true,
  "similarityScore": 0.97,
  "originalHash": "sha256:789xyz...",
  "replayedHash": "sha256:different...",
  "details": [
    {
      "type": "content_divergence",
      "description": "Content differs at position 1842",
      "position": 1842,
      "originalSnippet": "...vulnerability allows...",
      "replayedSnippet": "...vulnerability permits..."
    }
  ]
}
```
### Common Divergence Causes
| Cause | Detection | Resolution |
|-------|-----------|------------|
| Different model weights | Weights digest mismatch | Use exact model version |
| Non-zero temperature | Parameter check | Set temperature to 0 |
| Different seed | Parameter check | Use same seed |
| Prompt template change | Template version mismatch | Pin template version |
| Context ordering | Context hash mismatch | Sort context deterministically |
## Attestation Signing
### DSSE Envelope Format
AI attestations use DSSE (Dead Simple Signing Envelope):
```json
{
  "payloadType": "application/vnd.stellaops.ai-attestation+json",
  "payload": "<base64-encoded-attestation>",
  "signatures": [
    {
      "keyId": "stellaops-ai-signer-2025",
      "sig": "<base64-signature>"
    }
  ]
}
```
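Verifiers do not sign or check the raw payload bytes; DSSE signatures cover the pre-authentication encoding (PAE) defined by the DSSE v1 specification. A minimal sketch of PAE construction — the actual signing/verification call depends on the configured crypto scheme and is omitted here:

```csharp
using System;
using System.Text;

public static class Dsse
{
    // PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
    // Lengths are ASCII decimal byte counts; body is the base64-DECODED payload.
    public static byte[] PreAuthEncoding(string payloadType, byte[] payload)
    {
        var typeBytes = Encoding.UTF8.GetBytes(payloadType);
        var header = Encoding.UTF8.GetBytes(
            $"DSSEv1 {typeBytes.Length} {payloadType} {payload.Length} ");
        var pae = new byte[header.Length + payload.Length];
        Buffer.BlockCopy(header, 0, pae, 0, header.Length);
        Buffer.BlockCopy(payload, 0, pae, header.Length, payload.Length);
        return pae;
    }
}
```

A verifier base64-decodes `payload`, rebuilds the PAE, and checks each `sig` against it using the key identified by `keyId`.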
### Signing Configuration
```yaml
advisoryAi:
  attestations:
    sign: true
    keyId: "stellaops-ai-signer-2025"
    cryptoScheme: ed25519 # ed25519 | ecdsa-p256 | gost3410 | sm2
```
## API Endpoints
### Generate with Attestation
```http
POST /api/v1/advisory/explain
Content-Type: application/json

{
  "findingId": "finding-123",
  "artifactDigest": "sha256:...",
  "options": {
    "generateAttestation": true,
    "signAttestation": true
  }
}
```
Response includes:
```json
{
  "explanation": "...",
  "attestation": {
    "predicateType": "https://stellaops.org/attestation/ai-artifact/v1",
    "predicate": {...},
    "signature": {...}
  },
  "replayManifestId": "replay-20251226-001"
}
```
### Verify Attestation
```http
POST /api/v1/attestation/verify
Content-Type: application/json

{
  "attestation": {...},
  "options": {
    "verifySignature": true,
    "verifyReplay": true
  }
}
```
### Replay Artifact
```http
POST /api/v1/advisory/replay
Content-Type: application/json

{
  "manifestId": "replay-20251226-001"
}
```
## CLI Commands
```bash
# Generate explanation with attestation
stella advisory explain finding-123 --attest --sign
# Verify attestation
stella attest verify ai-artifact-20251226-001.dsse.json
# Replay from manifest
stella advisory replay --manifest replay-20251226-001.json
# Check divergence
stella advisory replay --manifest replay-20251226-001.json --detect-divergence
```
## Storage and Retrieval
### Attestation Storage
Attestations are stored in the Evidence Locker:
```
/evidence/ai-attestations/
├── 2025/12/26/
│ ├── ai-artifact-20251226-001.json
│ ├── ai-artifact-20251226-001.dsse.json
│ └── replay-20251226-001.json
```
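Deriving the locker path from the generation timestamp can be sketched as follows; the helper name is illustrative, while the layout itself comes from the tree above. The culture is pinned because `/` in a .NET date format string is otherwise substituted with the host locale's date separator:

```csharp
using System;
using System.Globalization;

public static class AttestationPaths
{
    // /evidence/ai-attestations/<yyyy>/<MM>/<dd>/<artifactId>.<ext>
    public static string For(string artifactId, DateTimeOffset generatedAt, string extension = "json")
    {
        // Quote the separators so '/' is emitted literally regardless of locale.
        var datePath = generatedAt.ToString("yyyy'/'MM'/'dd", CultureInfo.InvariantCulture);
        return $"/evidence/ai-attestations/{datePath}/{artifactId}.{extension}";
    }
}
```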
### Retrieval
```http
GET /api/v1/attestation/ai-artifact-20251226-001
# Returns attestation + replay manifest
```
## Audit Trail
AI operations are logged for compliance:
```json
{
  "timestamp": "2025-12-26T10:30:00Z",
  "operation": "ai_generation",
  "artifactId": "ai-artifact-20251226-001",
  "artifactType": "explanation",
  "modelId": "llama3-8b-q4km",
  "authority": "ai-generated",
  "user": "system",
  "inputHashes": ["sha256:..."],
  "outputHash": "sha256:...",
  "signed": true,
  "replayable": true
}
```
## Integration with VEX
AI-drafted VEX statements require human approval:
```mermaid
graph LR
A[AI generates VEX draft] --> B[Authority: ai-draft-requires-review]
B --> C[Human reviews draft]
C --> D{Approve?}
D -->|Yes| E[Authority: human-approved]
D -->|No| F[Draft rejected]
E --> G[Publish VEX]
```
## Related Documentation
- [Advisory AI Architecture](../architecture.md)
- [Offline Model Bundles](./offline-model-bundles.md)
- [Attestor Module](../../attestor/architecture.md)
- [Evidence Locker](../../evidence-locker/architecture.md)

---
# Explanation API and Replay Semantics
> **Sprint:** SPRINT_20251226_015_AI_zastava_companion
> **Task:** ZASTAVA-21
This guide documents the Zastava Companion explanation API, attestation format, and replay semantics for evidence-grounded AI explanations.
## Overview
The Explanation API provides evidence-anchored explanations answering:
- **What** is this vulnerability?
- **Why** does it matter in this context?
- **Evidence**: What supports exploitability?
- **Counterfactual**: What would change the verdict?
All explanations are anchored to verifiable evidence nodes (SBOM, reachability, runtime, VEX, patches).
## Explanation Types
| Type | Purpose | Example Output |
|------|---------|----------------|
| `What` | Technical description | "CVE-2024-1234 is a remote code execution vulnerability in lodash's merge function..." |
| `Why` | Contextual relevance | "This matters because your service uses lodash@4.17.20 in the request handler path..." |
| `Evidence` | Exploitability proof | "Reachability analysis shows the vulnerable function is called from /api/users endpoint..." |
| `Counterfactual` | Verdict change conditions | "The verdict would change to 'not affected' if the VEX statement confirmed non-exploitability..." |
| `Full` | Comprehensive explanation | All of the above in a structured format |
## API Endpoints
### Generate Explanation
```http
POST /api/v1/advisory-ai/explain
Content-Type: application/json

{
  "findingId": "finding-abc123",
  "artifactDigest": "sha256:abcdef...",
  "scope": "service",
  "scopeId": "payment-service",
  "explanationType": "Full",
  "vulnerabilityId": "CVE-2024-1234",
  "componentPurl": "pkg:npm/lodash@4.17.20",
  "plainLanguage": true,
  "maxLength": 2000
}
```
**Response:**
```json
{
  "explanationId": "expl-20251226-001",
  "content": "## What is CVE-2024-1234?\n\nCVE-2024-1234 is a critical remote code execution vulnerability...[1]\n\n## Why It Matters\n\nYour payment-service uses lodash@4.17.20 which is affected...[2]\n\n## Evidence\n\n- Reachability: The vulnerable `merge()` function is called from `/api/checkout`...[3]\n- Runtime: No WAF protection detected for this endpoint...[4]\n\n## What Would Change the Verdict\n\nThe verdict would change to 'not affected' if:\n- A VEX statement confirms non-exploitability...[5]\n- The function call is removed from the code path...[6]",
  "summary": {
    "line1": "Critical RCE in lodash affecting payment-service",
    "line2": "Reachable via /api/checkout with no WAF protection",
    "line3": "Upgrade to lodash@4.17.21 or add VEX exception"
  },
  "citations": [
    {
      "claimText": "CVE-2024-1234 is a critical remote code execution vulnerability",
      "evidenceId": "nvd:CVE-2024-1234",
      "evidenceType": "advisory",
      "verified": true,
      "evidenceExcerpt": "CVSS: 9.8 CRITICAL - Improper input validation in lodash merge..."
    },
    {
      "claimText": "payment-service uses lodash@4.17.20",
      "evidenceId": "sbom:payment-service:lodash@4.17.20",
      "evidenceType": "sbom",
      "verified": true,
      "evidenceExcerpt": "Component: lodash, Version: 4.17.20, Location: node_modules/lodash"
    },
    {
      "claimText": "vulnerable merge() function is called from /api/checkout",
      "evidenceId": "reach:payment-service:lodash.merge:/api/checkout",
      "evidenceType": "reachability",
      "verified": true,
      "evidenceExcerpt": "Call path: checkout.js:42 -> utils.js:15 -> lodash.merge()"
    }
  ],
  "confidenceScore": 0.92,
  "citationRate": 0.85,
  "authority": "EvidenceBacked",
  "evidenceRefs": [
    "nvd:CVE-2024-1234",
    "sbom:payment-service:lodash@4.17.20",
    "reach:payment-service:lodash.merge:/api/checkout",
    "runtime:payment-service:waf:none"
  ],
  "modelId": "claude-sonnet-4-20250514",
  "promptTemplateVersion": "v2.1.0",
  "inputHashes": [
    "sha256:abc123...",
    "sha256:def456..."
  ],
  "generatedAt": "2025-12-26T10:30:00Z",
  "outputHash": "sha256:789xyz..."
}
```
### Replay Explanation
Re-runs the explanation with identical inputs to verify determinism.
```http
GET /api/v1/advisory-ai/explain/{explanationId}/replay
```
**Response:**
```json
{
  "original": {...},
  "replayed": {...},
  "identical": true,
  "similarity": 1.0,
  "divergenceDetails": null
}
```
### Get Explanation
```http
GET /api/v1/advisory-ai/explain/{explanationId}
```
### Validate Explanation
```http
POST /api/v1/advisory-ai/explain/{explanationId}/validate
```
Validates that the explanation's input hashes still match current evidence.
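Validation can be sketched as recomputing a SHA-256 digest for each piece of current evidence and checking it against the recorded `inputHashes`; the helper names below are illustrative, not the service's actual API:

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class ExplanationValidator
{
    public static string Sha256Hex(string content) =>
        "sha256:" + Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(content))).ToLowerInvariant();

    // True when every piece of current evidence still hashes to one of the
    // hashes recorded in the explanation's inputHashes at generation time.
    public static bool InputsStillValid(string[] recordedInputHashes, string[] currentEvidence) =>
        currentEvidence.All(e => recordedInputHashes.Contains(Sha256Hex(e), StringComparer.Ordinal));
}
```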
## Evidence Types
| Type | Source | Description |
|------|--------|-------------|
| `advisory` | NVD, GHSA, vendor | Vulnerability advisory data |
| `sbom` | Container scan | Software bill of materials component |
| `reachability` | Call graph analysis | Function reachability proof |
| `runtime` | Signals service | Runtime observations (WAF, network) |
| `vex` | VEX documents | Vendor exploitability statements |
| `patch` | Package registry | Available fix information |
## Authority Classification
Explanations are classified by their evidence backing:
| Authority | Criteria | Display |
|-----------|----------|---------|
| `EvidenceBacked` | ≥80% citation rate, all citations verified | Green badge: "Evidence-backed" |
| `Suggestion` | <80% citation rate or unverified citations | Yellow badge: "AI suggestion" |
```csharp
public enum ExplanationAuthority
{
EvidenceBacked, // All claims anchored to verified evidence
Suggestion // AI suggestion requiring human review
}
```
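The criteria in the table reduce to a pure function over the citation rate and the per-citation verification flags. A sketch assuming the ≥80% threshold above; `Citation` is an illustrative shape, and the enum is repeated here only to keep the snippet self-contained:

```csharp
using System.Collections.Generic;
using System.Linq;

public enum ExplanationAuthority { EvidenceBacked, Suggestion } // as defined above

public sealed record Citation(bool Verified);

public static class AuthorityClassifier
{
    // EvidenceBacked requires both the rate threshold AND every citation verified.
    public static ExplanationAuthority Classify(
        double citationRate,
        IReadOnlyList<Citation> citations,
        double minCitationRate = 0.80) =>
        citationRate >= minCitationRate && citations.All(c => c.Verified)
            ? ExplanationAuthority.EvidenceBacked
            : ExplanationAuthority.Suggestion;
}
```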
## Attestation Format
Explanations are wrapped in DSSE (Dead Simple Signing Envelope) attestations:
### Predicate Type
```
https://stellaops.org/attestation/ai-explanation/v1
```
### Predicate Schema
```json
{
  "_type": "https://stellaops.org/attestation/ai-explanation/v1",
  "explanationId": "expl-20251226-001",
  "explanationType": "Full",
  "authority": "EvidenceBacked",
  "finding": {
    "findingId": "finding-abc123",
    "vulnerabilityId": "CVE-2024-1234",
    "componentPurl": "pkg:npm/lodash@4.17.20"
  },
  "model": {
    "modelId": "claude-sonnet-4-20250514",
    "promptTemplateVersion": "v2.1.0"
  },
  "inputs": {
    "inputHashes": ["sha256:abc123...", "sha256:def456..."],
    "evidenceRefs": ["nvd:CVE-2024-1234", "sbom:..."]
  },
  "output": {
    "contentHash": "sha256:789xyz...",
    "confidenceScore": 0.92,
    "citationRate": 0.85,
    "citationCount": 6
  },
  "generatedAt": "2025-12-26T10:30:00Z"
}
```
### DSSE Envelope
```json
{
  "payloadType": "application/vnd.stellaops.ai-explanation+json",
  "payload": "<base64-encoded-predicate>",
  "signatures": [
    {
      "keyId": "stellaops-ai-signer-2025",
      "sig": "<base64-signature>"
    }
  ]
}
```
### OCI Attachment
Attestations are pushed as OCI referrers:
```
Artifact: sha256:imagedigest
└── Referrer: application/vnd.stellaops.ai-explanation+json
└── expl-20251226-001.dsse.json
```
## Replay Semantics
### Replay Manifest
Every explanation includes a replay manifest enabling deterministic reproduction:
```json
{
  "manifestVersion": "1.0",
  "explanationId": "expl-20251226-001",
  "model": {
    "modelId": "claude-sonnet-4-20250514",
    "weightsDigest": "sha256:modelweights...",
    "promptTemplateVersion": "v2.1.0"
  },
  "inputs": {
    "findingId": "finding-abc123",
    "artifactDigest": "sha256:abcdef...",
    "evidenceHashes": {
      "advisory": "sha256:111...",
      "sbom": "sha256:222...",
      "reachability": "sha256:333..."
    }
  },
  "parameters": {
    "temperature": 0.0,
    "seed": 42,
    "maxTokens": 4096
  },
  "output": {
    "contentHash": "sha256:789xyz...",
    "generatedAt": "2025-12-26T10:30:00Z"
  }
}
```
### Determinism Requirements
For replay to produce identical output:
| Parameter | Required Value | Purpose |
|-----------|---------------|---------|
| `temperature` | `0.0` | No randomness in generation |
| `seed` | `42` (fixed) | Reproducible sampling |
| `maxTokens` | Same as original | Consistent truncation |
| Model version | Exact match | Same weights |
| Prompt template | Exact match | Same prompt structure |
### Divergence Detection
When replay produces different output:
```json
{
  "diverged": true,
  "similarity": 0.94,
  "originalHash": "sha256:789xyz...",
  "replayedHash": "sha256:different...",
  "divergencePoints": [
    {
      "position": 1234,
      "original": "...uses lodash@4.17.20...",
      "replayed": "...uses lodash version 4.17.20..."
    }
  ],
  "likelyCause": "model_update"
}
```
### Divergence Causes
| Cause | Detection | Resolution |
|-------|-----------|------------|
| Model update | Weights digest mismatch | Pin model version |
| Non-zero temperature | Parameter check | Set temperature=0 |
| Evidence change | Input hash mismatch | Re-generate explanation |
| Prompt template change | Template version mismatch | Pin template version |
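One way to approximate the `similarity` score and locate the first divergence `position` is a character-level comparison; the production metric may differ, so treat this normalized-edit-distance sketch as illustrative:

```csharp
using System;

public static class Divergence
{
    // Index of the first differing character, or -1 when the outputs are identical.
    public static int FirstDivergence(string original, string replayed)
    {
        int min = Math.Min(original.Length, replayed.Length);
        for (int i = 0; i < min; i++)
            if (original[i] != replayed[i]) return i;
        return original.Length == replayed.Length ? -1 : min;
    }

    // Normalized Levenshtein similarity in [0, 1]; 1.0 means identical.
    public static double Similarity(string a, string b)
    {
        if (a.Length == 0 && b.Length == 0) return 1.0;
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1));
        return 1.0 - (double)d[a.Length, b.Length] / Math.Max(a.Length, b.Length);
    }
}
```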
## CLI Commands
```bash
# Generate explanation
stella advisory explain finding-abc123 \
--type full \
--plain-language \
--attest --sign
# Replay explanation
stella advisory replay expl-20251226-001
# Verify explanation attestation
stella attest verify expl-20251226-001.dsse.json
# Check for divergence
stella advisory replay expl-20251226-001 --detect-divergence
```
## Configuration
```yaml
advisoryAi:
  explanation:
    # Default explanation type
    defaultType: Full
    # Plain language by default
    plainLanguage: true
    # Maximum explanation length
    maxLength: 4000
    # Minimum citation rate for EvidenceBacked authority
    minCitationRate: 0.80
    # Generate attestation for each explanation
    generateAttestation: true
    # Sign attestations
    signAttestation: true
    # Determinism settings for replay
    inference:
      temperature: 0.0
      seed: 42
      maxTokens: 4096
```
## 3-Line Summary Format
Every explanation includes a 3-line summary following the AI UX pattern:
| Line | Purpose | Example |
|------|---------|---------|
| Line 1 | What changed / what is it | "Critical RCE in lodash affecting payment-service" |
| Line 2 | Why it matters | "Reachable via /api/checkout with no WAF protection" |
| Line 3 | Next action | "Upgrade to lodash@4.17.21 or add VEX exception" |
## Error Handling
### Generation Errors
```json
{
  "error": "evidence_retrieval_failed",
  "message": "Unable to retrieve SBOM for artifact sha256:abc...",
  "recoverable": true,
  "suggestion": "Ensure the artifact has been scanned before requesting explanation"
}
```
### Validation Errors
```json
{
  "error": "citation_verification_failed",
  "message": "Citation [2] references evidence that no longer exists",
  "invalidCitations": ["sbom:payment-service:lodash@4.17.20"],
  "suggestion": "Re-generate explanation with current evidence"
}
```
## Related Documentation
- [AI Attestations](./ai-attestations.md)
- [LLM Provider Plugins](./llm-provider-plugins.md)
- [Offline Model Bundles](./offline-model-bundles.md)
- [Advisory AI Architecture](../architecture.md)

---
# LLM Provider Plugins
> **Sprint:** SPRINT_20251226_019_AI_offline_inference
> **Tasks:** OFFLINE-07, OFFLINE-08, OFFLINE-09
This guide documents the LLM (Large Language Model) provider plugin architecture for AI-powered advisory analysis, explanations, and remediation planning.
## Overview
StellaOps supports multiple LLM backends through a unified plugin architecture:
| Provider | Type | Use Case | Priority |
|----------|------|----------|----------|
| **llama-server** | Local | Airgap/Offline deployment | 10 (highest) |
| **ollama** | Local | Development, edge deployment | 20 |
| **openai** | Cloud | GPT-4o for high-quality output | 100 |
| **claude** | Cloud | Claude Sonnet for complex reasoning | 100 |
## Architecture
### Plugin Interface
```csharp
public interface ILlmProviderPlugin : IAvailabilityPlugin
{
    string ProviderId { get; }            // "openai", "claude", "llama-server", "ollama"
    string DisplayName { get; }           // Human-readable name
    string Description { get; }           // Provider description
    string DefaultConfigFileName { get; } // "openai.yaml", etc.

    ILlmProvider Create(IServiceProvider services, IConfiguration configuration);
    LlmProviderConfigValidation ValidateConfiguration(IConfiguration configuration);
}
```
### Provider Interface
```csharp
public interface ILlmProvider : IDisposable
{
    string ProviderId { get; }

    Task<bool> IsAvailableAsync(CancellationToken cancellationToken = default);

    Task<LlmCompletionResult> CompleteAsync(
        LlmCompletionRequest request,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<LlmStreamChunk> CompleteStreamAsync(
        LlmCompletionRequest request,
        CancellationToken cancellationToken = default);
}
```
### Request and Response
```csharp
public record LlmCompletionRequest
{
    public string? SystemPrompt { get; init; }
    public required string UserPrompt { get; init; }
    public string? Model { get; init; }
    public double Temperature { get; init; } = 0; // 0 = deterministic
    public int MaxTokens { get; init; } = 4096;
    public int? Seed { get; init; } // For reproducibility
    public IReadOnlyList<string>? StopSequences { get; init; }
    public string? RequestId { get; init; }
}

public record LlmCompletionResult
{
    public required string Content { get; init; }
    public required string ModelId { get; init; }
    public required string ProviderId { get; init; }
    public int? InputTokens { get; init; }
    public int? OutputTokens { get; init; }
    public long? TotalTimeMs { get; init; }
    public string? FinishReason { get; init; }
    public bool Deterministic { get; init; }
}
```
## Configuration
### Directory Structure
```
etc/
  llm-providers/
    openai.yaml        # OpenAI configuration
    claude.yaml        # Claude/Anthropic configuration
    llama-server.yaml  # llama.cpp server configuration
    ollama.yaml        # Ollama configuration
```
### Environment Variables
| Variable | Provider | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | OpenAI | API key for OpenAI |
| `ANTHROPIC_API_KEY` | Claude | API key for Anthropic |
### Priority System
Providers are selected by priority (lower = higher preference):
```yaml
# llama-server.yaml - highest priority for offline
priority: 10
# ollama.yaml - second priority for local
priority: 20
# openai.yaml / claude.yaml - cloud fallback
priority: 100
```
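Selection can be sketched as filtering enabled providers, ordering by ascending priority, and probing availability in that order. `ConfiguredProvider` and the availability probe below are illustrative stand-ins for the plugin catalog, not the actual factory internals:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record ConfiguredProvider(string ProviderId, int Priority, bool Enabled);

public static class ProviderSelection
{
    // Lower priority value wins; disabled or unavailable providers are skipped.
    public static string? PickDefault(
        IEnumerable<ConfiguredProvider> providers,
        Func<string, bool> isAvailable) =>
        providers
            .Where(p => p.Enabled)
            .OrderBy(p => p.Priority)
            .Select(p => p.ProviderId)
            .FirstOrDefault(isAvailable);
}
```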
## Provider Details
### OpenAI Provider
Supports OpenAI API and Azure OpenAI Service.
```yaml
# etc/llm-providers/openai.yaml
enabled: true
priority: 100
api:
  apiKey: "${OPENAI_API_KEY}"
  baseUrl: "https://api.openai.com/v1"
  organizationId: ""
  apiVersion: "" # Required for Azure OpenAI
model:
  name: "gpt-4o"
  fallbacks:
    - "gpt-4o-mini"
inference:
  temperature: 0.0
  maxTokens: 4096
  seed: 42
  topP: 1.0
  frequencyPenalty: 0.0
  presencePenalty: 0.0
request:
  timeout: "00:02:00"
  maxRetries: 3
```
**Azure OpenAI Configuration:**
```yaml
api:
  baseUrl: "https://{resource}.openai.azure.com/openai/deployments/{deployment}"
  apiKey: "${AZURE_OPENAI_KEY}"
  apiVersion: "2024-02-15-preview"
```
### Claude Provider
Supports Anthropic Claude API.
```yaml
# etc/llm-providers/claude.yaml
enabled: true
priority: 100
api:
  apiKey: "${ANTHROPIC_API_KEY}"
  baseUrl: "https://api.anthropic.com"
  apiVersion: "2023-06-01"
model:
  name: "claude-sonnet-4-20250514"
  fallbacks:
    - "claude-3-5-sonnet-20241022"
inference:
  temperature: 0.0
  maxTokens: 4096
  topP: 1.0
  topK: 0
  thinking:
    enabled: false
    budgetTokens: 10000
request:
  timeout: "00:02:00"
  maxRetries: 3
```
### llama.cpp Server Provider
**Primary provider for airgap/offline deployments.**
```yaml
# etc/llm-providers/llama-server.yaml
enabled: true
priority: 10 # Highest priority
server:
  baseUrl: "http://localhost:8080"
  apiKey: ""
  healthEndpoint: "/health"
model:
  name: "llama3-8b-q4km"
  modelPath: "/models/llama-3-8b-instruct.Q4_K_M.gguf"
  expectedDigest: "sha256:..." # For airgap verification
inference:
  temperature: 0.0
  maxTokens: 4096
  seed: 42
  topP: 1.0
  topK: 40
  repeatPenalty: 1.1
  contextLength: 4096
bundle:
  bundlePath: "/bundles/llama3-8b.stellaops-model"
  verifySignature: true
  cryptoScheme: "ed25519"
request:
  timeout: "00:05:00"
  maxRetries: 2
```
**Starting llama.cpp server:**
```bash
# Basic server
llama-server -m model.gguf --host 0.0.0.0 --port 8080
# With GPU acceleration
llama-server -m model.gguf --host 0.0.0.0 --port 8080 -ngl 35
# With API key authentication
llama-server -m model.gguf --host 0.0.0.0 --port 8080 --api-key "your-key"
```
### Ollama Provider
For local development and edge deployments.
```yaml
# etc/llm-providers/ollama.yaml
enabled: true
priority: 20
server:
  baseUrl: "http://localhost:11434"
  healthEndpoint: "/api/tags"
model:
  name: "llama3:8b"
  fallbacks:
    - "mistral:7b"
  keepAlive: "5m"
inference:
  temperature: 0.0
  maxTokens: 4096
  seed: 42
  topP: 1.0
  topK: 40
  repeatPenalty: 1.1
  numCtx: 4096
gpu:
  numGpu: 0 # 0 = CPU only, -1 = all layers on GPU
management:
  autoPull: false # Disable for airgap
  verifyPull: true
request:
  timeout: "00:05:00"
  maxRetries: 2
```
## Usage
### Dependency Injection
```csharp
// Program.cs or Startup.cs
services.AddLlmProviderPlugins("etc/llm-providers");

// Or with explicit configuration
services.AddLlmProviderPlugins(catalog =>
{
    catalog.LoadConfigurationsFromDirectory("etc/llm-providers");
    // Optionally register custom plugins
    catalog.RegisterPlugin(new CustomLlmProviderPlugin());
});
```
### Using the Provider Factory
```csharp
public class AdvisoryExplanationService
{
    private readonly ILlmProviderFactory _providerFactory;

    public async Task<string> GenerateExplanationAsync(
        string vulnerabilityId,
        CancellationToken cancellationToken)
    {
        // Get the default (highest priority available) provider
        var provider = _providerFactory.GetDefaultProvider();

        var request = new LlmCompletionRequest
        {
            SystemPrompt = "You are a security analyst explaining vulnerabilities.",
            UserPrompt = $"Explain {vulnerabilityId} in plain language.",
            Temperature = 0, // Deterministic
            Seed = 42,       // Reproducible
            MaxTokens = 2048
        };

        var result = await provider.CompleteAsync(request, cancellationToken);
        return result.Content;
    }
}
```
### Provider Selection
```csharp
// Get specific provider
var openaiProvider = _providerFactory.GetProvider("openai");
var claudeProvider = _providerFactory.GetProvider("claude");
var llamaProvider = _providerFactory.GetProvider("llama-server");
// List available providers
var available = _providerFactory.AvailableProviders;
// Returns: ["llama-server", "ollama", "openai", "claude"]
```
### Automatic Fallback
```csharp
// Create a fallback provider that tries providers in order
var fallbackProvider = new FallbackLlmProvider(
    _providerFactory,
    providerOrder: ["llama-server", "ollama", "openai", "claude"],
    _logger);

// Uses first available provider, falls back on failure
var result = await fallbackProvider.CompleteAsync(request, cancellationToken);
```
### Streaming Responses
```csharp
var provider = _providerFactory.GetDefaultProvider();

await foreach (var chunk in provider.CompleteStreamAsync(request, cancellationToken))
{
    Console.Write(chunk.Content);
    if (chunk.IsFinal)
    {
        Console.WriteLine($"\n[Finished: {chunk.FinishReason}]");
    }
}
```
## Determinism Requirements
For reproducible AI outputs (required for attestations):
| Setting | Value | Purpose |
|---------|-------|---------|
| `temperature` | `0.0` | No randomness in token selection |
| `seed` | `42` | Fixed random seed |
| `topK` | `1` | Single token selection (optional) |
```yaml
inference:
  temperature: 0.0
  seed: 42
  topK: 1 # Most deterministic
```
**Verification:**
```csharp
var result = await provider.CompleteAsync(request, cancellationToken);
if (!result.Deterministic)
{
    _logger.LogWarning("Output may not be reproducible");
}
```
## Offline/Airgap Deployment
### Recommended Configuration
```
etc/llm-providers/
  llama-server.yaml  # Primary - enabled, priority: 10
  ollama.yaml        # Backup - enabled, priority: 20
  openai.yaml        # Disabled or missing
  claude.yaml        # Disabled or missing
```
### Model Bundle Verification
For airgap environments, use signed model bundles:
```yaml
# llama-server.yaml
bundle:
  bundlePath: "/bundles/llama3-8b.stellaops-model"
  verifySignature: true
  cryptoScheme: "ed25519"
model:
  expectedDigest: "sha256:abc123..."
```
**Creating a model bundle:**
```bash
# Create signed bundle
stella model bundle \
--model /models/llama-3-8b-instruct.Q4_K_M.gguf \
--sign \
--output /bundles/llama3-8b.stellaops-model
# Verify bundle
stella model verify /bundles/llama3-8b.stellaops-model
```
## Custom Plugins
To add support for a new LLM provider:
```csharp
public sealed class CustomLlmProviderPlugin : ILlmProviderPlugin
{
    public string Name => "Custom LLM Provider";
    public string ProviderId => "custom";
    public string DisplayName => "Custom LLM";
    public string Description => "Custom LLM backend";
    public string DefaultConfigFileName => "custom.yaml";

    public bool IsAvailable(IServiceProvider services) => true;

    public ILlmProvider Create(IServiceProvider services, IConfiguration configuration)
    {
        var config = CustomConfig.FromConfiguration(configuration);
        var httpClientFactory = services.GetRequiredService<IHttpClientFactory>();
        var logger = services.GetRequiredService<ILogger<CustomLlmProvider>>();
        return new CustomLlmProvider(httpClientFactory.CreateClient(), config, logger);
    }

    public LlmProviderConfigValidation ValidateConfiguration(IConfiguration configuration)
    {
        // Validate configuration
        return LlmProviderConfigValidation.Success();
    }
}
```
Register the custom plugin:
```csharp
services.AddLlmProviderPlugins(catalog =>
{
    catalog.RegisterPlugin(new CustomLlmProviderPlugin());
    catalog.LoadConfigurationsFromDirectory("etc/llm-providers");
});
```
## Telemetry
LLM operations emit structured logs:
```json
{
  "timestamp": "2025-12-26T10:30:00Z",
  "operation": "llm_completion",
  "providerId": "llama-server",
  "model": "llama3-8b-q4km",
  "inputTokens": 1234,
  "outputTokens": 567,
  "totalTimeMs": 2345,
  "deterministic": true,
  "finishReason": "stop"
}
```
## Performance Comparison
| Provider | Latency (TTFT) | Throughput | Cost | Offline |
|----------|---------------|------------|------|---------|
| **llama-server** | 50-200ms | 20-50 tok/s | Free | Yes |
| **ollama** | 100-500ms | 15-40 tok/s | Free | Yes |
| **openai (gpt-4o)** | 200-500ms | 50-100 tok/s | $$$ | No |
| **claude (sonnet)** | 300-600ms | 40-80 tok/s | $$$ | No |
*Note: Local performance depends heavily on hardware (GPU, RAM, CPU).*
## Troubleshooting
### Provider Not Available
```
InvalidOperationException: No LLM providers are available.
```
**Solutions:**
1. Check configuration files exist in `etc/llm-providers/`
2. Verify API keys are set (environment variables or config)
3. For local providers, ensure server is running:
```bash
# llama-server
curl http://localhost:8080/health
# ollama
curl http://localhost:11434/api/tags
```
### Non-Deterministic Output
```
Warning: Output may not be reproducible
```
**Solutions:**
1. Set `temperature: 0.0` in configuration
2. Set `seed: 42` (or any fixed value)
3. Use the same model version across environments
### Timeout Errors
```
TaskCanceledException: The request was canceled due to timeout.
```
**Solutions:**
1. Increase `request.timeout` in configuration
2. For local inference, ensure sufficient hardware resources
3. Reduce `maxTokens` if appropriate
## Related Documentation
- [AI Attestations](./ai-attestations.md)
- [Offline Model Bundles](./offline-model-bundles.md)
- [Advisory AI Architecture](../architecture.md)
- [Configuration Reference](../../../../etc/llm-providers/)

---
# Offline AI Model Bundles
> **Sprint:** SPRINT_20251226_019_AI_offline_inference
> **Tasks:** OFFLINE-23, OFFLINE-26
This guide covers transferring and configuring AI model bundles for air-gapped deployments.
## Overview
Local LLM inference in air-gapped environments requires model weight bundles to be transferred via sneakernet (USB, portable media, or internal package servers). The AdvisoryAI module supports deterministic local inference with signed model bundles.
## Model Bundle Format
```
/offline/models/<model-id>/
├── manifest.json # Bundle metadata + file digests
├── signature.dsse # DSSE envelope with model signature
├── weights/
│ ├── model.gguf # Quantized weights (llama.cpp format)
│ └── model.gguf.sha256 # SHA-256 digest
├── tokenizer/
│ ├── tokenizer.json # Tokenizer config
│ └── special_tokens.json # Special tokens map
└── config/
├── model_config.json # Model architecture config
└── inference.json # Recommended inference settings
```
## Manifest Schema
```json
{
"bundle_id": "llama3-8b-q4km-v1",
"model_family": "llama3",
"model_size": "8B",
"quantization": "Q4_K_M",
"license": "Apache-2.0",
"created_at": "2025-12-26T00:00:00Z",
"files": [
{
"path": "weights/model.gguf",
"digest": "sha256:a1b2c3d4e5f6...",
"size": 4893456789
},
{
"path": "tokenizer/tokenizer.json",
"digest": "sha256:1a2b3c4d5e6f...",
"size": 1842
}
],
"crypto_scheme": "ed25519",
"signature_id": "ed25519-20251226-a1b2c3d4"
}
```
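Conceptually, bundle verification recomputes the digest of every listed file and compares it to the manifest. The following is a minimal Python sketch of that check, not the actual `stella model verify` implementation:

```python
import hashlib
import json
import pathlib


def verify_bundle(bundle_dir: str) -> list[str]:
    """Recompute SHA-256 digests for every file listed in manifest.json
    and return the paths whose digests do not match (empty list = intact)."""
    root = pathlib.Path(bundle_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    mismatches = []
    for entry in manifest["files"]:
        digest = hashlib.sha256((root / entry["path"]).read_bytes()).hexdigest()
        if entry["digest"] != f"sha256:{digest}":
            mismatches.append(entry["path"])
    return mismatches
```

The real CLI additionally checks the DSSE signature and bundle completeness; this sketch covers only the file-digest step.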
## Transfer Workflow
### 1. Export on Connected Machine
```bash
# Pull model from registry and create signed bundle
stella model pull llama3-8b-q4km --offline --output /mnt/usb/models/
# Verify bundle before transfer
stella model verify /mnt/usb/models/llama3-8b-q4km/ --verbose
```
### 2. Transfer Verification
Before physically transferring the media, verify the bundle integrity:
```bash
# Generate transfer manifest with all digests
stella model export-manifest /mnt/usb/models/ --output transfer-manifest.json
# Print weights digest for phone/radio verification
sha256sum /mnt/usb/models/llama3-8b-q4km/weights/model.gguf
# Example output: a1b2c3d4... model.gguf
# Cross-check against the bundle's own manifest
jq '.files[] | select(.path | contains("model.gguf")) | .digest' /mnt/usb/models/llama3-8b-q4km/manifest.json
```
### 3. Import on Air-Gapped Host
```bash
# Import with signature verification
stella model import /mnt/usb/models/llama3-8b-q4km/ \
--verify-signature \
--destination /var/lib/stellaops/models/
# Verify loaded model matches expected digest
stella model info llama3-8b-q4km --verify
# List all installed models
stella model list
```
## CLI Model Commands
| Command | Description |
|---------|-------------|
| `stella model list` | List installed model bundles |
| `stella model pull --offline` | Download bundle to local path for transfer |
| `stella model verify <path>` | Verify bundle integrity and signature |
| `stella model import <path>` | Import bundle from external media |
| `stella model info <model-id>` | Display bundle details and verification status |
| `stella model remove <model-id>` | Remove installed model bundle |
### Command Examples
```bash
# List models with details
stella model list --verbose
# Pull specific model variant
stella model pull llama3-8b --quantization Q4_K_M --offline --output ./bundle/
# Verify all installed bundles
stella model verify --all
# Get model info including signature status
stella model info llama3-8b-q4km --show-signature
# Remove model bundle
stella model remove llama3-8b-q4km --force
```
## Configuration
### Local Inference Configuration
Configure in `etc/advisory-ai.yaml`:
```yaml
advisoryAi:
inference:
mode: Local # Local | Remote
local:
bundlePath: /var/lib/stellaops/models/llama3-8b-q4km
requiredDigest: "sha256:a1b2c3d4e5f6..."
verifySignature: true
deviceType: CPU # CPU | GPU | NPU
# Determinism settings (required for replay)
contextLength: 4096
temperature: 0.0
seed: 42
# Performance tuning
threads: 4
batchSize: 512
gpuLayers: 0 # 0 = CPU only
```
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `ADVISORYAI_INFERENCE_MODE` | `Local` or `Remote` | `Local` |
| `ADVISORYAI_MODEL_PATH` | Path to model bundle | `/var/lib/stellaops/models` |
| `ADVISORYAI_MODEL_VERIFY` | Verify signature on load | `true` |
| `ADVISORYAI_INFERENCE_THREADS` | CPU threads for inference | `4` |
## Hardware Requirements
| Model Size | Quantization | RAM Required | GPU VRAM | Inference Speed |
|------------|--------------|--------------|----------|-----------------|
| 7-8B | Q4_K_M | 8 GB | N/A (CPU) | ~10 tokens/sec |
| 7-8B | FP16 | 16 GB | 8 GB | ~50 tokens/sec |
| 13B | Q4_K_M | 16 GB | N/A (CPU) | ~5 tokens/sec |
| 13B | FP16 | 32 GB | 16 GB | ~30 tokens/sec |
### Recommended Configurations
**Minimal (CPU-only, 8GB RAM):**
- Model: Llama 3 8B Q4_K_M
- Settings: `threads: 4`, `batchSize: 256`
- Expected: ~10 tokens/sec
**Standard (CPU, 16GB RAM):**
- Model: Llama 3 8B Q4_K_M or 13B Q4_K_M
- Settings: `threads: 8`, `batchSize: 512`
- Expected: ~15-20 tokens/sec (8B), ~5-8 tokens/sec (13B)
**GPU-Accelerated (8GB VRAM):**
- Model: Llama 3 8B FP16
- Settings: `gpuLayers: 35`, `batchSize: 512`
- Expected: ~50 tokens/sec
## Signing and Verification
### Model Bundle Signing
Bundles are signed using DSSE (Dead Simple Signing Envelope) format:
```json
{
"payloadType": "application/vnd.stellaops.model-bundle+json",
"payload": "<base64-encoded-manifest-digest>",
"signatures": [
{
"keyId": "stellaops-model-signer-2025",
"sig": "<base64-signature>"
}
]
}
```
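Note that the signature is not computed over the raw payload bytes but over the DSSE pre-authentication encoding (PAE), which binds the payload type to the payload. A sketch of the PAE construction as defined by the DSSE specification:

```python
def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """Build the DSSE pre-authentication encoding: the byte string that is
    actually signed. Lengths are decimal byte counts, fields space-separated."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"), type_bytes,
        str(len(payload)).encode("ascii"), payload,
    ])
```

For example, `dsse_pae("http://example.com/HelloWorld", b"hello world")` yields `b"DSSEv1 29 http://example.com/HelloWorld 11 hello world"`, matching the DSSE spec's test vector.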
### Regional Crypto Support
| Region | Algorithm | Key Type |
|--------|-----------|----------|
| Default | Ed25519 | Ed25519 |
| FIPS (US) | ECDSA-P256 | NIST P-256 |
| GOST (RU) | GOST 34.10-2012 | GOST R 34.10-2012 |
| SM (CN) | SM2 | SM2 |
### Verification at Load Time
When a model is loaded, the following checks occur:
1. **Signature verification**: DSSE envelope is verified against known keys
2. **Manifest integrity**: All file digests are recalculated and compared
3. **Bundle completeness**: All required files are present
4. **Configuration validation**: Inference settings are within safe bounds
## Deterministic Inference
For reproducible AI outputs (required for attestation replay):
```yaml
advisoryAi:
inference:
local:
# CRITICAL: These settings ensure deterministic output
temperature: 0.0
seed: 42
topK: 1
topP: 1.0
```
With these settings, the same prompt produces identical output across runs on the same model build, software stack, and hardware class, enabling:
- AI artifact replay for compliance audits
- Divergence detection between environments
- Attestation verification
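Divergence detection then reduces to comparing content hashes: the attestation records `output.contentHash`, and a replay recomputes it from the regenerated text. A minimal sketch, assuming UTF-8/SHA-256 canonicalization (the exact canonicalization rules are illustrative):

```python
import hashlib


def content_hash(output: str) -> str:
    """Hash generated text the way the attestation's output.contentHash
    records it (assumed: SHA-256 over UTF-8 bytes)."""
    return "sha256:" + hashlib.sha256(output.encode("utf-8")).hexdigest()


def diverged(attested_hash: str, replayed_output: str) -> bool:
    """True when a replayed generation no longer matches the attestation."""
    return content_hash(replayed_output) != attested_hash
```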
## Benchmarking
Run local inference benchmarks:
```bash
# Run standard benchmark suite
stella model benchmark llama3-8b-q4km --iterations 10
# Output includes:
# - Latency: mean, median, p95, p99, TTFT
# - Throughput: tokens/sec, requests/min
# - Resource usage: peak memory, CPU utilization
```
## Troubleshooting
| Symptom | Cause | Resolution |
|---------|-------|------------|
| `signature verification failed` | Bundle tampered or wrong key | Re-download bundle, verify chain of custody |
| `digest mismatch` | Corrupted during transfer | Re-copy from source, verify SHA-256 |
| `model not found` | Wrong bundle path | Check `bundlePath` in config |
| `out of memory` | Model too large | Use smaller quantization (Q4_K_M) |
| `inference timeout` | CPU too slow | Increase timeout or enable GPU |
| `non-deterministic output` | Wrong settings | Set `temperature: 0`, `seed: 42` |
## Related Documentation
- [Advisory AI Architecture](../architecture.md)
- [Offline Kit Overview](../../../24_OFFLINE_KIT.md)
- [AI Attestations](../../../implplan/SPRINT_20251226_018_AI_attestations.md)
- [Replay Semantics](./replay-semantics.md)

# Policy Studio API and Rule Syntax
> **Sprint:** SPRINT_20251226_017_AI_policy_copilot
> **Task:** POLICY-26
This guide documents the Policy Studio API for AI-powered policy authoring, converting natural language to lattice rules.
## Overview
Policy Studio enables:
1. **Natural Language → Policy Intent**: Parse human intent from plain English
2. **Intent → Lattice Rules**: Generate K4 lattice-compatible rules
3. **Validation**: Detect conflicts, unreachable conditions, loops
4. **Test Synthesis**: Auto-generate test cases for policy validation
5. **Compilation**: Bundle rules into signed, versioned policy packages
## API Endpoints
### Parse Natural Language
Convert natural language to structured policy intent.
```http
POST /api/v1/policy/studio/parse
Content-Type: application/json
{
"input": "Block all critical vulnerabilities in production services unless they have a vendor VEX stating not affected",
"scope": "production"
}
```
**Response:**
```json
{
"intent": {
"intentId": "intent-20251226-001",
"intentType": "OverrideRule",
"originalInput": "Block all critical vulnerabilities in production services unless they have a vendor VEX stating not affected",
"conditions": [
{
"field": "severity",
"operator": "equals",
"value": "critical",
"connector": "and"
},
{
"field": "scope",
"operator": "equals",
"value": "production",
"connector": "and"
},
{
"field": "has_vex",
"operator": "equals",
"value": false,
"connector": null
}
],
"actions": [
{
"actionType": "set_verdict",
"parameters": {
"verdict": "block",
"reason": "Critical vulnerability without VEX exception"
}
}
],
"scope": "production",
"scopeId": null,
"priority": 100,
"confidence": 0.92,
"alternatives": null,
"clarifyingQuestions": null
},
"success": true,
"modelId": "claude-sonnet-4-20250514",
"parsedAt": "2025-12-26T10:30:00Z"
}
```
### Clarifying Questions
When intent is ambiguous, the API returns clarifying questions:
```json
{
"intent": {
"intentId": "intent-20251226-002",
"intentType": "ThresholdRule",
"confidence": 0.65,
"clarifyingQuestions": [
"Should this rule apply to all environments or just production?",
"What should happen when the threshold is exceeded: block or escalate?"
],
"alternatives": [
{ "...alternative interpretation 1..." },
{ "...alternative interpretation 2..." }
]
},
"success": true
}
```
### Generate Rules
Convert policy intent to K4 lattice rules.
```http
POST /api/v1/policy/studio/generate
Content-Type: application/json
{
"intentId": "intent-20251226-001"
}
```
**Response:**
```json
{
"rules": [
{
"ruleId": "rule-20251226-001",
"name": "block-critical-no-vex",
"description": "Block critical vulnerabilities in production without VEX exception",
"latticeExpression": "Present ∧ ¬Mitigated ∧ severity=critical ∧ scope=production → Block",
"conditions": [
{ "field": "severity", "operator": "equals", "value": "critical" },
{ "field": "scope", "operator": "equals", "value": "production" },
{ "field": "has_vex", "operator": "equals", "value": false }
],
"disposition": "Block",
"priority": 100,
"scope": "production",
"enabled": true
}
],
"success": true,
"warnings": [],
"intentId": "intent-20251226-001",
"generatedAt": "2025-12-26T10:30:00Z"
}
```
### Validate Rules
Check rules for conflicts and issues.
```http
POST /api/v1/policy/studio/validate
Content-Type: application/json
{
"rules": [
{ "ruleId": "rule-20251226-001", "..." },
{ "ruleId": "rule-20251226-002", "..." }
],
"existingRuleIds": ["rule-existing-001", "rule-existing-002"]
}
```
**Response:**
```json
{
"valid": false,
"conflicts": [
{
"ruleId1": "rule-20251226-001",
"ruleId2": "rule-existing-002",
"description": "Both rules match critical vulnerabilities but produce different dispositions (Block vs Allow)",
"suggestedResolution": "Add priority ordering or more specific conditions to disambiguate",
"severity": "error"
}
],
"unreachableConditions": [
"Rule rule-20251226-002 condition 'severity=low AND severity=high' is always false"
],
"potentialLoops": [],
"coverage": 0.85
}
```
### Compile Policy Bundle
Bundle validated rules into a signed policy package.
```http
POST /api/v1/policy/studio/compile
Content-Type: application/json
{
"rules": [
{ "ruleId": "rule-20251226-001", "..." }
],
"bundleName": "production-security-policy",
"version": "1.0.0",
"sign": true
}
```
**Response:**
```json
{
"bundleId": "bundle-20251226-001",
"bundleName": "production-security-policy",
"version": "1.0.0",
"ruleCount": 5,
"digest": "sha256:bundledigest...",
"signed": true,
"signatureKeyId": "stellaops-policy-signer-2025",
"compiledAt": "2025-12-26T10:30:00Z",
"downloadUrl": "/api/v1/policy/bundle/bundle-20251226-001"
}
```
## Policy Intent Types
| Type | Description | Example |
|------|-------------|---------|
| `OverrideRule` | Override default verdict | "Block all critical CVEs" |
| `EscalationRule` | Escalate findings | "Escalate CVSS ≥9.0 to security team" |
| `ExceptionCondition` | Bypass rules | "Except internal-only services" |
| `MergePrecedence` | Priority ordering | "VEX takes precedence over CVSS" |
| `ThresholdRule` | Automatic thresholds | "Allow max 10 high-severity per service" |
| `ScopeRestriction` | Scope limits | "Only apply to production" |
## Rule Syntax
### Lattice Expression Format
Rules use K4 lattice logic:
```
<atoms> → <disposition>
```
#### Security Atoms
| Atom | Meaning |
|------|---------|
| `Present` | Vulnerability is present in artifact |
| `Applies` | Vulnerability applies to this context |
| `Reachable` | Vulnerable code is reachable |
| `Mitigated` | Mitigation exists (VEX, WAF, etc.) |
| `Fixed` | Fix is available |
| `Misattributed` | False positive |
#### Operators
| Operator | Symbol | Example |
|----------|--------|---------|
| AND | `∧` | `Present ∧ Reachable` |
| OR | `∨` | `Fixed ∨ Mitigated` |
| NOT | `¬` | `¬Mitigated` |
| Implies | `→` | `Present → Block` |
#### Dispositions
| Disposition | Meaning |
|-------------|---------|
| `Block` | Fail the build/gate |
| `Warn` | Warning only |
| `Allow` | Pass with no action |
| `Review` | Require human review |
| `Escalate` | Escalate to security team |
### Examples
```
# Block critical unmitigated vulnerabilities
Present ∧ Reachable ∧ ¬Mitigated ∧ severity=critical → Block
# Allow if vendor says not affected
Present ∧ Mitigated ∧ vex_status=not_affected → Allow
# Escalate CVSS ≥9.0
Present ∧ cvss_score>=9.0 → Escalate
# Warn on high severity with fix available
Present ∧ severity=high ∧ Fixed → Warn
```
## Condition Fields
| Field | Type | Values |
|-------|------|--------|
| `severity` | string | `critical`, `high`, `medium`, `low`, `none` |
| `cvss_score` | number | 0.0 - 10.0 |
| `reachable` | boolean | `true`, `false` |
| `has_vex` | boolean | `true`, `false` |
| `vex_status` | string | `not_affected`, `affected`, `fixed`, `under_investigation` |
| `has_fix` | boolean | `true`, `false` |
| `fix_version` | string | Version string |
| `scope` | string | `production`, `staging`, `development` |
| `age_days` | number | Days since disclosure |
| `exploit_available` | boolean | `true`, `false` |
| `in_kev` | boolean | In CISA KEV catalog |
## Condition Operators
| Operator | Description | Example |
|----------|-------------|---------|
| `equals` | Exact match | `severity equals critical` |
| `not_equals` | Not equal | `scope not_equals development` |
| `greater_than` | Greater than | `cvss_score greater_than 7.0` |
| `less_than` | Less than | `age_days less_than 30` |
| `greater_or_equal` | ≥ | `cvss_score greater_or_equal 9.0` |
| `less_or_equal` | ≤ | `cvss_score less_or_equal 3.9` |
| `contains` | String contains | `component contains lodash` |
| `in` | In list | `severity in [critical, high]` |
| `not_in` | Not in list | `scope not_in [development, test]` |
## Test Case Format
### Generated Test Cases
Policy Studio auto-generates test cases:
```json
{
"testCases": [
{
"testId": "test-001",
"type": "positive",
"description": "Critical unmitigated vulnerability should be blocked",
"input": {
"severity": "critical",
"reachable": true,
"has_vex": false,
"scope": "production"
},
"expectedDisposition": "Block",
"matchedRuleId": "rule-20251226-001"
},
{
"testId": "test-002",
"type": "negative",
"description": "Critical vulnerability with VEX should not match block rule",
"input": {
"severity": "critical",
"reachable": true,
"has_vex": true,
"vex_status": "not_affected",
"scope": "production"
},
"expectedDisposition": "Allow",
"shouldNotMatch": "rule-20251226-001"
},
{
"testId": "test-003",
"type": "boundary",
"description": "CVSS exactly at threshold",
"input": {
"cvss_score": 9.0,
"severity": "critical"
},
"expectedDisposition": "Escalate"
},
{
"testId": "test-004",
"type": "conflict",
"description": "Input matching multiple conflicting rules",
"input": {
"severity": "high",
"reachable": true,
"has_fix": true
},
"possibleDispositions": ["Warn", "Block"],
"conflictingRules": ["rule-001", "rule-002"]
}
]
}
```
### Test Types
| Type | Purpose | Auto-Generated |
|------|---------|---------------|
| `positive` | Should match rule and produce expected disposition | Yes |
| `negative` | Should NOT match the rule | Yes |
| `boundary` | Edge cases at thresholds | Yes |
| `conflict` | Triggers multiple rules | Yes |
| `manual` | User-defined custom cases | No |
## Natural Language Examples
### Override Rules
```
Input: "Block all critical vulnerabilities"
→ Present ∧ severity=critical → Block
Input: "Allow vulnerabilities with VEX not_affected status"
→ Present ∧ vex_status=not_affected → Allow
Input: "Block exploitable vulnerabilities older than 30 days"
→ Present ∧ exploit_available=true ∧ age_days>30 → Block
```
### Escalation Rules
```
Input: "Escalate anything in the KEV catalog to security team"
→ Present ∧ in_kev=true → Escalate
Input: "Escalate CVSS 9.0 or above"
→ Present ∧ cvss_score>=9.0 → Escalate
```
### Exception Conditions
```
Input: "Except for development environments"
→ Adds: ∧ scope!=development to existing rules
Input: "Unless there's a VEX from the vendor"
→ Adds: ∧ ¬(has_vex=true ∧ vex_status=not_affected)
```
### Threshold Rules
```
Input: "Allow maximum 5 high-severity vulnerabilities per service"
→ Creates threshold counter with Block when exceeded
```
## CLI Commands
```bash
# Parse natural language
stella policy parse "Block all critical CVEs in production"
# Generate rules from intent
stella policy generate intent-20251226-001
# Validate rules
stella policy validate rules.yaml
# Run test cases
stella policy test rules.yaml --cases tests.yaml
# Compile bundle
stella policy compile rules.yaml \
--name production-policy \
--version 1.0.0 \
--sign
# Apply policy
stella policy apply bundle-20251226-001.tar.gz
```
## Configuration
```yaml
policyStudio:
# Maximum conditions per rule
maxConditionsPerRule: 10
# Auto-generate test cases
autoGenerateTests: true
# Test case types to generate
testTypes:
- positive
- negative
- boundary
- conflict
# Minimum test coverage
minTestCoverage: 0.80
# Require human approval for production policies
requireApproval:
production: true
staging: false
development: false
# Number of approvers required
requiredApprovers: 2
# Sign compiled bundles
signBundles: true
```
## Policy Bundle Format
Compiled policy bundles are tar.gz archives:
```
production-policy-1.0.0.tar.gz
├── manifest.json # Bundle metadata
├── rules/
│ ├── rule-001.yaml # Individual rule files
│ ├── rule-002.yaml
│ └── ...
├── tests/
│ ├── test-001.yaml # Test cases
│ └── ...
├── signature.dsse.json # DSSE signature
└── checksums.sha256 # File checksums
```
### Manifest Schema
```json
{
"bundleId": "bundle-20251226-001",
"bundleName": "production-security-policy",
"version": "1.0.0",
"createdAt": "2025-12-26T10:30:00Z",
"createdBy": "policy-studio",
"rules": [
{
"ruleId": "rule-001",
"name": "block-critical",
"file": "rules/rule-001.yaml"
}
],
"testCount": 15,
"coverage": 0.92,
"signed": true,
"signatureKeyId": "stellaops-policy-signer-2025"
}
```
## Attestation Format
Policy drafts are attested using DSSE:
```json
{
"_type": "https://stellaops.org/attestation/policy-draft/v1",
"bundleId": "bundle-20251226-001",
"bundleName": "production-security-policy",
"version": "1.0.0",
"authority": "Validated",
"rules": {
"count": 5,
"ruleIds": ["rule-001", "rule-002", "..."]
},
"validation": {
"valid": true,
"conflictCount": 0,
"testsPassed": 15,
"coverage": 0.92
},
"model": {
"modelId": "claude-sonnet-4-20250514",
"parseConfidence": 0.95
},
"createdAt": "2025-12-26T10:30:00Z"
}
```
## Error Handling
### Parse Errors
```json
{
"success": false,
"error": "ambiguous_intent",
"message": "Could not determine whether 'block' means verdict or action",
"suggestions": [
"Try: 'Set verdict to block for critical vulnerabilities'",
"Try: 'Fail the build for critical vulnerabilities'"
]
}
```
### Validation Errors
```json
{
"valid": false,
"conflicts": [
{
"severity": "error",
"description": "Rule A and Rule B have contradicting dispositions for the same conditions"
}
]
}
```
### Compilation Errors
```json
{
"success": false,
"error": "compilation_failed",
"message": "Cannot compile bundle with unresolved conflicts",
"unresolvedConflicts": 2
}
```
## Related Documentation
- [Trust Lattice Engine](../../policy/trust-lattice.md)
- [K4 Lattice Reference](../../policy/k4-lattice.md)
- [AI Attestations](./ai-attestations.md)
- [Advisory AI Architecture](../architecture.md)

# SCM Connector Plugins
> **Sprint:** SPRINT_20251226_016_AI_remedy_autopilot
> **Tasks:** REMEDY-08 through REMEDY-14
This guide documents the SCM (Source Control Management) connector plugin architecture for automated remediation PR generation.
## Overview
StellaOps supports automated Pull Request generation for remediation plans across multiple SCM platforms. The plugin architecture enables customer-premise integrations with:
- **GitHub** (github.com and GitHub Enterprise Server)
- **GitLab** (gitlab.com and self-hosted)
- **Azure DevOps** (Services and Server)
- **Gitea** (including Forgejo and Codeberg)
## Architecture
### Plugin Interface
```csharp
public interface IScmConnectorPlugin
{
string ScmType { get; } // "github", "gitlab", "azuredevops", "gitea"
string DisplayName { get; } // Human-readable name
bool IsAvailable(ScmConnectorOptions options); // Check if configured
bool CanHandle(string repositoryUrl); // Auto-detect from URL
IScmConnector Create(ScmConnectorOptions options, HttpClient httpClient);
}
```
### Connector Interface
```csharp
public interface IScmConnector
{
string ScmType { get; }
// Branch operations
Task<BranchResult> CreateBranchAsync(
string owner, string repo, string branchName, string baseBranch, ...);
// File operations
Task<FileUpdateResult> UpdateFileAsync(
string owner, string repo, string branch, string filePath,
string content, string commitMessage, ...);
// Pull request operations
Task<PrCreateResult> CreatePullRequestAsync(
string owner, string repo, string headBranch, string baseBranch,
string title, string body, ...);
Task<PrStatusResult> GetPullRequestStatusAsync(...);
Task<bool> UpdatePullRequestAsync(...);
Task<bool> AddCommentAsync(...);
Task<bool> ClosePullRequestAsync(...);
// CI status
Task<CiStatusResult> GetCiStatusAsync(
string owner, string repo, string commitSha, ...);
}
```
### Catalog and Factory
```csharp
public sealed class ScmConnectorCatalog
{
// Get connector by explicit type
IScmConnector? GetConnector(string scmType, ScmConnectorOptions options);
// Auto-detect SCM type from repository URL
IScmConnector? GetConnectorForRepository(string repositoryUrl, ScmConnectorOptions options);
// List all available plugins
IReadOnlyList<IScmConnectorPlugin> Plugins { get; }
}
```
## Configuration
### Sample Configuration
```yaml
scmConnectors:
timeoutSeconds: 30
userAgent: "StellaOps.AdvisoryAI.Remediation/1.0"
github:
enabled: true
baseUrl: "" # Default: https://api.github.com
apiToken: "${GITHUB_PAT}"
gitlab:
enabled: true
baseUrl: "" # Default: https://gitlab.com/api/v4
apiToken: "${GITLAB_PAT}"
azuredevops:
enabled: true
baseUrl: "" # Default: https://dev.azure.com
apiToken: "${AZURE_DEVOPS_PAT}"
gitea:
enabled: true
baseUrl: "https://git.example.com" # Required
apiToken: "${GITEA_TOKEN}"
```
### Environment Variables
| Variable | Description |
|----------|-------------|
| `STELLAOPS_SCM_GITHUB_TOKEN` | GitHub PAT or App token |
| `STELLAOPS_SCM_GITLAB_TOKEN` | GitLab Personal/Project token |
| `STELLAOPS_SCM_AZUREDEVOPS_TOKEN` | Azure DevOps PAT |
| `STELLAOPS_SCM_GITEA_TOKEN` | Gitea application token |
### Required Token Scopes
| Platform | Required Scopes |
|----------|-----------------|
| **GitHub** | `repo`, `workflow` (PAT) or `contents:write`, `pull_requests:write`, `checks:read` (App) |
| **GitLab** | `api`, `read_repository`, `write_repository` |
| **Azure DevOps** | Code (Read & Write), Pull Request Contribute, Build (Read) |
| **Gitea** | `repo` (full repository access) |
## Connector Details
### GitHub Connector
```yaml
github:
enabled: true
baseUrl: "" # Leave empty for github.com
apiToken: "${GITHUB_PAT}"
```
**Features:**
- Bearer token authentication
- Check-runs API for CI status (GitHub Actions)
- Combined commit status support
- Enterprise Server support via `baseUrl`
**API Endpoints Used:**
- `GET /repos/{owner}/{repo}/git/refs/heads/{branch}` - Get branch SHA
- `POST /repos/{owner}/{repo}/git/refs` - Create branch
- `PUT /repos/{owner}/{repo}/contents/{path}` - Update file
- `POST /repos/{owner}/{repo}/pulls` - Create PR
- `GET /repos/{owner}/{repo}/commits/{sha}/check-runs` - CI status
### GitLab Connector
```yaml
gitlab:
enabled: true
baseUrl: "" # Leave empty for gitlab.com
apiToken: "${GITLAB_PAT}"
```
**Features:**
- PRIVATE-TOKEN header authentication
- Merge Request creation (GitLab terminology)
- Pipeline and Jobs API for CI status
- Self-hosted instance support
**API Endpoints Used:**
- `POST /projects/{id}/repository/branches` - Create branch
- `POST /projects/{id}/repository/commits` - Commit file changes
- `POST /projects/{id}/merge_requests` - Create MR
- `GET /projects/{id}/pipelines?sha={sha}` - CI status
- `GET /projects/{id}/pipelines/{id}/jobs` - Job details
### Azure DevOps Connector
```yaml
azuredevops:
enabled: true
baseUrl: "" # Leave empty for Azure DevOps Services
apiToken: "${AZURE_DEVOPS_PAT}"
apiVersion: "7.1"
```
**Features:**
- Basic authentication with PAT (empty username, token as password)
- Push API for atomic commits
- Azure Pipelines build status
- Azure DevOps Server support
**API Endpoints Used:**
- `GET /{org}/{project}/_apis/git/refs` - Get branch refs
- `POST /{org}/{project}/_apis/git/refs` - Create branch
- `POST /{org}/{project}/_apis/git/pushes` - Commit changes
- `POST /{org}/{project}/_apis/git/pullrequests` - Create PR
- `GET /{org}/{project}/_apis/build/builds` - Build status
### Gitea Connector
```yaml
gitea:
enabled: true
baseUrl: "https://git.example.com" # Required
apiToken: "${GITEA_TOKEN}"
```
**Features:**
- Token header authentication
- Gitea Actions support (workflow runs)
- Compatible with Forgejo and Codeberg
- Combined commit status API
**API Endpoints Used:**
- `GET /api/v1/repos/{owner}/{repo}/branches/{branch}` - Get branch
- `POST /api/v1/repos/{owner}/{repo}/branches` - Create branch
- `PUT /api/v1/repos/{owner}/{repo}/contents/{path}` - Update file
- `POST /api/v1/repos/{owner}/{repo}/pulls` - Create PR
- `GET /api/v1/repos/{owner}/{repo}/commits/{sha}/status` - Status
- `GET /api/v1/repos/{owner}/{repo}/actions/runs` - Workflow runs
## Usage
### Dependency Injection
```csharp
// In Startup.cs or Program.cs
services.AddScmConnectors(config =>
{
// Optionally add custom plugins
config.AddPlugin(new CustomScmConnectorPlugin());
// Or remove built-in plugins
config.RemovePlugin("github");
});
```
### Creating a Connector
```csharp
public class RemediationService
{
private readonly ScmConnectorCatalog _catalog;
public async Task<PrCreateResult> CreateRemediationPrAsync(
string repositoryUrl,
RemediationPlan plan,
CancellationToken cancellationToken)
{
var options = new ScmConnectorOptions
{
ApiToken = _configuration["ScmToken"],
BaseUrl = _configuration["ScmBaseUrl"]
};
// Auto-detect connector from URL
var connector = _catalog.GetConnectorForRepository(repositoryUrl, options);
if (connector is null)
throw new InvalidOperationException($"No connector available for {repositoryUrl}");
// Create branch
var branchResult = await connector.CreateBranchAsync(
owner: "myorg",
repo: "myrepo",
branchName: $"stellaops/remediation/{plan.Id}",
baseBranch: "main",
cancellationToken);
// Update files
foreach (var change in plan.FileChanges)
{
await connector.UpdateFileAsync(
owner: "myorg",
repo: "myrepo",
branch: branchResult.BranchName,
filePath: change.Path,
content: change.NewContent,
commitMessage: $"chore: apply remediation for {plan.FindingId}",
cancellationToken);
}
// Create PR
return await connector.CreatePullRequestAsync(
owner: "myorg",
repo: "myrepo",
headBranch: branchResult.BranchName,
baseBranch: "main",
title: $"[StellaOps] Remediation for {plan.FindingId}",
body: GeneratePrBody(plan),
cancellationToken);
}
}
```
### Polling CI Status
```csharp
public async Task<CiState> WaitForCiAsync(
IScmConnector connector,
string owner,
string repo,
string commitSha,
TimeSpan timeout,
CancellationToken cancellationToken)
{
var deadline = DateTime.UtcNow + timeout;
while (DateTime.UtcNow < deadline)
{
var status = await connector.GetCiStatusAsync(
owner, repo, commitSha, cancellationToken);
switch (status.OverallState)
{
case CiState.Success:
case CiState.Failure:
case CiState.Error:
return status.OverallState;
case CiState.Pending:
case CiState.Running:
await Task.Delay(TimeSpan.FromSeconds(30), cancellationToken);
break;
}
}
return CiState.Unknown;
}
```
## CI State Mapping
Different SCM platforms use different status values. The connector normalizes them:
| Platform | Pending | Running | Success | Failure | Error |
|----------|---------|---------|---------|---------|-------|
| **GitHub** | `pending`, `queued` | `in_progress` | `success` | `failure` | `error`, `cancelled` |
| **GitLab** | `pending`, `waiting` | `running` | `success` | `failed` | `canceled`, `skipped` |
| **Azure DevOps** | `notStarted`, `postponed` | `inProgress` | `succeeded` | `failed` | `canceled` |
| **Gitea** | `pending`, `queued` | `running` | `success` | `failure` | `cancelled`, `timed_out` |
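A connector normalizes the raw status string before returning a `CiStatusResult`. Sketched here for GitHub (the mapping mirrors the table above; the dictionary and function names are illustrative, not the shipped API):

```python
# Hypothetical normalization table mirroring the GitHub row above.
GITHUB_CI_STATES = {
    "pending": "Pending", "queued": "Pending",
    "in_progress": "Running",
    "success": "Success",
    "failure": "Failure",
    "error": "Error", "cancelled": "Error",
}


def normalize_github_state(raw: str) -> str:
    """Map a GitHub check-run/status value to the platform-neutral CiState."""
    return GITHUB_CI_STATES.get(raw.strip().lower(), "Unknown")
```

Unrecognized values fall back to `Unknown`, which callers such as `WaitForCiAsync` treat as non-terminal.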
## URL Auto-Detection
The `CanHandle` method on each plugin detects repository URLs:
| Plugin | URL Patterns |
|--------|--------------|
| **GitHub** | `github.com`, `github.` |
| **GitLab** | `gitlab.com`, `gitlab.` |
| **Azure DevOps** | `dev.azure.com`, `visualstudio.com`, `azure.com` |
| **Gitea** | `gitea.`, `forgejo.`, `codeberg.org` |
Example:
```csharp
// Auto-detects GitHub
var connector = catalog.GetConnectorForRepository(
"https://github.com/myorg/myrepo", options);
// Auto-detects GitLab
var connector = catalog.GetConnectorForRepository(
"https://gitlab.com/mygroup/myproject", options);
```
## Custom Plugins
To add support for a new SCM platform:
```csharp
public sealed class BitbucketScmConnectorPlugin : IScmConnectorPlugin
{
public string ScmType => "bitbucket";
public string DisplayName => "Bitbucket";
public bool IsAvailable(ScmConnectorOptions options) =>
!string.IsNullOrEmpty(options.ApiToken);
public bool CanHandle(string repositoryUrl) =>
repositoryUrl.Contains("bitbucket.org", StringComparison.OrdinalIgnoreCase);
public IScmConnector Create(ScmConnectorOptions options, HttpClient httpClient) =>
new BitbucketScmConnector(httpClient, options);
}
public sealed class BitbucketScmConnector : ScmConnectorBase
{
// Implement abstract methods...
}
```
Register the custom plugin:
```csharp
services.AddScmConnectors(config =>
{
config.AddPlugin(new BitbucketScmConnectorPlugin());
});
```
## Error Handling
All connector methods return result objects with `Success` and `ErrorMessage`:
```csharp
var result = await connector.CreateBranchAsync(...);
if (!result.Success)
{
_logger.LogError("Failed to create branch: {Error}", result.ErrorMessage);
return;
}
// Continue with successful result
var branchSha = result.CommitSha;
```
## Security Considerations
1. **Token Storage**: Never store tokens in configuration files. Use environment variables or secret management.
2. **Minimum Permissions**: Request only required scopes for each platform.
3. **TLS Verification**: Always verify TLS certificates in production (`verifySsl: true`).
4. **Audit Logging**: All SCM operations are logged for compliance.
5. **Repository Access**: Connectors only access repositories explicitly provided. No enumeration of accessible repos.
## Telemetry
SCM operations emit structured logs:
```json
{
"timestamp": "2025-12-26T10:30:00Z",
"operation": "scm_create_pr",
"scmType": "github",
"owner": "myorg",
"repo": "myrepo",
"branch": "stellaops/remediation/plan-123",
"duration_ms": 1234,
"success": true,
"pr_number": 456,
"pr_url": "https://github.com/myorg/myrepo/pull/456"
}
```
## Related Documentation
- [Remediation API](../remediation-api.md)
- [AI Attestations](./ai-attestations.md)
- [Offline Model Bundles](./offline-model-bundles.md)
- [Configuration Reference](../../../../etc/scm-connectors.yaml.sample)

See `docs/modules/scanner/operations/entrypoint-semantic.md` for full schema reference.
**E) Binary Vulnerability Lookup (Sprint 20251226_014_BINIDX)**
The **BinaryLookupStageExecutor** enriches scan results with binary-level vulnerability evidence:
* **Identity Extraction**: For each ELF/PE/Mach-O binary, extract Build-ID, file SHA256, and architecture. Generate a `binary_key` for catalog lookups.
* **Build-ID Catalog Lookup**: Query the BinaryIndex known-build catalog using Build-ID as primary key. Returns CVE matches with high confidence (>=0.95) when the exact binary version is indexed.
* **Fingerprint Matching**: For binaries not in the catalog, compute position-independent fingerprints (basic-block, CFG, string-refs) and match against the vulnerability corpus. Returns similarity scores and confidence.
* **Fix Status Detection**: For each CVE match, query distro-specific backport information to determine if the vulnerability was fixed via distro patch. Methods: `changelog`, `patch_analysis`, `advisory`.
* **Valkey Cache**: All lookups are cached with configurable TTL (default 1 hour for identities, 30 minutes for fingerprints). Target cache hit rate: >80% for repeat scans.
**BinaryFindingMapper** converts matches to standard findings format with `BinaryFindingEvidence`:
```csharp
public sealed record BinaryFindingEvidence
{
    public required string BinaryKey { get; init; }
    public string? BuildId { get; init; }
    public required string MatchMethod { get; init; }  // buildid_catalog, fingerprint_match, range_match
    public required decimal Confidence { get; init; }
    public string? FixedVersion { get; init; }
    public string? FixStatus { get; init; }  // fixed, vulnerable, not_affected, wontfix
}
```
**Proof Segments**: The **Attestor** generates `binary_fingerprint_evidence` proof segments with DSSE signatures for each binary with vulnerability matches. Schema: `https://stellaops.dev/predicates/binary-fingerprint-evidence@v1`.
**UI Badges**: Scan results display status badges:
* **Backported & Safe** (green): Distro backported the fix
* **Affected & Reachable** (red): Vulnerable and in code path
* **Unknown** (gray): Could not determine status
**CLI Commands** (Sprint 20251226_014):
* `stella binary inspect <file>`: Extract identity (Build-ID, hashes, architecture)
* `stella binary lookup <build-id>`: Query vulnerabilities by Build-ID
* `stella binary fingerprint <file>`: Generate position-independent fingerprint
**F) Attestation & SBOM bind (optional)**
* For each **file hash** or **binary hash**, query local cache of **Rekor v2** indices; if an SBOM attestation is found for **exact hash**, bind it to the component (origin=`attested`).
* For the **image** digest, likewise bind SBOM attestations (build-time referrers).
### 5.4 Component normalization (exact only)

# Binary Evidence User Guide
> **Sprint:** SPRINT_20251226_014_BINIDX
> **Task:** SCANINT-25
> **Version:** 1.0.0
This guide explains how to use binary vulnerability evidence in StellaOps scans, including CLI commands, understanding scan results, and interpreting backport status.
---
## Overview
Binary Evidence provides vulnerability detection for compiled binaries (ELF, PE, Mach-O) beyond traditional package-based scanning. It identifies vulnerabilities in stripped binaries where package metadata may be missing or inaccurate, and detects when distribution maintainers have backported security fixes.
### Key Features
- **Build-ID Catalog Lookup**: High-confidence matching using GNU Build-IDs
- **Fingerprint Matching**: Position-independent code matching for stripped binaries
- **Backport Detection**: Identifies distribution-patched binaries
- **Cryptographic Evidence**: DSSE-signed proof segments for audit trails
---
## CLI Commands
### Inspect Binary Identity
Extract identity information from a binary file:
```bash
stella binary inspect /path/to/binary
# JSON output
stella binary inspect /path/to/binary --format json
```
**Output:**
```
Binary Identity
Format: ELF
Architecture: x86_64
Build-ID: 8d8f09a0d7e2c1b3a5f4e6d8c0b2a4e6f8d0c2b4
SHA256: sha256:abcd1234567890abcdef1234567890abcdef1234...
Binary Key: openssl:1.1.1w-1
```
### Lookup Vulnerabilities by Build-ID
Query the vulnerability database using a Build-ID:
```bash
stella binary lookup 8d8f09a0d7e2c1b3a5f4e6d8c0b2a4e6f8d0c2b4
# With distribution context
stella binary lookup 8d8f09a0d7e2c1b3a5f4e6d8c0b2a4e6f8d0c2b4 \
--distro debian --release bookworm
# JSON output
stella binary lookup 8d8f09a0d7e2c1b3a5f4e6d8c0b2a4e6f8d0c2b4 --format json
```
**Output:**
```
Vulnerability Matches for Build-ID: 8d8f09a0d7e2c1b3a5f4...
CVE-2023-5678
Status: FIXED (Backported)
Package: pkg:deb/debian/openssl@1.1.1n-0+deb11u4
Method: buildid_catalog
Confidence: 95%
Fixed In: 1.1.1w-1
CVE-2023-4807
Status: FIXED (Backported)
Package: pkg:deb/debian/openssl@1.1.1n-0+deb11u4
Method: buildid_catalog
Confidence: 92%
Fixed In: 1.1.1w-1
```
### Generate Binary Fingerprint
Create a position-independent fingerprint for matching:
```bash
stella binary fingerprint /path/to/binary
# Specific algorithm
stella binary fingerprint /path/to/binary --algorithm cfg
# Fingerprint specific function
stella binary fingerprint /path/to/binary --function SSL_read
# Hex output
stella binary fingerprint /path/to/binary --format hex
```
**Algorithms:**
- `combined` (default): Combines all methods for robust matching
- `basic-block`: Basic block hashes (good for minor changes)
- `cfg`: Control flow graph structure (resilient to reordering)
- `string-refs`: String constant references (fast, less precise)
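Conceptually, `combined` derives a single digest over the per-method fingerprints, which is why it is the most robust default. A rough shell analogy — not the actual algorithm (the real fingerprints are computed from disassembly, and the values below are placeholders):

```bash
# Rough analogy only: `combined` ~ one digest over the sub-fingerprints.
bb="deadbeef"    # basic-block fingerprint (placeholder)
cfg="cafebabe"   # CFG fingerprint (placeholder)
strs="f00dfeed"  # string-refs fingerprint (placeholder)
printf '%s%s%s' "$bb" "$cfg" "$strs" | sha256sum | cut -d' ' -f1
```

Because every sub-fingerprint contributes to the digest, a match on `combined` implies agreement across all three methods.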
---
## Understanding Scan Results
### Status Badges
When viewing scan results in the UI or CLI, binaries display status badges:
| Badge | Color | Meaning |
|-------|-------|---------|
| **Backported & Safe** | Green | The distribution backported the security fix. The binary is not vulnerable despite the CVE matching. |
| **Affected & Reachable** | Red | The binary contains vulnerable code and is in an executable code path. |
| **Affected (Low Priority)** | Orange | Vulnerable but not in the main execution path. |
| **Unknown** | Gray | Could not determine vulnerability or fix status. |
### Match Methods
Vulnerability matches use different detection methods with varying confidence:
| Method | Confidence | Description |
|--------|------------|-------------|
| `buildid_catalog` | High (95%+) | Exact Build-ID match in the known-build catalog |
| `fingerprint_match` | Medium (70-90%) | Position-independent code similarity |
| `range_match` | Low (50-70%) | Version range inference |
### Fix Status Detection
Fix status is determined by analyzing:
1. **Changelog**: Parsing distribution changelogs for CVE mentions
2. **Patch Analysis**: Comparing function signatures pre/post patch
3. **Advisory**: Cross-referencing distribution security advisories
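For the changelog method, detection ultimately reduces to finding CVE identifiers in the distribution changelog for the installed package version. A simplified illustration (the changelog text below is made up for this sketch):

```bash
# Simplified illustration of the changelog method: scan distro changelog
# text for CVE identifiers. Real parsing also maps each entry to a version.
grep -oE 'CVE-[0-9]{4}-[0-9]+' <<'EOF'
openssl (1.1.1n-0+deb11u4) bullseye-security; urgency=medium
  * Backport upstream fixes for CVE-2023-5678 and CVE-2023-4807.
EOF
```

A CVE mentioned in an entry at or below the installed version is evidence that the fix was backported.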
---
## Configuration
### Enabling Binary Analysis
In `scanner.yaml`:
```yaml
scanner:
  analyzers:
    binary:
      enabled: true
      fingerprintOnMiss: true  # Generate fingerprints on catalog miss

binaryIndex:
  enabled: true
  batchSize: 100
  timeoutMs: 5000
  minConfidence: 0.7
  cache:
    enabled: true
    identityTtl: 1h
    fixStatusTtl: 1h
    fingerprintTtl: 30m
```
### Cache Configuration
Binary lookups are cached in Valkey for performance:
```yaml
binaryIndex:
  cache:
    keyPrefix: "stellaops:binary:"
    identityTtl: 1h       # Cache Build-ID lookups
    fixStatusTtl: 1h      # Cache fix status queries
    fingerprintTtl: 30m   # Shorter TTL for fingerprints
    targetHitRate: 0.80   # Target 80% cache hit rate
```
---
## Interpreting Evidence
### Binary Fingerprint Evidence Proof Segment
Each binary with vulnerability matches generates a `binary_fingerprint_evidence` proof segment:
```json
{
  "predicateType": "https://stellaops.dev/predicates/binary-fingerprint-evidence@v1",
  "version": "1.0.0",
  "binary_identity": {
    "format": "elf",
    "build_id": "8d8f09a0d7e2c1b3a5f4e6d8c0b2a4e6f8d0c2b4",
    "file_sha256": "sha256:abcd1234...",
    "architecture": "x86_64",
    "binary_key": "openssl:1.1.1w-1",
    "path": "/usr/lib/x86_64-linux-gnu/libssl.so.1.1"
  },
  "layer_digest": "sha256:layer1abc123...",
  "matches": [
    {
      "cve_id": "CVE-2023-5678",
      "method": "buildid_catalog",
      "confidence": 0.95,
      "vulnerable_purl": "pkg:deb/debian/openssl@1.1.1n-0+deb11u4",
      "fix_status": {
        "state": "fixed",
        "fixed_version": "1.1.1w-1",
        "method": "changelog",
        "confidence": 0.98
      }
    }
  ]
}
```
### Viewing Proof Chain
In the UI, click "View Proof Chain" on any CVE match to see:
1. The binary identity used for lookup
2. The match method and confidence
3. The fix status determination method
4. The DSSE signature and Rekor log entry (if enabled)
---
## Troubleshooting
### No Matches Found
If binaries show no vulnerability matches:
1. **Check Build-ID**: Run `stella binary inspect` to verify the binary has a Build-ID
2. **Verify Catalog Coverage**: Not all binaries are in the known-build catalog
3. **Enable Fingerprinting**: Set `fingerprintOnMiss: true` to fall back to fingerprint matching
### Low Confidence Matches
Matches below the `minConfidence` threshold (default 0.7) are not reported. To see all matches:
```bash
stella binary lookup <build-id> --min-confidence 0.5
```
### Cache Issues
Clear the binary cache if results seem stale:
```bash
# Via CLI
stella cache clear --prefix binary
# Via Redis CLI
redis-cli KEYS "stellaops:binary:*" | xargs redis-cli DEL
```
### Build-ID Missing
Stripped binaries may lack Build-IDs. Options:
1. Rebuild with `-Wl,--build-id=sha1`
2. Use fingerprint matching instead
3. Map to package using file path heuristics
---
## Best Practices
1. **Include Build-IDs**: Ensure your build pipeline preserves GNU Build-IDs
2. **Use Distro Context**: Always specify `--distro` and `--release` for accurate backport detection
3. **Review Unknown Status**: Investigate binaries with "Unknown" status manually
4. **Monitor Cache Hit Rate**: Target >80% for repeat scans
---
## Related Documentation
- [BinaryIndex Architecture](../../binaryindex/architecture.md)
- [Scanner Architecture](../architecture.md)
- [Proof Chain Specification](../../attestor/proof-chain-specification.md)
- [CLI Reference](../../../09_API_CLI_REFERENCE.md)