Files
git.stella-ops.org/docs/modules/advisory-ai/architecture.md
master ba4c935182 feat: Enhance Authority Identity Provider Registry with Bootstrap Capability
- Added support for bootstrap providers in AuthorityIdentityProviderRegistry.
- Introduced a new property for bootstrap providers and updated AggregateCapabilities.
- Updated relevant methods to handle bootstrap capabilities during provider registration.

feat: Introduce Sealed Mode Status in OpenIddict Handlers

- Added SealedModeStatusProperty to AuthorityOpenIddictConstants.
- Enhanced ValidateClientCredentialsHandler, ValidatePasswordGrantHandler, and ValidateRefreshTokenGrantHandler to validate sealed mode evidence.
- Implemented logic to handle airgap seal confirmation requirements.

feat: Update Program Configuration for Sealed Mode

- Registered IAuthoritySealedModeEvidenceValidator in Program.cs.
- Added logging for bootstrap capabilities in identity provider plugins.
- Implemented checks for bootstrap support in API endpoints.

chore: Update Tasks and Documentation

- Marked AUTH-MTLS-11-002 as DONE in TASKS.md.
- Updated documentation to reflect changes in sealed mode and bootstrap capabilities.

fix: Improve CLI Command Handlers Output

- Enhanced output formatting for command responses and prompts in CommandHandlers.cs.

feat: Extend Advisory AI Models

- Added Response property to AdvisoryPipelineOutputModel for better output handling.

fix: Adjust Concelier Web Service Authentication

- Improved JWT token handling in Concelier Web Service to ensure proper token extraction and logging.

test: Enhance Web Service Endpoints Tests

- Added detailed logging for authentication failures in WebServiceEndpointsTests.
- Enabled PII logging for better debugging of authentication issues.

feat: Introduce Air-Gap Configuration Options

- Added AuthorityAirGapOptions and AuthoritySealedModeOptions to StellaOpsAuthorityOptions.
- Implemented validation logic for air-gap configurations to ensure proper setup.
2025-11-09 12:18:14 +02:00

155 lines
11 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Advisory AI architecture
> Captures the retrieval, guardrail, and inference packaging requirements defined in the Advisory AI implementation plan and related module guides.
## 1) Goals
- Summarise advisories/VEX evidence into operator-ready briefs with citations.
- Explain conflicting statements with provenance and trust weights (using VEX Lens & Excititor data).
- Suggest remediation plans aligned with Offline Kit deployment models and scheduler follow-ups.
- Operate deterministically where possible; cache generated artefacts with digests for audit.
## 2) Pipeline overview
```
+---------------------+
Concelier/VEX Lens | Evidence Retriever |
Policy Engine ----> | (vector + keyword) | ---> Context Pack (JSON)
Zastava runtime +---------------------+
|
v
+-------------+
| Prompt |
| Assembler |
+-------------+
|
v
+-------------+
| Guarded LLM |
| (local/host)|
+-------------+
|
v
+-----------------+
| Citation & |
| Validation |
+-----------------+
|
v
+----------------+
| Output cache |
| (hash, bundle) |
+----------------+
```
## 3) Retrieval & context
- Hybrid search: vector embeddings (SBERT-compatible) + keyword filters for advisory IDs, PURLs, CVEs.
- Context packs include:
- Advisory raw excerpts with highlighted sections and source URLs.
- VEX statements (normalized tuples + trust metadata).
- Policy explain traces for the affected finding.
- Runtime/impact hints from Zastava (exposure, entrypoints).
- Export-ready remediation data (fixed versions, patches).
- **SBOM context retriever** (AIAI-31-002) hydrates:
- Version timelines (first/last observed, status, fix availability).
- Dependency paths (runtime vs build/test, deduped by coordinate chain).
- Tenant environment flags (prod/stage toggles) with optional blast radius summary.
- Service-side clamps: max 500 timeline entries, 200 dependency paths, with client-provided toggles for env/blast data.
- `AddSbomContextHttpClient(...)` registers the typed HTTP client that calls `/v1/sbom/context`, while `NullSbomContextClient` remains the safe default for environments that have not yet exposed the SBOM service.
**Sample configuration** (wire real SBOM base URL + API key):
```csharp
services.AddSbomContextHttpClient(options =>
{
options.BaseAddress = new Uri("https://sbom-service.internal");
options.Endpoint = "/v1/sbom/context";
options.ApiKey = configuration["SBOM_SERVICE_API_KEY"];
options.UserAgent = "stellaops-advisoryai/1.0";
options.Tenant = configuration["TENANT_ID"];
});
services.AddAdvisoryPipeline();
```
After configuration, issue a smoke request (e.g., `ISbomContextRetriever.RetrieveAsync`) during deployment validation to confirm end-to-end connectivity and credentials before enabling Advisory AI endpoints.
Retriever requests and results are trimmed/normalized before hashing; metadata (counts, provenance keys) is returned for downstream guardrails. Unit coverage ensures deterministic ordering and flag handling.
All context references include `content_hash` and `source_id` enabling verifiable citations.
## 4) Guardrails
- Prompt templates enforce structure: summary, conflicts, remediation, references.
- Response validator ensures:
- No hallucinated advisories (every fact must map to input context).
- Citations follow `[n]` indexing referencing actual sources.
- Remediation suggestions only cite policy-approved sources (fixed versions, vendor hotfixes).
- Moderation/PII filters prevent leaking secrets; responses failing validation are rejected and logged.
- Pre-flight guardrails redact secrets (AWS keys, generic API tokens, PEM blobs), block "ignore previous instructions"-style prompt injection attempts, enforce citation presence, and cap prompt payload length (default 16kB). Guardrail outcomes and redaction counts surface via `advisory_guardrail_blocks` / `advisory_outputs_stored` metrics.
## 5) Deterministic tooling
- **Version comparators** — offline semantic version + RPM EVR parsers with range evaluators. Supports chained constraints (`>=`, `<=`, `!=`) used by remediation advice and blast radius calcs.
- Registered via `AddAdvisoryDeterministicToolset` for reuse across orchestrator, CLI, and services.
- **Orchestration pipeline** — see `orchestration-pipeline.md` for prerequisites, task breakdown, and cross-guild responsibilities before wiring the execution flows.
- **Planned extensions** — NEVRA/EVR comparators, ecosystem-specific normalisers, dependency chain scorers (AIAI-31-003 scope).
- Exposed via internal interfaces to allow orchestrator/toolchain reuse; all helpers stay side-effect free and deterministic for golden testing.
## 6) Output persistence
- Cached artefacts stored in `advisory_ai_outputs` with fields:
- `output_hash` (sha256 of JSON response).
- `input_digest` (hash of context pack).
- `summary`, `conflicts`, `remediation`, `citations`.
- `generated_at`, `model_id`, `profile` (Sovereign/FIPS etc.).
- `signatures` (optional DSSE if run in deterministic mode).
- Offline bundle format contains `summary.md`, `citations.json`, `context_manifest.json`, `signatures/`.
## 7) Profiles & sovereignty
- **Profiles:** `default`, `fips-local` (FIPS-compliant local model), `gost-local`, `cloud-openai` (optional, disabled by default). Each profile defines allowed models, key management, and telemetry endpoints.
- **CryptoProfile/RootPack integration:** generated artefacts can be signed using configured CryptoProfile to satisfy procurement/trust requirements.
## 8) APIs
- `POST /api/v1/advisory/{task}` — executes Summary/Conflict/Remediation pipeline (`task` ∈ `summary|conflict|remediation`). Requests accept `{advisoryKey, artifactId?, policyVersion?, profile, preferredSections?, forceRefresh}` and return sanitized prompt payloads, citations, guardrail metadata, provenance hash, and cache hints.
- `GET /api/v1/advisory/outputs/{cacheKey}?taskType=SUMMARY&profile=default` — retrieves cached artefacts for downstream consumers (Console, CLI, Export Center). Guardrail state and provenance hash accompany results.
All endpoints accept `profile` parameter (default `fips-local`) and return `output_hash`, `input_digest`, and `citations` for verification.
## 9) Observability
- Metrics: `advisory_ai_requests_total{profile,type}`, `advisory_ai_latency_seconds`, `advisory_ai_validation_failures_total`.
- Logs: include `output_hash`, `input_digest`, `profile`, `model_id`, `tenant`, `artifacts`. Sensitive context is not logged.
- Traces: spans for retrieval, prompt assembly, model inference, validation, cache write.
## 10) Operational controls
- Feature flags per tenant (`ai.summary.enabled`, `ai.remediation.enabled`).
- Rate limits (per tenant, per profile) enforced by Orchestrator to prevent runaway usage.
- Offline/air-gapped deployments run local models packaged with Offline Kit; model weights validated via manifest digests.
## 11) Hosting surfaces
- **WebService** — exposes `/v1/advisory-ai/pipeline/{task}` to materialise plans and enqueue execution messages.
- **Worker** — background service draining the advisory pipeline queue (file-backed stub) pending integration with shared transport.
- Both hosts register `AddAdvisoryAiCore`, which wires the SBOM context client, deterministic toolset, pipeline orchestrator, and queue metrics.
- SBOM base address + tenant metadata are configured via `AdvisoryAI:SbomBaseAddress` and propagated through `AddSbomContext`.
## 12) QA harness & determinism (Sprint 110 refresh)
- **Injection fixtures:** `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/TestData/prompt-injection-fixtures.txt` drives `AdvisoryGuardrailInjectionTests`, ensuring blocked phrases (`ignore previous instructions`, `override the system prompt`, etc.) are rejected with redaction counters, preventing prompt-injection regressions.
- **Golden prompts:** `summary-prompt.json` now pairs with `conflict-prompt.json`; `AdvisoryPromptAssemblerTests` load both to enforce deterministic JSON payloads across task types and verify vector preview truncation (600 characters + ellipsis) keeps prompts under the documented perf ceiling.
- **Plan determinism:** `AdvisoryPipelineOrchestratorTests` shuffle structured/vector/SBOM inputs and assert cache keys + metadata remain stable, proving that seeded plan caches stay deterministic even when retrievers emit out-of-order results.
- **Execution telemetry:** `AdvisoryPipelineExecutorTests` exercise partial citation coverage (target ≥0.5 when only half the structured chunks are cited) so `advisory_ai_citation_coverage_ratio` reflects real guardrail quality.
- **Plan cache stability:** `AdvisoryPlanCacheTests` now seed the in-memory cache with a fake time provider to confirm TTL refresh when plans are replaced, guaranteeing reproducible eviction under air-gapped runs.
## 13) Deployment profiles, scaling, and remote inference
- **Local inference containers.** `advisory-ai-web` exposes the API/plan cache endpoints while `advisory-ai-worker` drains the queue and executes prompts. Both containers mount the same RWX volume that hosts three deterministic paths: `/var/lib/advisory-ai/queue`, `/var/lib/advisory-ai/plans`, `/var/lib/advisory-ai/outputs`. Compose bundles create named volumes (`advisory-ai-{queue,plans,outputs}`) and the Helm chart mounts the `stellaops-advisory-ai-data` PVC so web + worker remain in lockstep.
- **Remote inference toggle.** Set `AdvisoryAI:Inference:Mode` (env: `ADVISORYAI__AdvisoryAI__Inference__Mode`) to `Remote` when you want prompts to be executed by an external inference tier. Provide `AdvisoryAI:Inference:Remote:BaseAddress` and, optionally, `...:ApiKey`. When remote calls fail the executor falls back to the sanitized prompt and sets `inference.fallback_*` metadata so CLI/Console surface a warning.
- **Scalability.** Start with 1 web replica + 1 worker for up to ~10 requests/minute. For higher throughput, scale `advisory-ai-worker` horizontally; each worker is CPU-bound (2 vCPU / 4GiB RAM recommended) while the web front end is I/O-bound (1 vCPU / 1GiB). Because the queue/plan/output stores are content-addressed files, ensure the shared volume delivers ≥500IOPS and <5ms latency; otherwise queue depth will lag.
- **Offline & air-gapped stance.** The Compose/Helm manifests avoid external network calls by default and the Offline Kit now publishes the `advisory-ai-web` and `advisory-ai-worker` images alongside their SBOMs/provenance. Operators can rehydrate the RWX volume from the kit to pre-prime cache directories before enabling the service.