git.stella-ops.org/docs/modules/advisory-ai/architecture.md

# Advisory AI architecture

> Captures the retrieval, guardrail, and inference packaging requirements defined in the Advisory AI implementation plan and related module guides.

## 1) Goals

- Summarise advisories/VEX evidence into operator-ready briefs with citations.
- Explain conflicting statements with provenance and trust weights (using VEX Lens & Excititor data).
- Suggest remediation plans aligned with Offline Kit deployment models and scheduler follow-ups.
- Operate deterministically where possible; cache generated artefacts with digests for audit.

## 2) Pipeline overview

```
                       +---------------------+
   Concelier/VEX Lens  |  Evidence Retriever |
   Policy Engine ----> |  (vector + keyword) | ---> Context Pack (JSON)
   Zastava runtime     +---------------------+
                               |
                               v
                        +-------------+
                        | Prompt      |
                        | Assembler   |
                        +-------------+
                               |
                               v
                        +-------------+
                        | Guarded LLM |
                        | (local/host)|
                        +-------------+
                               |
                               v
                        +-----------------+
                        | Citation &     |
                        | Validation      |
                        +-----------------+
                               |
                               v
                        +----------------+
                        | Output cache   |
                        | (hash, bundle) |
                        +----------------+
```

## 3) Retrieval & context

- Hybrid search: vector embeddings (SBERT-compatible) + keyword filters for advisory IDs, PURLs, CVEs.
- Context packs include:
  - Advisory raw excerpts with highlighted sections and source URLs.
  - VEX statements (normalized tuples + trust metadata).
  - Policy explain traces for the affected finding.
  - Runtime/impact hints from Zastava (exposure, entrypoints).
  - Export-ready remediation data (fixed versions, patches).
- **SBOM context retriever** (AIAI-31-002) hydrates:
  - Version timelines (first/last observed, status, fix availability).
  - Dependency paths (runtime vs build/test, deduped by coordinate chain).
  - Tenant environment flags (prod/stage toggles) with optional blast radius summary.
  - Service-side clamps: max 500 timeline entries, 200 dependency paths, with client-provided toggles for env/blast data.
  - `AddSbomContextHttpClient(...)` registers the typed HTTP client that calls `/v1/sbom/context`, while `NullSbomContextClient` remains the safe default for environments that have not yet exposed the SBOM service.

  **Sample configuration** (wire real SBOM base URL + API key):

  ```csharp
  services.AddSbomContextHttpClient(options =>
  {
      options.BaseAddress = new Uri("https://sbom-service.internal");
      options.Endpoint = "/v1/sbom/context";
      options.ApiKey = configuration["SBOM_SERVICE_API_KEY"];
      options.UserAgent = "stellaops-advisoryai/1.0";
      options.Tenant = configuration["TENANT_ID"];
  });

  services.AddAdvisoryPipeline();
  ```

  After configuration, issue a smoke request (e.g., `ISbomContextRetriever.RetrieveAsync`) during deployment validation to confirm end-to-end connectivity and credentials before enabling Advisory AI endpoints.

Retriever requests and results are trimmed/normalized before hashing; metadata (counts, provenance keys) is returned for downstream guardrails. Unit coverage ensures deterministic ordering and flag handling.

All context references include `content_hash` and `source_id` enabling verifiable citations.

## 4) Guardrails

- Prompt templates enforce structure: summary, conflicts, remediation, references.
- Response validator ensures:
  - No hallucinated advisories (every fact must map to input context).
  - Citations follow `[n]` indexing referencing actual sources.
  - Remediation suggestions only cite policy-approved sources (fixed versions, vendor hotfixes).
- Moderation/PII filters prevent leaking secrets; responses failing validation are rejected and logged.
- Pre-flight guardrails redact secrets (AWS keys, generic API tokens, PEM blobs), block "ignore previous instructions"-style prompt injection attempts, enforce citation presence, and cap prompt payload length (default 16 kB). Guardrail outcomes and redaction counts surface via `advisory_guardrail_blocks` / `advisory_outputs_stored` metrics.

## 5) Deterministic tooling

- **Version comparators** — offline semantic version + RPM EVR parsers with range evaluators. Supports chained constraints (`>=`, `<=`, `!=`) used by remediation advice and blast radius calcs.
  - Registered via `AddAdvisoryDeterministicToolset` for reuse across orchestrator, CLI, and services.
- **Orchestration pipeline** — see `orchestration-pipeline.md` for prerequisites, task breakdown, and cross-guild responsibilities before wiring the execution flows.
- **Planned extensions** — NEVRA/EVR comparators, ecosystem-specific normalisers, dependency chain scorers (AIAI-31-003 scope).
- Exposed via internal interfaces to allow orchestrator/toolchain reuse; all helpers stay side-effect free and deterministic for golden testing.

## 6) Output persistence

- Cached artefacts stored in `advisory_ai_outputs` with fields:
  - `output_hash` (sha256 of JSON response).
  - `input_digest` (hash of context pack).
  - `summary`, `conflicts`, `remediation`, `citations`.
  - `generated_at`, `model_id`, `profile` (Sovereign/FIPS etc.).
  - `signatures` (optional DSSE if run in deterministic mode).
- Offline bundle format contains `summary.md`, `citations.json`, `context_manifest.json`, `signatures/`.

## 7) Profiles & sovereignty

- **Profiles:** `default`, `fips-local` (FIPS-compliant local model), `gost-local`, `cloud-openai` (optional, disabled by default). Each profile defines allowed models, key management, and telemetry endpoints.
- **CryptoProfile/RootPack integration:** generated artefacts can be signed using configured CryptoProfile to satisfy procurement/trust requirements.

## 8) APIs

- `POST /api/v1/advisory/{task}` — executes Summary/Conflict/Remediation pipeline (`task` ∈ `summary|conflict|remediation`). Requests accept `{advisoryKey, artifactId?, policyVersion?, profile, preferredSections?, forceRefresh}` and return sanitized prompt payloads, citations, guardrail metadata, provenance hash, and cache hints.
- `GET /api/v1/advisory/outputs/{cacheKey}?taskType=SUMMARY&profile=default` — retrieves cached artefacts for downstream consumers (Console, CLI, Export Center). Guardrail state and provenance hash accompany results.

All endpoints accept `profile` parameter (default `fips-local`) and return `output_hash`, `input_digest`, and `citations` for verification.

## 9) Observability

- Metrics: `advisory_ai_requests_total{profile,type}`, `advisory_ai_latency_seconds`, `advisory_ai_validation_failures_total`.
- Logs: include `output_hash`, `input_digest`, `profile`, `model_id`, `tenant`, `artifacts`. Sensitive context is not logged.
- Traces: spans for retrieval, prompt assembly, model inference, validation, cache write.

## 10) Operational controls

- Feature flags per tenant (`ai.summary.enabled`, `ai.remediation.enabled`).
- Rate limits (per tenant, per profile) enforced by Orchestrator to prevent runaway usage.
- Offline/air-gapped deployments run local models packaged with Offline Kit; model weights validated via manifest digests.

## 11) Hosting surfaces

- **WebService** — exposes `/v1/advisory-ai/pipeline/{task}` to materialise plans and enqueue execution messages.
- **Worker** — background service draining the advisory pipeline queue (file-backed stub) pending integration with shared transport.
- Both hosts register `AddAdvisoryAiCore`, which wires the SBOM context client, deterministic toolset, pipeline orchestrator, and queue metrics.
- SBOM base address + tenant metadata are configured via `AdvisoryAI:SbomBaseAddress` and propagated through `AddSbomContext`.

## 12) QA harness & determinism (Sprint 110 refresh)

- **Injection fixtures:** `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/TestData/prompt-injection-fixtures.txt` drives `AdvisoryGuardrailInjectionTests`, ensuring blocked phrases (`ignore previous instructions`, `override the system prompt`, etc.) are rejected with redaction counters, preventing prompt-injection regressions.
- **Golden prompts:** `summary-prompt.json` now pairs with `conflict-prompt.json`; `AdvisoryPromptAssemblerTests` load both to enforce deterministic JSON payloads across task types and verify vector preview truncation (600 characters + ellipsis) keeps prompts under the documented perf ceiling.
- **Plan determinism:** `AdvisoryPipelineOrchestratorTests` shuffle structured/vector/SBOM inputs and assert cache keys + metadata remain stable, proving that seeded plan caches stay deterministic even when retrievers emit out-of-order results.
- **Execution telemetry:** `AdvisoryPipelineExecutorTests` exercise partial citation coverage (target ≥0.5 when only half the structured chunks are cited) so `advisory_ai_citation_coverage_ratio` reflects real guardrail quality.
- **Plan cache stability:** `AdvisoryPlanCacheTests` now seed the in-memory cache with a fake time provider to confirm TTL refresh when plans are replaced, guaranteeing reproducible eviction under air-gapped runs.

## 13) Deployment profiles, scaling, and remote inference

- **Local inference containers.** `advisory-ai-web` exposes the API/plan cache endpoints while `advisory-ai-worker` drains the queue and executes prompts. Both containers mount the same RWX volume that hosts three deterministic paths: `/var/lib/advisory-ai/queue`, `/var/lib/advisory-ai/plans`, `/var/lib/advisory-ai/outputs`. Compose bundles create named volumes (`advisory-ai-{queue,plans,outputs}`) and the Helm chart mounts the `stellaops-advisory-ai-data` PVC so web + worker remain in lockstep.
- **Remote inference toggle.** Set `AdvisoryAI:Inference:Mode` (env: `ADVISORYAI__AdvisoryAI__Inference__Mode`) to `Remote` when you want prompts to be executed by an external inference tier. Provide `AdvisoryAI:Inference:Remote:BaseAddress` and, optionally, `...:ApiKey`. When remote calls fail the executor falls back to the sanitized prompt and sets `inference.fallback_*` metadata so CLI/Console surface a warning.
- **Scalability.** Start with 1 web replica + 1 worker for up to ~10 requests/minute. For higher throughput, scale `advisory-ai-worker` horizontally; each worker is CPU-bound (2 vCPU / 4 GiB RAM recommended) while the web front end is I/O-bound (1 vCPU / 1 GiB). Because the queue/plan/output stores are content-addressed files, ensure the shared volume delivers ≥500 IOPS and <5 ms latency; otherwise queue depth will lag.
- **Offline & air-gapped stance.** The Compose/Helm manifests avoid external network calls by default and the Offline Kit now publishes the `advisory-ai-web` and `advisory-ai-worker` images alongside their SBOMs/provenance. Operators can rehydrate the RWX volume from the kit to pre-prime cache directories before enabling the service.