feat: Enhance Authority Identity Provider Registry with Bootstrap Capability

- Added support for bootstrap providers in AuthorityIdentityProviderRegistry.
- Introduced a new property for bootstrap providers and updated AggregateCapabilities.
- Updated relevant methods to handle bootstrap capabilities during provider registration.

feat: Introduce Sealed Mode Status in OpenIddict Handlers

- Added SealedModeStatusProperty to AuthorityOpenIddictConstants.
- Enhanced ValidateClientCredentialsHandler, ValidatePasswordGrantHandler, and ValidateRefreshTokenGrantHandler to validate sealed mode evidence.
- Implemented logic to handle airgap seal confirmation requirements.

feat: Update Program Configuration for Sealed Mode

- Registered IAuthoritySealedModeEvidenceValidator in Program.cs.
- Added logging for bootstrap capabilities in identity provider plugins.
- Implemented checks for bootstrap support in API endpoints.

chore: Update Tasks and Documentation

- Marked AUTH-MTLS-11-002 as DONE in TASKS.md.
- Updated documentation to reflect changes in sealed mode and bootstrap capabilities.

fix: Improve CLI Command Handlers Output

- Enhanced output formatting for command responses and prompts in CommandHandlers.cs.

feat: Extend Advisory AI Models

- Added Response property to AdvisoryPipelineOutputModel for better output handling.

fix: Adjust Concelier Web Service Authentication

- Improved JWT token handling in Concelier Web Service to ensure proper token extraction and logging.

test: Enhance Web Service Endpoints Tests

- Added detailed logging for authentication failures in WebServiceEndpointsTests.
- Enabled PII logging for better debugging of authentication issues.

feat: Introduce Air-Gap Configuration Options

- Added AuthorityAirGapOptions and AuthoritySealedModeOptions to StellaOpsAuthorityOptions.
- Implemented validation logic for air-gap configurations to ensure proper setup.
2025-11-09 12:18:14 +02:00
parent d71c81e45d
commit ba4c935182
68 changed files with 2142 additions and 291 deletions
--- a/docs/modules/advisory-ai/README.md
+++ b/docs/modules/advisory-ai/README.md
@@ -26,6 +26,11 @@ Advisory AI is the retrieval-augmented assistant that synthesizes advisory and V
 - Redaction policies validated against security/LLM guardrail tests.
 - Guardrail behaviour, blocked phrases, and operational alerts are detailed in `/docs/security/assistant-guardrails.md`.

+## Deployment & configuration
+- **Containers:** `advisory-ai-web` fronts the API/cache while `advisory-ai-worker` drains the queue and executes prompts. Both containers mount a shared RWX volume providing `/var/lib/advisory-ai/{queue,plans,outputs}`.
+- **Remote inference toggle:** Set `ADVISORYAI__AdvisoryAI__Inference__Mode=Remote` to send sanitized prompts to an external inference tier. Provide `ADVISORYAI__AdvisoryAI__Inference__Remote__BaseAddress` (and optional `...ApiKey`) to complete the circuit; failures fall back to the sanitized prompt and surface `inference.fallback_*` metadata.
+- **Helm/Compose:** Bundled manifests wire the SBOM base address, queue/plan/output directories, and inference options via the `AdvisoryAI` configuration section. Helm expects a PVC named `stellaops-advisory-ai-data`. Compose creates named volumes so the worker and web instances share deterministic state.
+
 ## CLI usage
 - `stella advise run <summary|conflict|remediation> --advisory-key <id> [--artifact-id id] [--artifact-purl purl] [--policy-version v] [--profile profile] [--section name] [--force-refresh] [--timeout seconds]`
  - Requests an advisory plan from the web service, enqueues execution, then polls for the generated output (default wait 120 s, single check if `--timeout 0`).
--- a/docs/modules/advisory-ai/architecture.md
+++ b/docs/modules/advisory-ai/architecture.md
@@ -145,3 +145,10 @@ All endpoints accept `profile` parameter (default `fips-local`) and return `outp
 - **Plan determinism:** `AdvisoryPipelineOrchestratorTests` shuffle structured/vector/SBOM inputs and assert cache keys + metadata remain stable, proving that seeded plan caches stay deterministic even when retrievers emit out-of-order results.
 - **Execution telemetry:** `AdvisoryPipelineExecutorTests` exercise partial citation coverage (target ≥0.5 when only half the structured chunks are cited) so `advisory_ai_citation_coverage_ratio` reflects real guardrail quality.
 - **Plan cache stability:** `AdvisoryPlanCacheTests` now seed the in-memory cache with a fake time provider to confirm TTL refresh when plans are replaced, guaranteeing reproducible eviction under air-gapped runs.
+
+## 13) Deployment profiles, scaling, and remote inference
+
+- **Local inference containers.** `advisory-ai-web` exposes the API/plan cache endpoints while `advisory-ai-worker` drains the queue and executes prompts. Both containers mount the same RWX volume that hosts three deterministic paths: `/var/lib/advisory-ai/queue`, `/var/lib/advisory-ai/plans`, `/var/lib/advisory-ai/outputs`. Compose bundles create named volumes (`advisory-ai-{queue,plans,outputs}`) and the Helm chart mounts the `stellaops-advisory-ai-data` PVC so web + worker remain in lockstep.
+- **Remote inference toggle.** Set `AdvisoryAI:Inference:Mode` (env: `ADVISORYAI__AdvisoryAI__Inference__Mode`) to `Remote` when you want prompts to be executed by an external inference tier. Provide `AdvisoryAI:Inference:Remote:BaseAddress` and, optionally, `...:ApiKey`. When remote calls fail the executor falls back to the sanitized prompt and sets `inference.fallback_*` metadata so CLI/Console surface a warning.
+- **Scalability.** Start with 1 web replica + 1 worker for up to ~10 requests/minute. For higher throughput, scale `advisory-ai-worker` horizontally; each worker is CPU-bound (2 vCPU / 4 GiB RAM recommended) while the web front end is I/O-bound (1 vCPU / 1 GiB). Because the queue/plan/output stores are content-addressed files, ensure the shared volume delivers ≥500 IOPS and <5 ms latency; otherwise queue depth will lag.
+- **Offline & air-gapped stance.** The Compose/Helm manifests avoid external network calls by default and the Offline Kit now publishes the `advisory-ai-web` and `advisory-ai-worker` images alongside their SBOMs/provenance. Operators can rehydrate the RWX volume from the kit to pre-prime cache directories before enabling the service.
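
Review note: the docs above pair the configuration key `AdvisoryAI:Inference:Mode` with the environment variable `ADVISORYAI__AdvisoryAI__Inference__Mode`. The sketch below illustrates how that pairing could be derived. It assumes the service strips an `ADVISORYAI__` prefix and treats `__` as the `:` section separator (the usual .NET environment-variable convention). The diff does not show the actual loader, so the prefix handling, the helper name `env_to_config`, and the example base address are illustrative assumptions, not the project's code.

```python
from typing import Dict, Mapping

# Assumed prefix, inferred from the variable names quoted in the docs.
PREFIX = "ADVISORYAI__"


def env_to_config(environ: Mapping[str, str], prefix: str = PREFIX) -> Dict[str, str]:
    """Map prefixed environment variables to colon-delimited config keys.

    Hypothetical helper: variables without the prefix are ignored, the prefix
    is stripped, and each remaining "__" becomes a ":" section separator.
    """
    config: Dict[str, str] = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            key = name[len(prefix):].replace("__", ":")
            config[key] = value
    return config


# Example environment; the base address is a made-up placeholder.
example_env = {
    "ADVISORYAI__AdvisoryAI__Inference__Mode": "Remote",
    "ADVISORYAI__AdvisoryAI__Inference__Remote__BaseAddress": "https://inference.example.internal",
    "PATH": "/usr/bin",
}

config = env_to_config(example_env)
print(config["AdvisoryAI:Inference:Mode"])  # prints "Remote"
```

Under these assumptions, an operator can check that a variable set in a Compose or Helm manifest resolves to the key the documentation names before deploying.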