feat: Update Sprint 110 documentation and enhance Advisory AI tests for determinism and mTLS validation

2025-11-08 23:28:41 +02:00
parent ae69b1a8a1
commit d71c81e45d
9 changed files with 395 additions and 19 deletions
--- a/docs/implplan/SPRINT_110_ingestion_evidence.md
+++ b/docs/implplan/SPRINT_110_ingestion_evidence.md
@@ -8,7 +8,7 @@ Active items only. Completed/historic work now resides in docs/implplan/archived
  - 2025-11-04: AIAI-31-002 and AIAI-31-003 shipped with deterministic SBOM context client wiring (`AddSbomContext` typed HTTP client) and toolset integration; WebService/Worker now invoke the orchestrator with SBOM-backed simulations and emit initial metrics.
  - 2025-11-03: AIAI-31-002 landed the configurable HTTP client + DI defaults; retriever now resolves data via `/v1/sbom/context`, retaining a null fallback until SBOM service ships.
  - 2025-11-03: Follow-up: SBOM guild to deliver base URL/API key and run an Advisory AI smoke retrieval once SBOM-AIAI-31-001 endpoints are live.
-  - 2025-11-08: AIAI-31-009 moved to DOING – building the QA harness (injection fixtures, golden/property/perf tests) plus documenting deterministic cache guarantees before release.
+  - 2025-11-08: AIAI-31-009 marked DONE – injection harness + dual golden prompts + plan-cache determinism tests landed; perf memo added to the Advisory AI architecture doc; `dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj --no-build` is green.
 - **Concelier** – CONCELIER-CORE-AOC-19-004 is the only in-flight Concelier item; air-gap, console, attestation, and Link-Not-Merge tasks remain TODO, and several connector upgrades still carry overdue October due dates.
 - **Excititor** – Excititor WebService, console, policy, and observability tracks are all TODO and hinge on Link-Not-Merge schema delivery plus trust-provenance connectors (SUSE/Ubuntu) progressing in section 110.C.
 - **Mirror** – Mirror Creator track (MIRROR-CRT-56-001 through MIRROR-CRT-58-002) has not started; DSSE signing, OCI bundle, and scheduling integrations depend on the deterministic bundle assembler landing first.
--- a/docs/modules/advisory-ai/architecture.md
+++ b/docs/modules/advisory-ai/architecture.md
@@ -131,9 +131,17 @@ All endpoints accept `profile` parameter (default `fips-local`) and return `outp
 - Rate limits (per tenant, per profile) enforced by Orchestrator to prevent runaway usage.
 - Offline/air-gapped deployments run local models packaged with Offline Kit; model weights validated via manifest digests.

-## 11) Hosting surfaces
-
-- **WebService** — exposes `/v1/advisory-ai/pipeline/{task}` to materialise plans and enqueue execution messages.
-- **Worker** — background service draining the advisory pipeline queue (file-backed stub) pending integration with shared transport.
-- Both hosts register `AddAdvisoryAiCore`, which wires the SBOM context client, deterministic toolset, pipeline orchestrator, and queue metrics.
-- SBOM base address + tenant metadata are configured via `AdvisoryAI:SbomBaseAddress` and propagated through `AddSbomContext`.
+## 11) Hosting surfaces
+
+- **WebService** — exposes `/v1/advisory-ai/pipeline/{task}` to materialise plans and enqueue execution messages.
+- **Worker** — background service draining the advisory pipeline queue (file-backed stub) pending integration with shared transport.
+- Both hosts register `AddAdvisoryAiCore`, which wires the SBOM context client, deterministic toolset, pipeline orchestrator, and queue metrics.
+- SBOM base address + tenant metadata are configured via `AdvisoryAI:SbomBaseAddress` and propagated through `AddSbomContext`.
+
+## 12) QA harness & determinism (Sprint 110 refresh)
+
+- **Injection fixtures:** `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/TestData/prompt-injection-fixtures.txt` drives `AdvisoryGuardrailInjectionTests`, ensuring blocked phrases (`ignore previous instructions`, `override the system prompt`, etc.) are rejected with redaction counters, preventing prompt-injection regressions.
+- **Golden prompts:** `summary-prompt.json` now pairs with `conflict-prompt.json`; `AdvisoryPromptAssemblerTests` load both to enforce deterministic JSON payloads across task types and verify vector preview truncation (600 characters + ellipsis) keeps prompts under the documented perf ceiling.
+- **Plan determinism:** `AdvisoryPipelineOrchestratorTests` shuffle structured/vector/SBOM inputs and assert cache keys + metadata remain stable, proving that seeded plan caches stay deterministic even when retrievers emit out-of-order results.
+- **Execution telemetry:** `AdvisoryPipelineExecutorTests` exercise partial citation coverage (target ≥0.5 when only half the structured chunks are cited) so `advisory_ai_citation_coverage_ratio` reflects real guardrail quality.
+- **Plan cache stability:** `AdvisoryPlanCacheTests` now seed the in-memory cache with a fake time provider to confirm TTL refresh when plans are replaced, guaranteeing reproducible eviction under air-gapped runs.
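
The checks listed in section 12 can be modelled in a few lines. The Python below is an illustrative sketch only, not the StellaOps C# test code: `truncate_preview`, `plan_cache_key`, `citation_coverage`, and `PlanCache` are hypothetical names, and the SHA-256-over-sorted-JSON key, the single-character ellipsis, and the injected-clock cache design are assumptions chosen for the example. Only the 600-character preview limit and the 0.5 coverage case come from the text above.

```python
import hashlib
import json

PREVIEW_LIMIT = 600  # documented limit: 600 characters + ellipsis


def truncate_preview(text: str, limit: int = PREVIEW_LIMIT) -> str:
    """Cut a vector preview to `limit` characters; append an ellipsis only if text was dropped."""
    # Assumption: the ellipsis is the single character U+2026.
    return text if len(text) <= limit else text[:limit] + "…"


def plan_cache_key(task: str, structured: list[str], vectors: list[str], sbom: list[str]) -> str:
    """Hypothetical key derivation: sort each input group so retriever order cannot affect the hash."""
    canonical = json.dumps(
        {
            "task": task,
            "structured": sorted(structured),
            "vectors": sorted(vectors),
            "sbom": sorted(sbom),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def citation_coverage(cited_ids: set[str], structured_ids: list[str]) -> float:
    """Fraction of structured chunks that the output cites (0.0 when there are no chunks)."""
    if not structured_ids:
        return 0.0
    cited = sum(1 for chunk_id in structured_ids if chunk_id in cited_ids)
    return cited / len(structured_ids)


class PlanCache:
    """In-memory cache whose clock is injected, so tests can advance time by hand."""

    def __init__(self, ttl_seconds: float, clock):
        self._ttl = ttl_seconds
        self._clock = clock  # callable returning the current time in seconds
        self._entries = {}

    def set(self, key: str, plan: str) -> None:
        # Replacing a plan restarts its TTL from the current clock reading.
        self._entries[key] = (plan, self._clock() + self._ttl)

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        plan, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # evict expired plans on read
            return None
        return plan
```

With a fake clock such as `now = [0.0]` and `PlanCache(10, lambda: now[0])`, a test can replace a plan at `t=8`, read it back at `t=15`, and see it evicted at `t=18` without sleeping. The same trick makes eviction reproducible regardless of wall-clock time.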