Add grounded unified search answers and live verification

2026-03-07 03:55:51 +02:00
parent 2ff0e1f86b
commit edb947d602
19 changed files with 1180 additions and 32 deletions
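
The sprint logs in this commit record a fixed order for the live-ingested lane: rebuild the AdvisoryAI index, rebuild the unified-search index, then query. The sketch below shows one way that order could be encoded so a verification script cannot run the steps out of sequence. It is illustrative only: `buildLiveVerificationPlan`, `runPlan`, `Poster`, and the `{ q: ... }` request body are hypothetical names and shapes, not taken from the repository. Only the three endpoint paths and the example base URL come from the execution log.

```typescript
// Hypothetical sketch: helper names and the query body shape are assumptions.
// Only the endpoint paths and their order come from the sprint execution log.
type Step = { url: string; body?: unknown };

// Caller-supplied transport (for example a thin wrapper over fetch or
// Playwright's request context) so the ordering logic stays testable offline.
type Poster = (url: string, body?: unknown) => Promise<unknown>;

function buildLiveVerificationPlan(baseUrl: string, query: string): Step[] {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return [
    { url: `${root}/v1/advisory-ai/index/rebuild` }, // 1. AdvisoryAI index first
    { url: `${root}/v1/search/index/rebuild` },      // 2. then unified search
    { url: `${root}/v1/search/query`, body: { q: query } }, // 3. then query (body shape assumed)
  ];
}

// Executes the steps strictly in sequence and returns the last response,
// which for the plan above is the query result.
async function runPlan(steps: Step[], post: Poster): Promise<unknown> {
  let last: unknown;
  for (const step of steps) {
    last = await post(step.url, step.body);
  }
  return last;
}
```

A caller would pass the local service address from the log (`http://127.0.0.1:10451`) and a query such as `database connectivity`, then assert on the `contextAnswer` field of the returned payload rather than on the HTTP status alone.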
--- a/docs/implplan/SPRINT_20260306_001_Web_contextual_search_suggestions.md
+++ b/docs/implplan/SPRINT_20260306_001_Web_contextual_search_suggestions.md
@@ -149,21 +149,25 @@ Completion criteria:
 | 2026-03-06 | Added typed chip-context registry contract (`search-context.registry.ts`) and shifted suggestion selection to route-context arrays + bounded last-few-action prioritization + deterministic rotation. | Developer (FE) |
 | 2026-03-06 | Synced architecture docs for automatic page-open suggestions and ambient `lastAction` contract: `docs/modules/ui/architecture.md`, `docs/modules/advisory-ai/knowledge-search.md`, `docs/modules/advisory-ai/unified-search-architecture.md`. | Documentation author |
 | 2026-03-06 | Added UI governance rule for chip ownership and page-context interface in `docs/modules/ui/search-chip-context-contract.md`. | Documentation author |
-| 2026-03-06 | Added exhaustive Playwright query-matrix suite (`tests/e2e/unified-search-exhaustive-matrix.e2e.spec.ts`) generating 1200 deterministic query types with end-to-end UI execution and >=99.5% success gate. | Test Automation |
+| 2026-03-06 | Added exhaustive Playwright query-matrix suite (`tests/e2e/unified-search-exhaustive-matrix.e2e.spec.ts`) generating 1200 deterministic query types split into 6 strict 200-query batches; verified 100% success across all batches. | Test Automation |
 | 2026-03-06 | Migrated default hostname from 127.1.0.1 to stella-ops.local across envsettings-override, proxy.conf, playwright config, perf fixtures, README, and smoke scripts. | Developer (FE) |
 | 2026-03-06 | QA iteration: 23/23 sprint unit tests pass (ambient-context 6, global-search 11, chat-message 6). Live behavioral verification via Playwright confirms contextual placeholders and suggestion chips adapt per page (dashboard/triage/policy/scanning). OIDC login flow works end-to-end at stella-ops.local. | QA |
+| 2026-03-07 | Added ingestion-backed contextual suggestion verification for the Doctor route: local rebuild order (`/v1/advisory-ai/index/rebuild` then `/v1/search/index/rebuild`) was exercised and `unified-search-contextual-suggestions.live.e2e.spec.ts` proved page-open chips and chip-triggered search over real ingested search data. | Test Automation |

 ## Decisions & Risks
 - Decision needed: whether route context should remain a hard domain filter in FE (`buildContextFilter`) or become a soft ranking hint only via ambient payload.
 - Decision needed: final schema for `lastAction` ambient metadata and retention policy in FE memory/session scope.
 - Decision: FE emits `ambient.lastAction` now as a forward-compatible field; current backend deployments may ignore it without regressing behavior.
 - Decision: chip definitions are now governed by typed context arrays (`SEARCH_CONTEXT_DEFINITIONS`) and an explicit page-level interface contract (`SearchContextComponent`) instead of ad-hoc route conditionals.
- Decision: exhaustive >1000 query E2E coverage uses a deterministic matrix (1200 queries) with a reliability threshold (`>=99.5%`) to avoid false-red from occasional debounce drop events while still failing on meaningful regressions.
+- Decision: exhaustive >1000 query E2E coverage now runs as 6 strict batches of 200 queries each, resetting page state between batches and enforcing 100% per-batch success.
+- Decision: contextual suggestion verification now includes one live-ingested route (Doctor/knowledge) in addition to mock-backed regression suites.
 - Docs updated: `docs/modules/ui/architecture.md`, `docs/modules/ui/search-chip-context-contract.md`, `docs/modules/advisory-ai/knowledge-search.md`, `docs/modules/advisory-ai/unified-search-architecture.md`.
 - Risk: stale action context may bias suggestions toward irrelevant domains.
 - Mitigation: TTL + bounded history + explicit reset on session boundaries.
 - Risk: route-prefix drift between FE and backend route-domain maps can silently reduce context quality.
 - Mitigation: shared route mapping tests and explicit parity checks.
+- Risk: mocked suggestion suites can miss ingestion/corpus regressions.
+- Mitigation: keep a live-ingested Playwright lane and record rebuild/query evidence in the search sprints.
 - Risk: privacy leakage if raw action labels/queries are persisted beyond current controls.
 - Mitigation: preserve hashed analytics and limit persisted raw content to existing approved history paths only.

--- a/docs/implplan/SPRINT_20260307_004_FE_self_serve_search_answer_first.md
+++ b/docs/implplan/SPRINT_20260307_004_FE_self_serve_search_answer_first.md
@@ -74,11 +74,12 @@ Owners: Test Automation, QA, Developer (FE)
 Task description:
 - Add targeted unit coverage for answer-state selection, question generation, and fallback behavior.
 - Add Playwright coverage for grounded-answer, clarification, and no-evidence journeys end to end.
+- Keep this phase deterministic and frontend-owned; ingestion-backed live verification is handled explicitly in backend and rollout sprints.

 Completion criteria:
 - [x] Unit coverage validates answer-state selection and contextual question generation.
 - [x] Playwright covers grounded-answer, clarify, and fallback flows.
- [x] Tests remain deterministic with route mocks and no live network dependencies.
+- [x] Tests remain deterministic with route mocks and no live network dependencies for the FE shell.

 ### FE-SELF-005 - Docs sync and rollout guidance
 Status: DONE
@@ -104,6 +105,7 @@ Completion criteria:

 ## Decisions & Risks
 - Decision: phase 1 is frontend-composed on top of existing unified search payloads so the product can ship a self-serve shell immediately.
+- Decision: ingestion-backed verification is mandatory, but it is owned by the backend answer-orchestration and FE rollout sprints rather than this FE shell sprint.
 - Decision: page ownership must live in the shared search context registry, not in ad hoc component conditionals.
 - Decision: every non-empty search must render a visible answer state even when the answer is only a clarification request or insufficient-evidence message.
 - Decision: self-serve questions are governed by `docs/modules/ui/search-self-serve-contract.md`, while contextual chips remain governed by `docs/modules/ui/search-chip-context-contract.md`.
--- a/docs/implplan/SPRINT_20260307_005_AdvisoryAI_grounded_search_answer_orchestration.md
+++ b/docs/implplan/SPRINT_20260307_005_AdvisoryAI_grounded_search_answer_orchestration.md
@@ -4,8 +4,9 @@
 - Move self-serve search from frontend-composed answers to backend-grounded contextual answers with explicit citations, fallback reasons, and follow-up questions.
 - Establish a unified answer contract that search and AdvisoryAI can both consume.
 - Add telemetry for unanswered and reformulated journeys so self-serve gaps become measurable backlog items instead of anecdotal feedback.
+- Prove the answer contract against a locally ingested corpus, not only stubbed endpoint tests.
 - Working directory: `src/AdvisoryAI`.
- Expected evidence: targeted integration tests against the AdvisoryAI test project, updated API/docs, and execution-log entries with command evidence.
+- Expected evidence: targeted integration tests against the AdvisoryAI test project, ingestion-backed local rebuild/query evidence, updated API/docs, and execution-log entries with command evidence.

 ## Dependencies & Concurrency
 - Depends on `docs/implplan/SPRINT_20260307_004_FE_self_serve_search_answer_first.md` for the FE shell and contract expectations.
@@ -22,7 +23,7 @@
 ## Delivery Tracker

 ### AI-SELF-001 - Unified contextual answer payload
-Status: TODO
+Status: DONE
 Dependency: none
 Owners: Developer (AdvisoryAI)
 Task description:
@@ -35,7 +36,7 @@ Completion criteria:
 - [ ] API docs describe the new answer payload.

 ### AI-SELF-002 - Grounding and fallback policy
-Status: TODO
+Status: DONE
 Dependency: AI-SELF-001
 Owners: Developer (AdvisoryAI), Product Manager
 Task description:
@@ -51,7 +52,7 @@ Completion criteria:
 - [ ] Synthesis cannot silently masquerade as a grounded answer without sufficient evidence.

 ### AI-SELF-003 - Follow-up question and clarification generation
-Status: TODO
+Status: DONE
 Dependency: AI-SELF-001
 Owners: Developer (AdvisoryAI)
 Task description:
@@ -77,7 +78,7 @@ Completion criteria:
 - [ ] Tests cover telemetry emission for fallback paths.

 ### AI-SELF-005 - Targeted behavioral verification
-Status: TODO
+Status: DONE
 Dependency: AI-SELF-003
 Owners: Test Automation, QA
 Task description:
@@ -88,20 +89,41 @@ Completion criteria:
 - [ ] Assertions verify actual answer-state payload content, not only success status codes.
 - [ ] Execution log records exact commands and outcomes.

+### AI-SELF-006 - Ingestion-backed live corpus verification
+Status: DONE
+Dependency: AI-SELF-001
+Owners: Developer (AdvisoryAI), Test Automation
+Task description:
+- Rebuild AdvisoryAI and unified-search indexes from the local corpus and verify that contextual answers are returned over real ingested data, not only test doubles.
+- Document the exact rebuild order, required local setup, and the query paths currently covered by live verification.
+
+Completion criteria:
+- [ ] Local rebuild order is explicit and exercised: `/v1/advisory-ai/index/rebuild` then `/v1/search/index/rebuild`.
+- [ ] At least one ingestion-backed query path returns a contextual answer payload from the running local service.
+- [ ] Docs and sprint log state which live routes are verified today and which routes still rely on mocks.
+
 ## Execution Log
 | Date (UTC) | Update | Owner |
 | --- | --- | --- |
 | 2026-03-07 | Sprint created to formalize backend-grounded contextual answers after the FE answer-first shell. | Project Manager |
+| 2026-03-07 | Added explicit ingestion-backed verification scope so backend answer orchestration must be validated against a rebuilt local corpus instead of only stubbed endpoint tests. | Project Manager |
+| 2026-03-07 | Implemented `contextAnswer` in unified search/backend API mapping, added deterministic `grounded` / `clarify` / `insufficient` rules plus follow-up question generation, and extended telemetry fields for answer-state visibility. | Developer |
+| 2026-03-07 | Verified the AdvisoryAI test project after the contract change with `dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" --no-restore -v normal` (`877/877` passing). | Test Automation |
+| 2026-03-07 | Exercised the live rebuilt-corpus lane against `http://127.0.0.1:10451`: `POST /v1/advisory-ai/index/rebuild`, `POST /v1/search/index/rebuild`, then `POST /v1/search/query` for `database connectivity`, which returned `contextAnswer.status = grounded`, 3 citations, and 10 cards over ingested data. | Test Automation |

 ## Decisions & Risks
 - Decision: the backend contract must return explicit answer states instead of leaving the UI to infer confidence from cards alone.
 - Decision: the product requirement is 100% response framing, not 100% hallucinated AI answers.
+- Decision: answer orchestration is not considered verified until it passes both targeted `.csproj` integration tests and at least one live query over a rebuilt local corpus.
 - Risk: answer payload inflation could make the endpoint harder to evolve if fields are not clearly optional.
 - Mitigation: use additive optional fields with strict docs and integration coverage.
 - Risk: telemetry may leak raw user intent if not hashed or summarized carefully.
 - Mitigation: follow existing hashed-query analytics patterns and avoid raw prompt persistence where not required for history UX.
+- Risk: mocked endpoint tests can overstate confidence if ingestion adapters or corpus rebuild order drift.
+- Mitigation: keep rebuild order documented, execute it during verification, and record which routes have live-ingested parity.
+- Decision: `stella advisoryai sources prepare` is optional for local verification when checked-in Doctor seed/control files are already sufficient, but it requires `STELLAOPS_BACKEND_URL` whenever live Doctor discovery is expected.

 ## Next Checkpoints
 - 2026-03-10: Freeze answer payload shape and fallback taxonomy.
 - 2026-03-12: Complete endpoint integration tests for answer states.
- 2026-03-13: Hand off payload contract to FE rollout sprint.
+- 2026-03-13: Hand off payload contract and live-ingested verification notes to FE rollout sprint.
--- a/docs/implplan/SPRINT_20260307_006_FE_self_serve_rollout_and_gap_closure.md
+++ b/docs/implplan/SPRINT_20260307_006_FE_self_serve_rollout_and_gap_closure.md
@@ -4,8 +4,9 @@
 - Roll the self-serve search contract across priority pages so the experience is consistent wherever operators land.
 - Close the gap between search, AdvisoryAI, and page workflows by wiring guided actions and telemetry-informed improvements.
 - Convert unanswered journeys into visible UX backlog items and stronger end-to-end coverage.
+- Pair deterministic mock coverage with explicit live-ingested verification for routes that already have local corpus parity.
 - Working directory: `src/Web/StellaOps.Web`.
- Expected evidence: page-contract rollouts, Playwright operator-journey suites, updated docs/task boards, and execution-log entries.
+- Expected evidence: page-contract rollouts, Playwright operator-journey suites, live-ingested verification where supported, updated docs/task boards, and execution-log entries.

 ## Dependencies & Concurrency
 - Depends on:
@@ -76,7 +77,20 @@ Task description:
 Completion criteria:
 - [ ] Playwright covers the priority page journeys end to end.
 - [ ] Tests verify grounded, clarify, and recovery paths.
- [ ] Suites remain deterministic with route mocks and local fixtures only.
+- [ ] Suites remain deterministic with route mocks/local fixtures for routes that do not yet have live corpus parity.
+
+### FE-ROLL-006 - Live ingested-corpus search verification
+Status: DONE
+Dependency: FE-ROLL-004
+Owners: Test Automation, QA, Developer (FE)
+Task description:
+- Run Playwright against at least one route backed by a real locally rebuilt AdvisoryAI corpus and verify that suggestions, answer framing, and follow-up handoffs work without route stubs for search data.
+- Track route parity explicitly so teams know which pages are mock-only and which already have live corpus validation.
+
+Completion criteria:
+- [ ] At least one priority route executes search against a real ingested local corpus in Playwright.
+- [ ] Docs call out live-verified routes versus mock-backed routes.
+- [ ] Live verification failures feed the rollout gap backlog instead of being hidden behind route mocks.

 ### FE-ROLL-005 - Docs and rollout readiness
 Status: TODO
@@ -94,14 +108,19 @@ Completion criteria:
 | Date (UTC) | Update | Owner |
 | --- | --- | --- |
 | 2026-03-07 | Sprint created to roll out the self-serve search contract after the answer-first shell and backend answer contract are defined. | Project Manager |
+| 2026-03-07 | Added explicit live-ingested verification scope so rollout evidence distinguishes mock-backed journeys from real corpus coverage. | Project Manager |
+| 2026-03-07 | Re-ran live Playwright verification for the Doctor route against a rebuilt local AdvisoryAI corpus (`unified-search-contextual-suggestions.live.e2e.spec.ts`) and confirmed automatic chips, grounded answer framing, and follow-up chips over real search data. | Test Automation |

 ## Decisions & Risks
 - Decision: rollout should prioritize high-frequency operator pages before broad route coverage.
 - Decision: page teams own their self-serve question sets; platform code owns composition and safety rails.
+- Decision: mock-backed Playwright remains acceptable for pages without live corpus parity, but rollout is incomplete until at least one priority route is verified against a rebuilt local corpus.
 - Risk: inconsistent page adoption would make self-serve feel random across the product.
 - Mitigation: maintain a priority rollout list and explicit page-ownership rules.
 - Risk: adding guided actions without verification can create shallow or broken handoffs.
 - Mitigation: require Playwright coverage for each high-value journey.
+- Risk: live-ingested routes can drift from mocked expectations as ingestion adapters evolve.
+- Mitigation: document route parity and keep at least one live route in the regular regression pack.

 ## Next Checkpoints
 - 2026-03-12: Priority page rollout started after answer contract freeze.