Verify live search suggestions against ingested corpus

2026-03-07 18:52:18 +02:00
parent 9d3bed1d0e
commit 820fb4ec25
4 changed files with 324 additions and 16 deletions
--- a/docs/implplan/SPRINT_20260307_021_FE_live_search_suggestion_reliability_matrix.md
+++ b/docs/implplan/SPRINT_20260307_021_FE_live_search_suggestion_reliability_matrix.md
@@ -20,7 +20,7 @@
 ## Delivery Tracker

 ### QA-ZL-001 - Add live corpus preflight and rebuild checks
-Status: TODO
+Status: DONE
 Dependency: none
 Owners: Test Automation
 Task description:
@@ -28,12 +28,12 @@ Task description:
 - Fail with explicit setup diagnostics when the corpus is empty or stale instead of producing misleading UI failures.

 Completion criteria:
-- [ ] The live suite checks rebuild/readiness before suggestion assertions.
-- [ ] Failure output distinguishes ingestion failure from UI failure.
-- [ ] Setup docs reference compiled CLI and HTTP rebuild fallbacks.
+- [x] The live suite checks rebuild/readiness before suggestion assertions.
+- [x] Failure output distinguishes ingestion failure from UI failure.
+- [x] Setup docs reference compiled CLI and HTTP rebuild fallbacks.

 ### QA-ZL-002 - Prove every surfaced suggestion succeeds
-Status: TODO
+Status: DONE
 Dependency: QA-ZL-001
 Owners: Test Automation
 Task description:
@@ -41,32 +41,37 @@ Task description:
 - Include pages that rely on current-scope weighting and overflow fallback.

 Completion criteria:
-- [ ] The live suite iterates through each surfaced suggestion on the covered pages.
-- [ ] Every rendered suggestion produces a visible non-dead-end state.
-- [ ] Previously failing suggestion paths are covered explicitly.
+- [x] The live suite iterates through each surfaced suggestion on the covered pages.
+- [x] Every rendered suggestion produces a visible non-dead-end state.
+- [x] Previously failing suggestion paths are covered explicitly.

 ### QA-ZL-003 - Verify search-to-chat consolidation
-Status: TODO
+Status: DONE
 Dependency: QA-ZL-002
 Owners: Test Automation
 Task description:
 - Verify the compact chat launcher and answer-panel handoff preserve query, page context, and evidence after the search redesign.

 Completion criteria:
-- [ ] Search is the tested primary entry in all covered flows.
-- [ ] AdvisoryAI opens as a secondary deep-dive from search with inherited context.
-- [ ] Execution log records the final full-pack commands and outcomes.
+- [x] Search is the tested primary entry in all covered flows.
+- [x] AdvisoryAI opens as a secondary deep-dive from search with inherited context.
+- [x] Execution log records the final full-pack commands and outcomes.

 ## Execution Log
 | Date (UTC) | Update | Owner |
 | --- | --- | --- |
 | 2026-03-07 | Sprint created for live corpus-backed suggestion reliability and zero-learning search verification. | Project Manager |
+| 2026-03-07 | Reproduced the user-facing failure against `http://127.1.0.44`: health was up but `POST /v1/advisory-ai/index/rebuild` returned `documentCount=0`, `chunkCount=0`, and `doctorProjectionCount=0`, so suggestion preflight now treats empty-corpus services as setup failures instead of UI regressions. | Test Automation |
+| 2026-03-07 | Prepared sources against the repo-controlled service, rebuilt both indexes, and verified live query `database connectivity` returned `contextAnswer.status=grounded` with knowledge cards and citations. | Test Automation |
+| 2026-03-07 | Ran `npx playwright test tests/e2e/unified-search-contextual-suggestions.live.e2e.spec.ts --config playwright.config.ts` against `http://127.0.0.1:10451`; result `5/5` passed covering chip viability, every surfaced suggestion, result-open follow-up chips, and Ask-AdvisoryAI handoff. | Test Automation |

 ## Decisions & Risks
 - Decision: live reliability gates are required because static mocks cannot prove suggestion viability against real corpora.
+- Decision: a healthy service with an empty corpus is an ingestion/setup failure, not a passing baseline; live E2E must fail before UI assertions in that case.
 - Risk: local environments may have partially ingested or empty corpora, especially in Doctor/knowledge projections.
 - Mitigation: add explicit corpus preflight and rebuild guidance so the suite fails with actionable diagnostics.
+- Mitigation: use a repo-controlled local service (`http://127.0.0.1:10451`) with `advisoryai sources prepare`, `POST /v1/advisory-ai/index/rebuild`, and `POST /v1/search/index/rebuild` before running the live suite.

 ## Next Checkpoints
-- 2026-03-09: Land live corpus preflight before broadening the suggestion matrix.
-- 2026-03-10: Run the final live suggestion pack and capture exact outcomes in the execution log.
+- 2026-03-09: Broaden live coverage beyond Doctor once findings/policy/VEX ingestion parity is available.
+- 2026-03-10: Fold the live reliability lane into the consolidated zero-learning search redesign phases.
--- a/docs/modules/advisory-ai/knowledge-search.md
+++ b/docs/modules/advisory-ai/knowledge-search.md
@@ -403,7 +403,8 @@ Current live verification coverage:
 - Rebuild order exercised against a running local service: `POST /v1/advisory-ai/index/rebuild` then `POST /v1/search/index/rebuild`
 - Verified live query: `database connectivity`
 - Verified live outcome: response includes `contextAnswer.status = grounded`, citations, and entity cards over ingested data
-- Verified live suggestion lane: the Doctor-page `database connectivity` chip remains a viable query after rebuild and is exercised by `src/Web/StellaOps.Web/tests/e2e/unified-search-contextual-suggestions.live.e2e.spec.ts`
+- Verified live suggestion lane: `src/Web/StellaOps.Web/tests/e2e/unified-search-contextual-suggestions.live.e2e.spec.ts` now preflights corpus readiness, validates suggestion viability, executes every surfaced Doctor suggestion, asserts grounded-or-clarify answer states, verifies follow-up chips after result open, and verifies Ask-AdvisoryAI inherits the live query context
+- Verified local corpus baseline on 2026-03-07 after `advisoryai sources prepare`: `documentCount = 470`, `chunkCount = 9050`, `apiOperationCount = 2190`, `doctorProjectionCount = 8`
 - Other routes still rely on deterministic mock-backed Playwright coverage until their ingestion parity is explicitly verified

 Or use the full CI testing stack:
--- a/docs/modules/ui/search-zero-learning-primary-entry.md
+++ b/docs/modules/ui/search-zero-learning-primary-entry.md
@@ -61,6 +61,7 @@
 - Knowledge/domain emptiness should be detectable so the UI can suppress invalid chips.
 - Empty-state contextual chips and page-owned common-question chips should preflight through the backend viability endpoint before they render.
 - Live Playwright coverage must assert that every surfaced suggestion returns visible results.
+- A service health check alone is not enough. On 2026-03-07, `http://127.1.0.44/health` returned `200` while the live knowledge rebuild returned `documentCount=0`; the product still surfaced dead chips. Corpus readiness is the gate, not process liveness.

 ## Phase map
 - Phase 1: FE primary-entry consolidation and removal of explicit search controls.
@@ -68,3 +69,4 @@
 - Phase 3: FE consumption of overflow results and executable suggestion contracts.
  - Implemented on 2026-03-07: backend `contextAnswer` is now preferred over frontend heuristics, overflow renders as a secondary result section, and suggestion viability preflight suppresses dead chips before they are shown.
 - Phase 4: Live Playwright reliability matrix with corpus preflight and chip-success guarantees.
+  - Implemented on 2026-03-07: the live suite now rebuilds the active corpus, fails fast on empty knowledge projections, iterates every surfaced Doctor suggestion, and verifies Ask-AdvisoryAI inherits the live search context.
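
The preflight rule recorded in this commit (a healthy service with an empty corpus is a setup failure, not a UI regression) can be sketched roughly as follows. This is a minimal illustration, not the code in the spec file: the `RebuildCounts` shape reuses the count names quoted in the docs, but the actual rebuild response schema is an assumption, and `evaluateCorpus` and `preflightCorpus` are hypothetical helper names.

```typescript
// Assumed shape: field names mirror the counts quoted in the docs above;
// the real rebuild response may carry more or different fields.
interface RebuildCounts {
  documentCount: number;
  chunkCount: number;
  doctorProjectionCount: number;
}

interface PreflightResult {
  ready: boolean;
  emptyFields: string[];
}

const REQUIRED_COUNTS: (keyof RebuildCounts)[] = [
  "documentCount",
  "chunkCount",
  "doctorProjectionCount",
];

// Pure check: the corpus is ready only when every required count is positive.
function evaluateCorpus(counts: RebuildCounts): PreflightResult {
  const emptyFields = REQUIRED_COUNTS.filter((key) => !(counts[key] > 0));
  return { ready: emptyFields.length === 0, emptyFields };
}

// Hypothetical wrapper: `rebuild` is injected by the caller (for example a
// function that POSTs to the rebuild endpoint and parses the JSON body).
// It throws a setup-flavoured error so the failure is not mistaken for a UI bug.
async function preflightCorpus(
  rebuild: () => Promise<RebuildCounts>,
): Promise<void> {
  const result = evaluateCorpus(await rebuild());
  if (!result.ready) {
    throw new Error(
      `Corpus setup failure (ingestion, not UI): empty ${result.emptyFields.join(", ")}`,
    );
  }
}
```

A live suite could call something like `preflightCorpus` once before any suggestion assertions, so an empty corpus stops the run with a setup diagnostic before chips are exercised.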