Add grounded unified search answers and live verification

This commit is contained in: master
2026-03-07 03:55:51 +02:00
parent 2ff0e1f86b
commit edb947d602
19 changed files with 1180 additions and 32 deletions

@@ -149,21 +149,25 @@ Completion criteria:
| 2026-03-06 | Added typed chip-context registry contract (`search-context.registry.ts`) and shifted suggestion selection to route-context arrays + bounded last-few-action prioritization + deterministic rotation. | Developer (FE) |
| 2026-03-06 | Synced architecture docs for automatic page-open suggestions and ambient `lastAction` contract: `docs/modules/ui/architecture.md`, `docs/modules/advisory-ai/knowledge-search.md`, `docs/modules/advisory-ai/unified-search-architecture.md`. | Documentation author |
| 2026-03-06 | Added UI governance rule for chip ownership and page-context interface in `docs/modules/ui/search-chip-context-contract.md`. | Documentation author |
| 2026-03-06 | Added exhaustive Playwright query-matrix suite (`tests/e2e/unified-search-exhaustive-matrix.e2e.spec.ts`) generating 1200 deterministic query types with end-to-end UI execution and >=99.5% success gate. | Test Automation |
| 2026-03-06 | Added exhaustive Playwright query-matrix suite (`tests/e2e/unified-search-exhaustive-matrix.e2e.spec.ts`) generating 1200 deterministic query types split into 6 strict 200-query batches; verified 100% success across all batches. | Test Automation |
| 2026-03-06 | Migrated default hostname from 127.1.0.1 to stella-ops.local across envsettings-override, proxy.conf, playwright config, perf fixtures, README, and smoke scripts. | Developer (FE) |
| 2026-03-06 | QA iteration: 23/23 sprint unit tests pass (ambient-context 6, global-search 11, chat-message 6). Live behavioral verification via Playwright confirms contextual placeholders and suggestion chips adapt per page (dashboard/triage/policy/scanning). OIDC login flow works end-to-end at stella-ops.local. | QA |
| 2026-03-07 | Added ingestion-backed contextual suggestion verification for the Doctor route: local rebuild order (`/v1/advisory-ai/index/rebuild` then `/v1/search/index/rebuild`) was exercised and `unified-search-contextual-suggestions.live.e2e.spec.ts` proved page-open chips and chip-triggered search over real ingested search data. | Test Automation |
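The batched query matrix described in the log entries above can be sketched as follows. This is a hypothetical TypeScript outline, not the code in `unified-search-exhaustive-matrix.e2e.spec.ts`. The domains, intent phrases, query wording, and the `ok` success flag are all assumptions. It only illustrates three things: generating 1200 deterministic queries, splitting them into 6 batches of 200, and applying a 100% per-batch gate.

```typescript
// Illustrative sketch only: domains, intents, and query wording are hypothetical.
interface QueryCase {
  id: number;
  domain: string;
  query: string;
}

interface QueryResult {
  id: number;
  ok: boolean; // assumed: true when the UI rendered results or an answer state
}

const DOMAINS = ["findings", "policy", "scanning", "doctor"];
const INTENTS = ["what is", "how do I fix", "explain", "show recent", "why does"];

// Deterministic: the same `total` always yields the same ordered, unique cases.
function buildMatrix(total: number): QueryCase[] {
  const cases: QueryCase[] = [];
  for (let id = 0; id < total; id++) {
    const domain = DOMAINS[id % DOMAINS.length];
    const intent = INTENTS[Math.floor(id / DOMAINS.length) % INTENTS.length];
    cases.push({ id, domain, query: `${intent} ${domain} #${id}` });
  }
  return cases;
}

// Split into fixed-size batches; page state would be reset between batches.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Strict gate: a batch passes only if every query in it succeeded.
function batchPasses(results: QueryResult[]): boolean {
  return results.length > 0 && results.every((r) => r.ok);
}

const batches = toBatches(buildMatrix(1200), 200);
console.log(`${batches.length} batches of ${batches[0].length} queries`);
```

Running the sketch prints `6 batches of 200 queries`. A strict per-batch gate localizes any failure to one 200-query slice, which is easier to triage than a single global success percentage.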
## Decisions & Risks
- Decision needed: whether route context should remain a hard domain filter in FE (`buildContextFilter`) or become a soft ranking hint only via ambient payload.
- Decision needed: final schema for `lastAction` ambient metadata and retention policy in FE memory/session scope.
- Decision: FE emits `ambient.lastAction` now as a forward-compatible field; current backend deployments may ignore it without regressing behavior.
- Decision: chip definitions are now governed by typed context arrays (`SEARCH_CONTEXT_DEFINITIONS`) and an explicit page-level interface contract (`SearchContextComponent`) instead of ad-hoc route conditionals.
- Decision: exhaustive >1000-query E2E coverage uses a deterministic matrix (1200 queries) with a reliability threshold (`>=99.5%`) to avoid false-red runs from occasional debounce drop events while still failing on meaningful regressions.
- Decision: exhaustive >1000-query E2E coverage now runs as 6 strict batches of 200 queries each, resetting page state between batches and enforcing 100% per-batch success.
- Decision: contextual suggestion verification now includes one live-ingested route (Doctor/knowledge) in addition to mock-backed regression suites.
- Docs updated: `docs/modules/ui/architecture.md`, `docs/modules/ui/search-chip-context-contract.md`, `docs/modules/advisory-ai/knowledge-search.md`, `docs/modules/advisory-ai/unified-search-architecture.md`.
- Risk: stale action context may bias suggestions toward irrelevant domains.
- Mitigation: TTL + bounded history + explicit reset on session boundaries.
- Risk: route-prefix drift between FE and backend route-domain maps can silently reduce context quality.
- Mitigation: shared route mapping tests and explicit parity checks.
- Risk: mocked suggestion suites can miss ingestion/corpus regressions.
- Mitigation: keep a live-ingested Playwright lane and record rebuild/query evidence in the search sprints.
- Risk: privacy leakage if raw action labels/queries are persisted beyond current controls.
- Mitigation: preserve hashed analytics and limit persisted raw content to existing approved history paths only.
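The chip-selection rules above (typed context arrays, bounded last-action history with a TTL, deterministic rotation) can be illustrated with a small TypeScript sketch. It is not the `search-context.registry.ts` implementation. The shapes are modeled on the description, and the route prefixes, chip text, 30-minute TTL, and 3-action bound are hypothetical values chosen for the example.

```typescript
// Hypothetical shapes modeled on the registry described above; the real
// SEARCH_CONTEXT_DEFINITIONS / SearchContextComponent contract may differ.
interface ChipDefinition {
  text: string;
  domain: string;
}

interface ContextDefinition {
  id: string;
  routePrefixes: string[];
  chips: ChipDefinition[];
}

interface LastAction {
  domain: string;
  atMs: number; // epoch milliseconds when the action happened
}

// Illustrative registry entry; routes and chip text are invented for the example.
const EXAMPLE_CONTEXTS: ContextDefinition[] = [
  {
    id: "doctor",
    routePrefixes: ["/doctor"],
    chips: [
      { text: "Why is database connectivity failing?", domain: "doctor" },
      { text: "Show failed health checks", domain: "doctor" },
      { text: "Explain the last scan error", domain: "scanning" },
      { text: "Which policies block releases?", domain: "policy" },
    ],
  },
];

// Longest matching prefix wins; "/doctor" matches "/doctor" and "/doctor/..." only.
function resolveContext(route: string, defs: ContextDefinition[]): ContextDefinition | undefined {
  let best: ContextDefinition | undefined;
  let bestLength = -1;
  for (const def of defs) {
    for (const prefix of def.routePrefixes) {
      const matches = route === prefix || route.startsWith(`${prefix}/`);
      if (matches && prefix.length > bestLength) {
        best = def;
        bestLength = prefix.length;
      }
    }
  }
  return best;
}

// Bounded, TTL-limited history: drop stale actions, keep only the newest few.
function recentDomains(actions: LastAction[], nowMs: number, ttlMs: number, maxActions: number): string[] {
  return actions
    .filter((a) => nowMs - a.atMs <= ttlMs)
    .sort((a, b) => b.atMs - a.atMs)
    .slice(0, maxActions)
    .map((a) => a.domain);
}

function selectChips(
  route: string,
  actions: LastAction[],
  nowMs: number,
  rotation: number, // non-negative counter, e.g. a page-open count
  limit: number,
): string[] {
  const context = resolveContext(route, EXAMPLE_CONTEXTS);
  if (!context) return [];
  const recent = recentDomains(actions, nowMs, 30 * 60 * 1000, 3); // hypothetical TTL and bound
  const boosted = context.chips.filter((c) => recent.includes(c.domain));
  const rest = context.chips.filter((c) => !recent.includes(c.domain));
  // Deterministic rotation: identical inputs always produce the same chip order.
  const offset = rest.length === 0 ? 0 : rotation % rest.length;
  const rotated = [...rest.slice(offset), ...rest.slice(0, offset)];
  return [...boosted, ...rotated].slice(0, limit).map((c) => c.text);
}
```

Recent-action domains are promoted first, and the remaining chips rotate by a counter rather than at random, so a given page-open state is reproducible in tests. Clearing the `actions` array at a session boundary gives the explicit reset named in the mitigation.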

@@ -74,11 +74,12 @@ Owners: Test Automation, QA, Developer (FE)
Task description:
- Add targeted unit coverage for answer-state selection, question generation, and fallback behavior.
- Add Playwright coverage for grounded-answer, clarification, and no-evidence journeys end to end.
- Keep this phase deterministic and frontend-owned; ingestion-backed live verification is handled explicitly in the backend and rollout sprints.
Completion criteria:
- [x] Unit coverage validates answer-state selection and contextual question generation.
- [x] Playwright covers grounded-answer, clarify, and fallback flows.
- [x] Tests remain deterministic with route mocks and no live network dependencies.
- [x] Tests remain deterministic with route mocks and no live network dependencies for the FE shell.
### FE-SELF-005 - Docs sync and rollout guidance
Status: DONE
@@ -104,6 +105,7 @@ Completion criteria:
## Decisions & Risks
- Decision: phase 1 is frontend-composed on top of existing unified search payloads so the product can ship a self-serve shell immediately.
- Decision: ingestion-backed verification is mandatory, but it is owned by the backend answer-orchestration and FE rollout sprints rather than this FE shell sprint.
- Decision: page ownership must live in the shared search context registry, not in ad hoc component conditionals.
- Decision: every non-empty search must render a visible answer state even when the answer is only a clarification request or insufficient-evidence message.
- Decision: self-serve questions are governed by `docs/modules/ui/search-self-serve-contract.md`, while contextual chips remain governed by `docs/modules/ui/search-chip-context-contract.md`.
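The rule that every non-empty search renders a visible answer state can be sketched on the client side as below. This is a hypothetical TypeScript outline. Only `contextAnswer` and the three status names come from these sprint notes. The `summary`, `citations`, and `followUps` fields, the card shape, and the single-word "too broad" heuristic are assumptions for illustration.

```typescript
// Hypothetical client-side types; only `contextAnswer` and the status names
// come from the sprint notes, the other fields are assumptions.
type AnswerStatus = "grounded" | "clarify" | "insufficient";

interface ContextAnswer {
  status: AnswerStatus;
  summary: string;
  citations: string[];
  followUps: string[];
}

interface SearchCard {
  title: string;
}

interface SearchResponse {
  cards: SearchCard[];
  contextAnswer?: ContextAnswer; // optional: older backends may not send it
}

// Returns null only for empty input; any non-empty query gets a visible state.
function frameAnswer(query: string, response: SearchResponse): ContextAnswer | null {
  const q = query.trim();
  if (q.length === 0) return null;
  if (response.contextAnswer) return response.contextAnswer; // backend answer wins when present

  if (response.cards.length === 0) {
    return { status: "insufficient", summary: `No indexed evidence matched "${q}".`, citations: [], followUps: [] };
  }
  // Hypothetical heuristic: single-word queries are treated as too broad to answer directly.
  if (q.split(/\s+/).length < 2) {
    return {
      status: "clarify",
      summary: `"${q}" matches several areas. Which one do you mean?`,
      citations: [],
      followUps: response.cards.slice(0, 3).map((c) => `Tell me about ${c.title}`),
    };
  }
  const top = response.cards.slice(0, 3).map((c) => c.title);
  return { status: "grounded", summary: `Best match: ${top[0]}`, citations: top, followUps: [] };
}
```

Treating `contextAnswer` as optional lets the same shell work before and after the backend answer contract lands. When the field is present it takes precedence, and the frontend-composed path remains a fallback.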

@@ -4,8 +4,9 @@
- Move self-serve search from frontend-composed answers to backend-grounded contextual answers with explicit citations, fallback reasons, and follow-up questions.
- Establish a unified answer contract that search and AdvisoryAI can both consume.
- Add telemetry for unanswered and reformulated journeys so self-serve gaps become measurable backlog items instead of anecdotal feedback.
- Prove the answer contract against a locally ingested corpus, not only stubbed endpoint tests.
- Working directory: `src/AdvisoryAI`.
- Expected evidence: targeted integration tests against the AdvisoryAI test project, updated API/docs, and execution-log entries with command evidence.
- Expected evidence: targeted integration tests against the AdvisoryAI test project, ingestion-backed local rebuild/query evidence, updated API/docs, and execution-log entries with command evidence.
## Dependencies & Concurrency
- Depends on `docs/implplan/SPRINT_20260307_004_FE_self_serve_search_answer_first.md` for the FE shell and contract expectations.
@@ -22,7 +23,7 @@
## Delivery Tracker
### AI-SELF-001 - Unified contextual answer payload
Status: TODO
Status: DONE
Dependency: none
Owners: Developer (AdvisoryAI)
Task description:
@@ -35,7 +36,7 @@ Completion criteria:
- [ ] API docs describe the new answer payload.
### AI-SELF-002 - Grounding and fallback policy
Status: TODO
Status: DONE
Dependency: AI-SELF-001
Owners: Developer (AdvisoryAI), Product Manager
Task description:
@@ -51,7 +52,7 @@ Completion criteria:
- [ ] Synthesis cannot silently masquerade as a grounded answer without sufficient evidence.
### AI-SELF-003 - Follow-up question and clarification generation
Status: TODO
Status: DONE
Dependency: AI-SELF-001
Owners: Developer (AdvisoryAI)
Task description:
@@ -77,7 +78,7 @@ Completion criteria:
- [ ] Tests cover telemetry emission for fallback paths.
### AI-SELF-005 - Targeted behavioral verification
Status: TODO
Status: DONE
Dependency: AI-SELF-003
Owners: Test Automation, QA
Task description:
@@ -88,20 +89,41 @@ Completion criteria:
- [ ] Assertions verify actual answer-state payload content, not only success status codes.
- [ ] Execution log records exact commands and outcomes.
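For the criterion that assertions must check answer-state payload content rather than only status codes, a payload-level helper might look like the TypeScript sketch below. It is hypothetical. It assumes citations live under `contextAnswer.citations` and cards under a top-level `cards` array; only `contextAnswer.status` is named in these notes, so confirm the other field locations against the API docs.

```typescript
// Sketch of a payload-level assertion helper; field locations
// (`contextAnswer.citations`, top-level `cards`) are assumptions.
interface Expectation {
  status: "grounded" | "clarify" | "insufficient";
  minCitations: number;
  minCards: number;
}

type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function countArray(value: unknown): number {
  return Array.isArray(value) ? value.length : 0;
}

// Returns human-readable problems; an empty list means the payload passed.
function checkAnswerPayload(httpStatus: number, body: unknown, expected: Expectation): string[] {
  const problems: string[] = [];
  if (httpStatus !== 200) problems.push(`unexpected HTTP ${httpStatus}`);
  if (!isObject(body)) return [...problems, "response body is not a JSON object"];

  const answer = body["contextAnswer"];
  if (!isObject(answer)) return [...problems, "contextAnswer is missing"];

  if (answer["status"] !== expected.status) {
    problems.push(`contextAnswer.status is ${String(answer["status"])}, expected ${expected.status}`);
  }
  const citations = countArray(answer["citations"]);
  if (citations < expected.minCitations) problems.push(`only ${citations} citations`);
  const cards = countArray(body["cards"]);
  if (cards < expected.minCards) problems.push(`only ${cards} cards`);
  return problems;
}
```

Returning a list of problems instead of throwing on the first mismatch lets one test run report every drifted field, which makes the execution-log entry more useful.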
### AI-SELF-006 - Ingestion-backed live corpus verification
Status: DONE
Dependency: AI-SELF-001
Owners: Developer (AdvisoryAI), Test Automation
Task description:
- Rebuild AdvisoryAI and unified-search indexes from the local corpus and verify that contextual answers are returned over real ingested data, not only test doubles.
- Document the exact rebuild order, required local setup, and the query paths currently covered by live verification.
Completion criteria:
- [ ] Local rebuild order is explicit and exercised: `/v1/advisory-ai/index/rebuild` then `/v1/search/index/rebuild`.
- [ ] At least one ingestion-backed query path returns a contextual answer payload from the running local service.
- [ ] Docs and sprint log state which live routes are verified today and which routes still rely on mocks.
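The rebuild order named in the completion criteria can be sketched as a small TypeScript driver. The `PostFn` transport is a hypothetical abstraction so the sequence can run against a fake. The `{ q: query }` request body for `/v1/search/query` is also an assumption. Only the three endpoint paths and their order come from this sprint.

```typescript
// Minimal sketch of the documented rebuild-then-query order. `PostFn` is a
// hypothetical transport so the sequence can run against a fake in tests.
type PostFn = (path: string, body?: unknown) => Promise<{ status: number; body: unknown }>;

// Order matters: AdvisoryAI index first, then the unified search index.
const REBUILD_ORDER: readonly string[] = ["/v1/advisory-ai/index/rebuild", "/v1/search/index/rebuild"];

async function rebuildAndQuery(post: PostFn, query: string): Promise<unknown> {
  for (const path of REBUILD_ORDER) {
    const res = await post(path);
    if (res.status < 200 || res.status >= 300) {
      // Stop early so the search index is never rebuilt on top of a failed AdvisoryAI rebuild.
      throw new Error(`${path} failed with HTTP ${res.status}`);
    }
  }
  // Request body shape is an assumption; check the API docs for the real field names.
  const res = await post("/v1/search/query", { q: query });
  if (res.status !== 200) throw new Error(`/v1/search/query failed with HTTP ${res.status}`);
  return res.body;
}
```

In Node 18 or later, `post` could wrap the global `fetch` with a base URL such as the `http://127.0.0.1:10451` used in the execution log. Keeping the transport injectable lets the ordering itself be unit-tested without a running service.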
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-07 | Sprint created to formalize backend-grounded contextual answers after the FE answer-first shell. | Project Manager |
| 2026-03-07 | Added explicit ingestion-backed verification scope so backend answer orchestration must be validated against a rebuilt local corpus instead of only stubbed endpoint tests. | Project Manager |
| 2026-03-07 | Implemented `contextAnswer` in unified search/backend API mapping, added deterministic `grounded` / `clarify` / `insufficient` rules plus follow-up question generation, and extended telemetry fields for answer-state visibility. | Developer |
| 2026-03-07 | Verified the AdvisoryAI test project after the contract change with `dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" --no-restore -v normal` (`877/877` passing). | Test Automation |
| 2026-03-07 | Exercised the live rebuilt-corpus lane against `http://127.0.0.1:10451`: `POST /v1/advisory-ai/index/rebuild`, `POST /v1/search/index/rebuild`, then `POST /v1/search/query` for `database connectivity`, which returned `contextAnswer.status = grounded`, 3 citations, and 10 cards over ingested data. | Test Automation |
## Decisions & Risks
- Decision: the backend contract must return explicit answer states instead of leaving the UI to infer confidence from cards alone.
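- Decision: the product requirement is 100% response framing (every search gets a grounded answer, a clarification request, or an insufficient-evidence message), not 100% AI-generated answers that risk hallucination.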
- Decision: answer orchestration is not considered verified until it passes both targeted `.csproj` integration tests and at least one live query over a rebuilt local corpus.
- Risk: answer payload inflation could make the endpoint harder to evolve if fields are not clearly optional.
- Mitigation: use additive optional fields with strict docs and integration coverage.
- Risk: telemetry may leak raw user intent if not hashed or summarized carefully.
- Mitigation: follow existing hashed-query analytics patterns and avoid raw prompt persistence where not required for history UX.
- Risk: mocked endpoint tests can overstate confidence if ingestion adapters or corpus rebuild order drift.
- Mitigation: keep rebuild order documented, execute it during verification, and record which routes have live-ingested parity.
- Decision: `stella advisoryai sources prepare` is optional for local verification when checked-in Doctor seed/control files are already sufficient, but it requires `STELLAOPS_BACKEND_URL` whenever live Doctor discovery is expected.
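The requirement that synthesis cannot masquerade as a grounded answer can be illustrated with a deterministic guard. This TypeScript sketch is not the AdvisoryAI implementation, which lives in the .NET project. The evidence-score threshold, the minimum supporting-source count, and the `candidateIntents` ambiguity signal are hypothetical choices for the example.

```typescript
// Illustrative grounding guard; thresholds and field names are hypothetical.
type AnswerStatus = "grounded" | "clarify" | "insufficient";

interface Evidence {
  sourceId: string;
  score: number; // retrieval relevance in [0, 1] (assumed scale)
}

interface Candidate {
  synthesizedText: string | null;
  evidence: Evidence[];
  candidateIntents: string[]; // distinct interpretations detected for the query
}

interface Decision {
  status: AnswerStatus;
  text: string | null;
  citations: string[];
}

const MIN_EVIDENCE_SCORE = 0.5;
const MIN_SUPPORTING_SOURCES = 2;

function decideAnswer(c: Candidate): Decision {
  // Stable ordering: score descending, then sourceId, so equal inputs yield equal citations.
  const supporting = c.evidence
    .filter((e) => e.score >= MIN_EVIDENCE_SCORE)
    .sort((a, b) => b.score - a.score || (a.sourceId < b.sourceId ? -1 : a.sourceId > b.sourceId ? 1 : 0));

  if (c.synthesizedText !== null && supporting.length >= MIN_SUPPORTING_SOURCES) {
    return { status: "grounded", text: c.synthesizedText, citations: supporting.map((e) => e.sourceId) };
  }
  // Synthesized text without enough evidence is dropped rather than shown as an answer.
  if (c.candidateIntents.length > 1) {
    return { status: "clarify", text: null, citations: [] };
  }
  return { status: "insufficient", text: null, citations: [] };
}
```

Because the grounded branch requires both text and enough supporting evidence, a fluent but weakly supported synthesis falls through to `clarify` or `insufficient` instead of being presented with citations it does not have.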
## Next Checkpoints
- 2026-03-10: Freeze answer payload shape and fallback taxonomy.
- 2026-03-12: Complete endpoint integration tests for answer states.
- 2026-03-13: Hand off payload contract to FE rollout sprint.
- 2026-03-13: Hand off payload contract and live-ingested verification notes to FE rollout sprint.

@@ -4,8 +4,9 @@
- Roll the self-serve search contract across priority pages so the experience is consistent wherever operators land.
- Close the gap between search, AdvisoryAI, and page workflows by wiring guided actions and telemetry-informed improvements.
- Convert unanswered journeys into visible UX backlog items and stronger end-to-end coverage.
- Pair deterministic mock coverage with explicit live-ingested verification for routes that already have local corpus parity.
- Working directory: `src/Web/StellaOps.Web`.
- Expected evidence: page-contract rollouts, Playwright operator-journey suites, updated docs/task boards, and execution-log entries.
- Expected evidence: page-contract rollouts, Playwright operator-journey suites, live-ingested verification where supported, updated docs/task boards, and execution-log entries.
## Dependencies & Concurrency
- Depends on:
@@ -76,7 +77,20 @@ Task description:
Completion criteria:
- [ ] Playwright covers the priority page journeys end to end.
- [ ] Tests verify grounded, clarify, and recovery paths.
- [ ] Suites remain deterministic with route mocks and local fixtures only.
- [ ] Suites remain deterministic with route mocks/local fixtures for routes that do not yet have live corpus parity.
### FE-ROLL-006 - Live ingested-corpus search verification
Status: DONE
Dependency: FE-ROLL-004
Owners: Test Automation, QA, Developer (FE)
Task description:
- Run Playwright against at least one route backed by a real locally rebuilt AdvisoryAI corpus and verify that suggestions, answer framing, and follow-up handoffs work without route stubs for search data.
- Track route parity explicitly so teams know which pages are mock-only and which already have live corpus validation.
Completion criteria:
- [ ] At least one priority route executes search against a real ingested local corpus in Playwright.
- [ ] Docs call out live-verified routes versus mock-backed routes.
- [ ] Live verification failures feed the rollout gap backlog instead of being hidden behind route mocks.
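Explicit route-parity tracking can be kept as typed data so the live-verification gate and the gap backlog are computed rather than maintained by hand. In this hypothetical TypeScript sketch, the route paths, priority flags, and the three verification modes are illustrative. Marking the Doctor route as live-verified mirrors the execution log, but its path and priority status are assumptions.

```typescript
// Hypothetical parity table; route paths and priority flags are illustrative.
type VerificationMode = "live-verified" | "live-failing" | "mock-only";

interface RouteParity {
  route: string;
  priority: boolean;
  mode: VerificationMode;
}

interface ParitySummary {
  liveGateMet: boolean; // at least one priority route passes against a rebuilt corpus
  gapBacklog: string[]; // priority routes still mock-only or failing live verification
}

function summarizeParity(routes: RouteParity[]): ParitySummary {
  const priority = routes.filter((r) => r.priority);
  return {
    liveGateMet: priority.some((r) => r.mode === "live-verified"),
    gapBacklog: priority
      .filter((r) => r.mode !== "live-verified")
      .map((r) => `${r.route}: ${r.mode}`)
      .sort(),
  };
}

const example: RouteParity[] = [
  { route: "/doctor", priority: true, mode: "live-verified" },
  { route: "/triage", priority: true, mode: "mock-only" },
  { route: "/policy", priority: true, mode: "live-failing" },
  { route: "/settings", priority: false, mode: "mock-only" },
];
console.log(summarizeParity(example));
```

Listing `live-failing` routes in the same backlog as `mock-only` routes keeps live failures visible instead of letting a route quietly fall back to mocks.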
### FE-ROLL-005 - Docs and rollout readiness
Status: TODO
@@ -94,14 +108,19 @@ Completion criteria:
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-07 | Sprint created to roll out the self-serve search contract after the answer-first shell and backend answer contract are defined. | Project Manager |
| 2026-03-07 | Added explicit live-ingested verification scope so rollout evidence distinguishes mock-backed journeys from real corpus coverage. | Project Manager |
| 2026-03-07 | Re-ran live Playwright verification for the Doctor route against a rebuilt local AdvisoryAI corpus (`unified-search-contextual-suggestions.live.e2e.spec.ts`) and confirmed automatic chips, grounded answer framing, and follow-up chips over real search data. | Test Automation |
## Decisions & Risks
- Decision: rollout should prioritize high-frequency operator pages before broad route coverage.
- Decision: page teams own their self-serve question sets; platform code owns composition and safety rails.
- Decision: mock-backed Playwright remains acceptable for pages without live corpus parity, but rollout is incomplete until at least one priority route is verified against a rebuilt local corpus.
- Risk: inconsistent page adoption would make self-serve feel random across the product.
- Mitigation: maintain a priority rollout list and explicit page-ownership rules.
- Risk: adding guided actions without verification can create shallow or broken handoffs.
- Mitigation: require Playwright coverage for each high-value journey.
- Risk: live-ingested routes can drift from mocked expectations as ingestion adapters evolve.
- Mitigation: document route parity and keep at least one live route in the regular regression pack.
## Next Checkpoints
- 2026-03-12: Priority page rollout started after answer contract freeze.