Add grounded unified search answers and live verification
@@ -149,21 +149,25 @@ Completion criteria:
| 2026-03-06 | Added typed chip-context registry contract (`search-context.registry.ts`) and shifted suggestion selection to route-context arrays + bounded last-few-action prioritization + deterministic rotation. | Developer (FE) |
| 2026-03-06 | Synced architecture docs for automatic page-open suggestions and ambient `lastAction` contract: `docs/modules/ui/architecture.md`, `docs/modules/advisory-ai/knowledge-search.md`, `docs/modules/advisory-ai/unified-search-architecture.md`. | Documentation author |
| 2026-03-06 | Added UI governance rule for chip ownership and page-context interface in `docs/modules/ui/search-chip-context-contract.md`. | Documentation author |
| 2026-03-06 | Added exhaustive Playwright query-matrix suite (`tests/e2e/unified-search-exhaustive-matrix.e2e.spec.ts`) generating 1200 deterministic query types with end-to-end UI execution and >=99.5% success gate. | Test Automation |
| 2026-03-06 | Added exhaustive Playwright query-matrix suite (`tests/e2e/unified-search-exhaustive-matrix.e2e.spec.ts`) generating 1200 deterministic query types split into 6 strict 200-query batches; verified 100% success across all batches. | Test Automation |
| 2026-03-06 | Migrated default hostname from 127.1.0.1 to stella-ops.local across envsettings-override, proxy.conf, playwright config, perf fixtures, README, and smoke scripts. | Developer (FE) |
| 2026-03-06 | QA iteration: 23/23 sprint unit tests pass (ambient-context 6, global-search 11, chat-message 6). Live behavioral verification via Playwright confirms contextual placeholders and suggestion chips adapt per page (dashboard/triage/policy/scanning). OIDC login flow works end-to-end at stella-ops.local. | QA |
| 2026-03-07 | Added ingestion-backed contextual suggestion verification for the Doctor route: local rebuild order (`/v1/advisory-ai/index/rebuild` then `/v1/search/index/rebuild`) was exercised and `unified-search-contextual-suggestions.live.e2e.spec.ts` proved page-open chips and chip-triggered search over real ingested search data. | Test Automation |
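The six-batch layout recorded in the log above (1200 deterministic queries, 6 strict batches of 200, 100% success per batch) can be sketched roughly as follows. This is a minimal illustration and not the suite's actual code: `splitIntoBatches`, `batchPassed`, and the `query-N` strings are hypothetical placeholders.

```typescript
// Hypothetical helpers; names and shapes are assumptions, not the suite's real API.
type QueryResult = { query: string; ok: boolean };

// Split a deterministic query list into fixed-size batches (1200 -> 6 x 200).
function splitIntoBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Strict gate: a batch passes only when it is non-empty and every query succeeded.
function batchPassed(results: readonly QueryResult[]): boolean {
  return results.length > 0 && results.every((r) => r.ok);
}

// Placeholder matrix: index-derived, so the order is identical on every run.
const queries = Array.from({ length: 1200 }, (_, i) => `query-${i}`);
const batches = splitIntoBatches(queries, 200);
```

A strict per-batch gate of this kind fails on a single dropped query, which is the behavioral difference from the earlier `>=99.5%` threshold.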
## Decisions & Risks
- Decision needed: whether route context should remain a hard domain filter in FE (`buildContextFilter`) or become a soft ranking hint only via ambient payload.
- Decision needed: final schema for `lastAction` ambient metadata and retention policy in FE memory/session scope.
- Decision: FE emits `ambient.lastAction` now as a forward-compatible field; current backend deployments may ignore it without regressing behavior.
- Decision: chip definitions are now governed by typed context arrays (`SEARCH_CONTEXT_DEFINITIONS`) and an explicit page-level interface contract (`SearchContextComponent`) instead of ad-hoc route conditionals.
- Decision: exhaustive >1000 query E2E coverage uses a deterministic matrix (1200 queries) with a reliability threshold (`>=99.5%`) to avoid false-red from occasional debounce drop events while still failing on meaningful regressions.
- Decision: exhaustive >1000 query E2E coverage now runs as 6 strict batches of 200 queries each, resetting page state between batches and enforcing 100% per-batch success.
- Decision: contextual suggestion verification now includes one live-ingested route (Doctor/knowledge) in addition to mock-backed regression suites.
- Docs updated: `docs/modules/ui/architecture.md`, `docs/modules/ui/search-chip-context-contract.md`, `docs/modules/advisory-ai/knowledge-search.md`, `docs/modules/advisory-ai/unified-search-architecture.md`.
- Risk: stale action context may bias suggestions toward irrelevant domains.
- Mitigation: TTL + bounded history + explicit reset on session boundaries.
- Risk: route-prefix drift between FE and backend route-domain maps can silently reduce context quality.
- Mitigation: shared route mapping tests and explicit parity checks.
- Risk: mocked suggestion suites can miss ingestion/corpus regressions.
- Mitigation: keep a live-ingested Playwright lane and record rebuild/query evidence in the search sprints.
- Risk: privacy leakage if raw action labels/queries are persisted beyond current controls.
- Mitigation: preserve hashed analytics and limit persisted raw content to existing approved history paths only.
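The stale-action mitigation listed above (TTL + bounded history + explicit reset) could look something like the sketch below. The `lastAction` schema is still an open decision in this sprint, so `ActionEntry`, `BoundedActionHistory`, and the field names are assumptions made purely for illustration.

```typescript
// Hypothetical shape; the final `lastAction` schema is still undecided.
interface ActionEntry {
  domain: string; // placeholder domain label
  recordedAt: number; // epoch milliseconds
}

class BoundedActionHistory {
  private entries: ActionEntry[] = [];
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  // Record an action, keeping only the newest `maxEntries` items.
  record(domain: string, now: number): void {
    this.entries.push({ domain, recordedAt: now });
    if (this.entries.length > this.maxEntries) {
      this.entries = this.entries.slice(-this.maxEntries);
    }
  }

  // Return entries that are still within the TTL, newest first.
  recent(now: number): ActionEntry[] {
    return this.entries
      .filter((e) => now - e.recordedAt <= this.ttlMs)
      .reverse();
  }

  // Explicit reset, intended to be called on session boundaries.
  reset(): void {
    this.entries = [];
  }
}
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the sketch deterministic under test.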
@@ -74,11 +74,12 @@ Owners: Test Automation, QA, Developer (FE)
Task description:
- Add targeted unit coverage for answer-state selection, question generation, and fallback behavior.
- Add Playwright coverage for grounded-answer, clarification, and no-evidence journeys end to end.
- Keep this phase deterministic and frontend-owned; ingestion-backed live verification is handled explicitly in backend and rollout sprints.

Completion criteria:
- [x] Unit coverage validates answer-state selection and contextual question generation.
- [x] Playwright covers grounded-answer, clarify, and fallback flows.
- [x] Tests remain deterministic with route mocks and no live network dependencies.
- [x] Tests remain deterministic with route mocks and no live network dependencies for the FE shell.

### FE-SELF-005 - Docs sync and rollout guidance
Status: DONE
@@ -104,6 +105,7 @@ Completion criteria:

## Decisions & Risks
- Decision: phase 1 is frontend-composed on top of existing unified search payloads so the product can ship a self-serve shell immediately.
- Decision: ingestion-backed verification is mandatory, but it is owned by the backend answer-orchestration and FE rollout sprints rather than this FE shell sprint.
- Decision: page ownership must live in the shared search context registry, not in ad hoc component conditionals.
- Decision: every non-empty search must render a visible answer state even when the answer is only a clarification request or insufficient-evidence message.
- Decision: self-serve questions are governed by `docs/modules/ui/search-self-serve-contract.md`, while contextual chips remain governed by `docs/modules/ui/search-chip-context-contract.md`.
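The "every non-empty search renders a visible answer state" decision above can be illustrated with a small selection function. This is only a sketch under stated assumptions: `AnswerView`, `SearchCard`, `selectAnswerView`, and the one-word ambiguity heuristic are all hypothetical and are not taken from the FE shell.

```typescript
// Hypothetical view states; the real shell may use different names and fields.
type AnswerView =
  | { kind: "none" }
  | { kind: "grounded"; summary: string }
  | { kind: "clarify"; question: string }
  | { kind: "insufficient"; message: string };

interface SearchCard {
  title: string;
  snippet: string;
}

function selectAnswerView(query: string, cards: readonly SearchCard[]): AnswerView {
  const trimmed = query.trim();
  // Only an empty query is allowed to render nothing.
  if (trimmed.length === 0) {
    return { kind: "none" };
  }
  const top = cards[0];
  if (top === undefined) {
    return { kind: "insufficient", message: `No evidence found for "${trimmed}".` };
  }
  // Assumption for illustration: a single-word query is treated as ambiguous.
  if (trimmed.split(/\s+/).length < 2) {
    return { kind: "clarify", question: `What do you want to know about "${trimmed}"?` };
  }
  return { kind: "grounded", summary: top.snippet };
}
```

Because the return type is a discriminated union, a renderer that switches on `kind` can be checked exhaustively by the compiler, which helps keep any state from being silently dropped.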
@@ -4,8 +4,9 @@
- Move self-serve search from frontend-composed answers to backend-grounded contextual answers with explicit citations, fallback reasons, and follow-up questions.
- Establish a unified answer contract that search and AdvisoryAI can both consume.
- Add telemetry for unanswered and reformulated journeys so self-serve gaps become measurable backlog items instead of anecdotal feedback.
- Prove the answer contract against a locally ingested corpus, not only stubbed endpoint tests.
- Working directory: `src/AdvisoryAI`.
- Expected evidence: targeted integration tests against the AdvisoryAI test project, updated API/docs, and execution-log entries with command evidence.
- Expected evidence: targeted integration tests against the AdvisoryAI test project, ingestion-backed local rebuild/query evidence, updated API/docs, and execution-log entries with command evidence.

## Dependencies & Concurrency
- Depends on `docs/implplan/SPRINT_20260307_004_FE_self_serve_search_answer_first.md` for the FE shell and contract expectations.
@@ -22,7 +23,7 @@
## Delivery Tracker

### AI-SELF-001 - Unified contextual answer payload
Status: TODO
Status: DONE
Dependency: none
Owners: Developer (AdvisoryAI)
Task description:
@@ -35,7 +36,7 @@ Completion criteria:
- [ ] API docs describe the new answer payload.

### AI-SELF-002 - Grounding and fallback policy
Status: TODO
Status: DONE
Dependency: AI-SELF-001
Owners: Developer (AdvisoryAI), Product Manager
Task description:
@@ -51,7 +52,7 @@ Completion criteria:
- [ ] Synthesis cannot silently masquerade as a grounded answer without sufficient evidence.

### AI-SELF-003 - Follow-up question and clarification generation
Status: TODO
Status: DONE
Dependency: AI-SELF-001
Owners: Developer (AdvisoryAI)
Task description:
@@ -77,7 +78,7 @@ Completion criteria:
- [ ] Tests cover telemetry emission for fallback paths.

### AI-SELF-005 - Targeted behavioral verification
Status: TODO
Status: DONE
Dependency: AI-SELF-003
Owners: Test Automation, QA
Task description:
@@ -88,20 +89,41 @@ Completion criteria:
- [ ] Assertions verify actual answer-state payload content, not only success status codes.
- [ ] Execution log records exact commands and outcomes.

### AI-SELF-006 - Ingestion-backed live corpus verification
Status: DONE
Dependency: AI-SELF-001
Owners: Developer (AdvisoryAI), Test Automation
Task description:
- Rebuild AdvisoryAI and unified-search indexes from the local corpus and verify that contextual answers are returned over real ingested data, not only test doubles.
- Document the exact rebuild order, required local setup, and the query paths currently covered by live verification.

Completion criteria:
- [ ] Local rebuild order is explicit and exercised: `/v1/advisory-ai/index/rebuild` then `/v1/search/index/rebuild`.
- [ ] At least one ingestion-backed query path returns a contextual answer payload from the running local service.
- [ ] Docs and sprint log state which live routes are verified today and which routes still rely on mocks.

## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-07 | Sprint created to formalize backend-grounded contextual answers after the FE answer-first shell. | Project Manager |
| 2026-03-07 | Added explicit ingestion-backed verification scope so backend answer orchestration must be validated against a rebuilt local corpus instead of only stubbed endpoint tests. | Project Manager |
| 2026-03-07 | Implemented `contextAnswer` in unified search/backend API mapping, added deterministic `grounded` / `clarify` / `insufficient` rules plus follow-up question generation, and extended telemetry fields for answer-state visibility. | Developer |
| 2026-03-07 | Verified the AdvisoryAI test project after the contract change with `dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" --no-restore -v normal` (`877/877` passing). | Test Automation |
| 2026-03-07 | Exercised the live rebuilt-corpus lane against `http://127.0.0.1:10451`: `POST /v1/advisory-ai/index/rebuild`, `POST /v1/search/index/rebuild`, then `POST /v1/search/query` for `database connectivity`, which returned `contextAnswer.status = grounded`, 3 citations, and 10 cards over ingested data. | Test Automation |
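The rebuild-then-query sequence recorded in the last log entry can be expressed as an ordered plan. The three paths, their order, the base URL, and the query text come from the log; everything else in this sketch is an assumption, in particular the `{ q: ... }` request body and the helper names `buildVerificationPlan` and `runPlan`.

```typescript
// Paths and order come from the sprint log; the request body shape is an assumption.
interface PlannedRequest {
  method: "POST";
  url: string;
  body?: unknown;
}

function buildVerificationPlan(baseUrl: string, query: string): PlannedRequest[] {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return [
    { method: "POST", url: `${root}/v1/advisory-ai/index/rebuild` },
    { method: "POST", url: `${root}/v1/search/index/rebuild` },
    // Hypothetical body: the real query contract may use different field names.
    { method: "POST", url: `${root}/v1/search/query`, body: { q: query } },
  ];
}

// Run the plan strictly in order; `send` is injected so any HTTP client can be used.
async function runPlan(
  plan: readonly PlannedRequest[],
  send: (request: PlannedRequest) => Promise<unknown>,
): Promise<unknown[]> {
  const responses: unknown[] = [];
  for (const request of plan) {
    // Sequential on purpose: the search index rebuild must follow the AdvisoryAI rebuild.
    responses.push(await send(request));
  }
  return responses;
}

const plan = buildVerificationPlan("http://127.0.0.1:10451/", "database connectivity");
```

Keeping the plan as plain data makes the rebuild order easy to assert in a unit test without a running service.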

## Decisions & Risks
- Decision: the backend contract must return explicit answer states instead of leaving the UI to infer confidence from cards alone.
- Decision: the product requirement is 100% response framing, not 100% hallucinated AI answers.
- Decision: answer orchestration is not considered verified until it passes both targeted `.csproj` integration tests and at least one live query over a rebuilt local corpus.
- Risk: answer payload inflation could make the endpoint harder to evolve if fields are not clearly optional.
- Mitigation: use additive optional fields with strict docs and integration coverage.
- Risk: telemetry may leak raw user intent if not hashed or summarized carefully.
- Mitigation: follow existing hashed-query analytics patterns and avoid raw prompt persistence where not required for history UX.
- Risk: mocked endpoint tests can overstate confidence if ingestion adapters or corpus rebuild order drift.
- Mitigation: keep rebuild order documented, execute it during verification, and record which routes have live-ingested parity.
- Decision: `stella advisoryai sources prepare` is optional for local verification when checked-in Doctor seed/control files are already sufficient, but it requires `STELLAOPS_BACKEND_URL` whenever live Doctor discovery is expected.
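One way to check that an explicit answer state is backed by matching payload content, rather than by a success status code alone, is a small consistency validator. The three status values come from the execution log; the optional `citations`, `questions`, and `reason` fields and the `validateContextAnswer` helper are hypothetical and may not match the real `contextAnswer` contract.

```typescript
type AnswerStatus = "grounded" | "clarify" | "insufficient";

// Hypothetical payload shape with additive optional fields.
interface ContextAnswer {
  status: AnswerStatus;
  citations?: string[];
  questions?: string[];
  reason?: string;
}

// Return a list of contract problems; an empty list means the payload is consistent.
function validateContextAnswer(answer: ContextAnswer): string[] {
  const problems: string[] = [];
  if (answer.status === "grounded" && (answer.citations ?? []).length === 0) {
    problems.push("grounded answer has no citations");
  }
  if (answer.status === "clarify" && (answer.questions ?? []).length === 0) {
    problems.push("clarify answer has no follow-up question");
  }
  if (answer.status === "insufficient" && !answer.reason) {
    problems.push("insufficient answer has no fallback reason");
  }
  return problems;
}
```

A test can call such a validator on a decoded response and fail when the returned list is non-empty, which would catch a `grounded` status that arrives without any supporting citation.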

## Next Checkpoints
- 2026-03-10: Freeze answer payload shape and fallback taxonomy.
- 2026-03-12: Complete endpoint integration tests for answer states.
- 2026-03-13: Hand off payload contract to FE rollout sprint.
- 2026-03-13: Hand off payload contract and live-ingested verification notes to FE rollout sprint.
@@ -4,8 +4,9 @@
- Roll the self-serve search contract across priority pages so the experience is consistent wherever operators land.
- Close the gap between search, AdvisoryAI, and page workflows by wiring guided actions and telemetry-informed improvements.
- Convert unanswered journeys into visible UX backlog items and stronger end-to-end coverage.
- Pair deterministic mock coverage with explicit live-ingested verification for routes that already have local corpus parity.
- Working directory: `src/Web/StellaOps.Web`.
- Expected evidence: page-contract rollouts, Playwright operator-journey suites, updated docs/task boards, and execution-log entries.
- Expected evidence: page-contract rollouts, Playwright operator-journey suites, live-ingested verification where supported, updated docs/task boards, and execution-log entries.

## Dependencies & Concurrency
- Depends on:
@@ -76,7 +77,20 @@ Task description:
Completion criteria:
- [ ] Playwright covers the priority page journeys end to end.
- [ ] Tests verify grounded, clarify, and recovery paths.
- [ ] Suites remain deterministic with route mocks and local fixtures only.
- [ ] Suites remain deterministic with route mocks/local fixtures for routes that do not yet have live corpus parity.

### FE-ROLL-006 - Live ingested-corpus search verification
Status: DONE
Dependency: FE-ROLL-004
Owners: Test Automation, QA, Developer (FE)
Task description:
- Run Playwright against at least one route backed by a real locally rebuilt AdvisoryAI corpus and verify that suggestions, answer framing, and follow-up handoffs work without route stubs for search data.
- Track route parity explicitly so teams know which pages are mock-only and which already have live corpus validation.

Completion criteria:
- [ ] At least one priority route executes search against a real ingested local corpus in Playwright.
- [ ] Docs call out live-verified routes versus mock-backed routes.
- [ ] Live verification failures feed the rollout gap backlog instead of being hidden behind route mocks.

### FE-ROLL-005 - Docs and rollout readiness
Status: TODO
@@ -94,14 +108,19 @@ Completion criteria:
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-07 | Sprint created to roll out the self-serve search contract after the answer-first shell and backend answer contract are defined. | Project Manager |
| 2026-03-07 | Added explicit live-ingested verification scope so rollout evidence distinguishes mock-backed journeys from real corpus coverage. | Project Manager |
| 2026-03-07 | Re-ran live Playwright verification for the Doctor route against a rebuilt local AdvisoryAI corpus (`unified-search-contextual-suggestions.live.e2e.spec.ts`) and confirmed automatic chips, grounded answer framing, and follow-up chips over real search data. | Test Automation |

## Decisions & Risks
- Decision: rollout should prioritize high-frequency operator pages before broad route coverage.
- Decision: page teams own their self-serve question sets; platform code owns composition and safety rails.
- Decision: mock-backed Playwright remains acceptable for pages without live corpus parity, but rollout is incomplete until at least one priority route is verified against a rebuilt local corpus.
- Risk: inconsistent page adoption would make self-serve feel random across the product.
- Mitigation: maintain a priority rollout list and explicit page-ownership rules.
- Risk: adding guided actions without verification can create shallow or broken handoffs.
- Mitigation: require Playwright coverage for each high-value journey.
- Risk: live-ingested routes can drift from mocked expectations as ingestion adapters evolve.
- Mitigation: document route parity and keep at least one live route in the regular regression pack.
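Route parity between live-verified and mock-backed pages could be tracked in a small typed table so the rollout gate is checkable in code. In this sketch only the Doctor route's live status reflects the execution log; the route paths, the other entries, and the helper names are illustrative placeholders.

```typescript
type Verification = "live-ingested" | "mock-only";

// Hypothetical registry: only the Doctor route's status is taken from the sprint log.
const ROUTE_PARITY: Record<string, Verification> = {
  "/doctor": "live-ingested", // placeholder path
  "/triage": "mock-only", // illustrative entry
  "/policy": "mock-only", // illustrative entry
};

// List the routes that already have live corpus validation, in a stable order.
function liveVerifiedRoutes(parity: Record<string, Verification>): string[] {
  return Object.keys(parity)
    .filter((route) => parity[route] === "live-ingested")
    .sort();
}

// Rollout gate from the decision above: at least one route must be live-verified.
function rolloutGateSatisfied(parity: Record<string, Verification>): boolean {
  return liveVerifiedRoutes(parity).length > 0;
}
```

A table like this can also be rendered into the docs, which keeps the "live-verified versus mock-backed" list from drifting away from what the tests actually enforce.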

## Next Checkpoints
- 2026-03-12: Priority page rollout started after answer contract freeze.