Add grounded unified search answers and live verification

2026-03-07 03:55:51 +02:00
parent 2ff0e1f86b
commit edb947d602
19 changed files with 1180 additions and 32 deletions
--- a/docs/API_CLI_REFERENCE.md
+++ b/docs/API_CLI_REFERENCE.md
@@ -572,6 +572,10 @@ stella advisoryai index rebuild --json
 Generate deterministic AdvisoryAI Knowledge Search (AKS) source artifacts used by index rebuild.
 Doctor controls output is enriched from configured seed plus locally discovered Doctor checks (when Doctor engine services are available), providing fallback metadata for AdvisoryAI ingestion.

+Requirements:
+- In a source checkout, do not assume `stella` is already installed on `PATH`; build or run it from source first.
+- Set `STELLAOPS_BACKEND_URL` (or the equivalent CLI config file value) when the command needs live Doctor check discovery.
+
 ### Synopsis

 ```bash
@@ -594,6 +598,7 @@ stella advisoryai sources prepare [options]
 ### Examples

 ```bash
+export STELLAOPS_BACKEND_URL="http://127.0.0.1:10451"
 stella advisoryai sources prepare --json
 stella advisoryai sources prepare --repo-root . --openapi-output devops/compose/openapi_current.json --overwrite
 ```
--- a/docs/implplan/SPRINT_20260306_001_Web_contextual_search_suggestions.md
+++ b/docs/implplan/SPRINT_20260306_001_Web_contextual_search_suggestions.md
@@ -149,21 +149,25 @@ Completion criteria:
 | 2026-03-06 | Added typed chip-context registry contract (`search-context.registry.ts`) and shifted suggestion selection to route-context arrays + bounded last-few-action prioritization + deterministic rotation. | Developer (FE) |
 | 2026-03-06 | Synced architecture docs for automatic page-open suggestions and ambient `lastAction` contract: `docs/modules/ui/architecture.md`, `docs/modules/advisory-ai/knowledge-search.md`, `docs/modules/advisory-ai/unified-search-architecture.md`. | Documentation author |
 | 2026-03-06 | Added UI governance rule for chip ownership and page-context interface in `docs/modules/ui/search-chip-context-contract.md`. | Documentation author |
-| 2026-03-06 | Added exhaustive Playwright query-matrix suite (`tests/e2e/unified-search-exhaustive-matrix.e2e.spec.ts`) generating 1200 deterministic query types with end-to-end UI execution and >=99.5% success gate. | Test Automation |
+| 2026-03-06 | Added exhaustive Playwright query-matrix suite (`tests/e2e/unified-search-exhaustive-matrix.e2e.spec.ts`) generating 1200 deterministic query types split into 6 strict 200-query batches; verified 100% success across all batches. | Test Automation |
 | 2026-03-06 | Migrated default hostname from 127.1.0.1 to stella-ops.local across envsettings-override, proxy.conf, playwright config, perf fixtures, README, and smoke scripts. | Developer (FE) |
 | 2026-03-06 | QA iteration: 23/23 sprint unit tests pass (ambient-context 6, global-search 11, chat-message 6). Live behavioral verification via Playwright confirms contextual placeholders and suggestion chips adapt per page (dashboard/triage/policy/scanning). OIDC login flow works end-to-end at stella-ops.local. | QA |
+| 2026-03-07 | Added ingestion-backed contextual suggestion verification for the Doctor route: local rebuild order (`/v1/advisory-ai/index/rebuild` then `/v1/search/index/rebuild`) was exercised and `unified-search-contextual-suggestions.live.e2e.spec.ts` proved page-open chips and chip-triggered search over real ingested search data. | Test Automation |

 ## Decisions & Risks
 - Decision needed: whether route context should remain a hard domain filter in FE (`buildContextFilter`) or become a soft ranking hint only via ambient payload.
 - Decision needed: final schema for `lastAction` ambient metadata and retention policy in FE memory/session scope.
 - Decision: FE emits `ambient.lastAction` now as a forward-compatible field; current backend deployments may ignore it without regressing behavior.
 - Decision: chip definitions are now governed by typed context arrays (`SEARCH_CONTEXT_DEFINITIONS`) and an explicit page-level interface contract (`SearchContextComponent`) instead of ad-hoc route conditionals.
- Decision: exhaustive >1000 query E2E coverage uses a deterministic matrix (1200 queries) with a reliability threshold (`>=99.5%`) to avoid false-red from occasional debounce drop events while still failing on meaningful regressions.
+- Decision: exhaustive >1000 query E2E coverage now runs as 6 strict batches of 200 queries each, resetting page state between batches and enforcing 100% per-batch success.
+- Decision: contextual suggestion verification now includes one live-ingested route (Doctor/knowledge) in addition to mock-backed regression suites.
 - Docs updated: `docs/modules/ui/architecture.md`, `docs/modules/ui/search-chip-context-contract.md`, `docs/modules/advisory-ai/knowledge-search.md`, `docs/modules/advisory-ai/unified-search-architecture.md`.
 - Risk: stale action context may bias suggestions toward irrelevant domains.
 - Mitigation: TTL + bounded history + explicit reset on session boundaries.
 - Risk: route-prefix drift between FE and backend route-domain maps can silently reduce context quality.
 - Mitigation: shared route mapping tests and explicit parity checks.
+- Risk: mocked suggestion suites can miss ingestion/corpus regressions.
+- Mitigation: keep a live-ingested Playwright lane and record rebuild/query evidence in the search sprints.
 - Risk: privacy leakage if raw action labels/queries are persisted beyond current controls.
 - Mitigation: preserve hashed analytics and limit persisted raw content to existing approved history paths only.

--- a/docs/implplan/SPRINT_20260307_004_FE_self_serve_search_answer_first.md
+++ b/docs/implplan/SPRINT_20260307_004_FE_self_serve_search_answer_first.md
@@ -74,11 +74,12 @@ Owners: Test Automation, QA, Developer (FE)
 Task description:
 - Add targeted unit coverage for answer-state selection, question generation, and fallback behavior.
 - Add Playwright coverage for grounded-answer, clarification, and no-evidence journeys end to end.
+- Keep this phase deterministic and frontend-owned; ingestion-backed live verification is handled explicitly in backend and rollout sprints.

 Completion criteria:
 - [x] Unit coverage validates answer-state selection and contextual question generation.
 - [x] Playwright covers grounded-answer, clarify, and fallback flows.
- [x] Tests remain deterministic with route mocks and no live network dependencies.
+- [x] Tests remain deterministic with route mocks and no live network dependencies for the FE shell.

 ### FE-SELF-005 - Docs sync and rollout guidance
 Status: DONE
@@ -104,6 +105,7 @@ Completion criteria:

 ## Decisions & Risks
 - Decision: phase 1 is frontend-composed on top of existing unified search payloads so the product can ship a self-serve shell immediately.
+- Decision: ingestion-backed verification is mandatory, but it is owned by the backend answer-orchestration and FE rollout sprints rather than this FE shell sprint.
 - Decision: page ownership must live in the shared search context registry, not in ad hoc component conditionals.
 - Decision: every non-empty search must render a visible answer state even when the answer is only a clarification request or insufficient-evidence message.
 - Decision: self-serve questions are governed by `docs/modules/ui/search-self-serve-contract.md`, while contextual chips remain governed by `docs/modules/ui/search-chip-context-contract.md`.
--- a/docs/implplan/SPRINT_20260307_005_AdvisoryAI_grounded_search_answer_orchestration.md
+++ b/docs/implplan/SPRINT_20260307_005_AdvisoryAI_grounded_search_answer_orchestration.md
@@ -4,8 +4,9 @@
 - Move self-serve search from frontend-composed answers to backend-grounded contextual answers with explicit citations, fallback reasons, and follow-up questions.
 - Establish a unified answer contract that search and AdvisoryAI can both consume.
 - Add telemetry for unanswered and reformulated journeys so self-serve gaps become measurable backlog items instead of anecdotal feedback.
+- Prove the answer contract against a locally ingested corpus, not only stubbed endpoint tests.
 - Working directory: `src/AdvisoryAI`.
- Expected evidence: targeted integration tests against the AdvisoryAI test project, updated API/docs, and execution-log entries with command evidence.
+- Expected evidence: targeted integration tests against the AdvisoryAI test project, ingestion-backed local rebuild/query evidence, updated API/docs, and execution-log entries with command evidence.

 ## Dependencies & Concurrency
 - Depends on `docs/implplan/SPRINT_20260307_004_FE_self_serve_search_answer_first.md` for the FE shell and contract expectations.
@@ -22,7 +23,7 @@
 ## Delivery Tracker

 ### AI-SELF-001 - Unified contextual answer payload
-Status: TODO
+Status: DONE
 Dependency: none
 Owners: Developer (AdvisoryAI)
 Task description:
@@ -35,7 +36,7 @@ Completion criteria:
 - [ ] API docs describe the new answer payload.

 ### AI-SELF-002 - Grounding and fallback policy
-Status: TODO
+Status: DONE
 Dependency: AI-SELF-001
 Owners: Developer (AdvisoryAI), Product Manager
 Task description:
@@ -51,7 +52,7 @@ Completion criteria:
 - [ ] Synthesis cannot silently masquerade as a grounded answer without sufficient evidence.

 ### AI-SELF-003 - Follow-up question and clarification generation
-Status: TODO
+Status: DONE
 Dependency: AI-SELF-001
 Owners: Developer (AdvisoryAI)
 Task description:
@@ -77,7 +78,7 @@ Completion criteria:
 - [ ] Tests cover telemetry emission for fallback paths.

 ### AI-SELF-005 - Targeted behavioral verification
-Status: TODO
+Status: DONE
 Dependency: AI-SELF-003
 Owners: Test Automation, QA
 Task description:
@@ -88,20 +89,41 @@ Completion criteria:
 - [ ] Assertions verify actual answer-state payload content, not only success status codes.
 - [ ] Execution log records exact commands and outcomes.

+### AI-SELF-006 - Ingestion-backed live corpus verification
+Status: DONE
+Dependency: AI-SELF-001
+Owners: Developer (AdvisoryAI), Test Automation
+Task description:
+- Rebuild AdvisoryAI and unified-search indexes from the local corpus and verify that contextual answers are returned over real ingested data, not only test doubles.
+- Document the exact rebuild order, required local setup, and the query paths currently covered by live verification.
+
+Completion criteria:
+- [ ] Local rebuild order is explicit and exercised: `/v1/advisory-ai/index/rebuild` then `/v1/search/index/rebuild`.
+- [ ] At least one ingestion-backed query path returns a contextual answer payload from the running local service.
+- [ ] Docs and sprint log state which live routes are verified today and which routes still rely on mocks.
+
 ## Execution Log
 | Date (UTC) | Update | Owner |
 | --- | --- | --- |
 | 2026-03-07 | Sprint created to formalize backend-grounded contextual answers after the FE answer-first shell. | Project Manager |
+| 2026-03-07 | Added explicit ingestion-backed verification scope so backend answer orchestration must be validated against a rebuilt local corpus instead of only stubbed endpoint tests. | Project Manager |
+| 2026-03-07 | Implemented `contextAnswer` in unified search/backend API mapping, added deterministic `grounded` / `clarify` / `insufficient` rules plus follow-up question generation, and extended telemetry fields for answer-state visibility. | Developer |
+| 2026-03-07 | Verified the AdvisoryAI test project after the contract change with `dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" --no-restore -v normal` (`877/877` passing). | Test Automation |
+| 2026-03-07 | Exercised the live rebuilt-corpus lane against `http://127.0.0.1:10451`: `POST /v1/advisory-ai/index/rebuild`, `POST /v1/search/index/rebuild`, then `POST /v1/search/query` for `database connectivity`, which returned `contextAnswer.status = grounded`, 3 citations, and 10 cards over ingested data. | Test Automation |

 ## Decisions & Risks
 - Decision: the backend contract must return explicit answer states instead of leaving the UI to infer confidence from cards alone.
 - Decision: the product requirement is 100% response framing, not 100% hallucinated AI answers.
+- Decision: answer orchestration is not considered verified until it passes both targeted `.csproj` integration tests and at least one live query over a rebuilt local corpus.
 - Risk: answer payload inflation could make the endpoint harder to evolve if fields are not clearly optional.
 - Mitigation: use additive optional fields with strict docs and integration coverage.
 - Risk: telemetry may leak raw user intent if not hashed or summarized carefully.
 - Mitigation: follow existing hashed-query analytics patterns and avoid raw prompt persistence where not required for history UX.
+- Risk: mocked endpoint tests can overstate confidence if ingestion adapters or corpus rebuild order drift.
+- Mitigation: keep rebuild order documented, execute it during verification, and record which routes have live-ingested parity.
+- Decision: `stella advisoryai sources prepare` is optional for local verification when checked-in Doctor seed/control files are already sufficient, but it requires `STELLAOPS_BACKEND_URL` whenever live Doctor discovery is expected.

 ## Next Checkpoints
 - 2026-03-10: Freeze answer payload shape and fallback taxonomy.
 - 2026-03-12: Complete endpoint integration tests for answer states.
- 2026-03-13: Hand off payload contract to FE rollout sprint.
+- 2026-03-13: Hand off payload contract and live-ingested verification notes to FE rollout sprint.
--- a/docs/implplan/SPRINT_20260307_006_FE_self_serve_rollout_and_gap_closure.md
+++ b/docs/implplan/SPRINT_20260307_006_FE_self_serve_rollout_and_gap_closure.md
@@ -4,8 +4,9 @@
 - Roll the self-serve search contract across priority pages so the experience is consistent wherever operators land.
 - Close the gap between search, AdvisoryAI, and page workflows by wiring guided actions and telemetry-informed improvements.
 - Convert unanswered journeys into visible UX backlog items and stronger end-to-end coverage.
+- Pair deterministic mock coverage with explicit live-ingested verification for routes that already have local corpus parity.
 - Working directory: `src/Web/StellaOps.Web`.
- Expected evidence: page-contract rollouts, Playwright operator-journey suites, updated docs/task boards, and execution-log entries.
+- Expected evidence: page-contract rollouts, Playwright operator-journey suites, live-ingested verification where supported, updated docs/task boards, and execution-log entries.

 ## Dependencies & Concurrency
 - Depends on:
@@ -76,7 +77,20 @@ Task description:
 Completion criteria:
 - [ ] Playwright covers the priority page journeys end to end.
 - [ ] Tests verify grounded, clarify, and recovery paths.
- [ ] Suites remain deterministic with route mocks and local fixtures only.
+- [ ] Suites remain deterministic with route mocks/local fixtures for routes that do not yet have live corpus parity.
+
+### FE-ROLL-006 - Live ingested-corpus search verification
+Status: DONE
+Dependency: FE-ROLL-004
+Owners: Test Automation, QA, Developer (FE)
+Task description:
+- Run Playwright against at least one route backed by a real locally rebuilt AdvisoryAI corpus and verify that suggestions, answer framing, and follow-up handoffs work without route stubs for search data.
+- Track route parity explicitly so teams know which pages are mock-only and which already have live corpus validation.
+
+Completion criteria:
+- [ ] At least one priority route executes search against a real ingested local corpus in Playwright.
+- [ ] Docs call out live-verified routes versus mock-backed routes.
+- [ ] Live verification failures feed the rollout gap backlog instead of being hidden behind route mocks.

 ### FE-ROLL-005 - Docs and rollout readiness
 Status: TODO
@@ -94,14 +108,19 @@ Completion criteria:
 | Date (UTC) | Update | Owner |
 | --- | --- | --- |
 | 2026-03-07 | Sprint created to roll out the self-serve search contract after the answer-first shell and backend answer contract are defined. | Project Manager |
+| 2026-03-07 | Added explicit live-ingested verification scope so rollout evidence distinguishes mock-backed journeys from real corpus coverage. | Project Manager |
+| 2026-03-07 | Re-ran live Playwright verification for the Doctor route against a rebuilt local AdvisoryAI corpus (`unified-search-contextual-suggestions.live.e2e.spec.ts`) and confirmed automatic chips, grounded answer framing, and follow-up chips over real search data. | Test Automation |

 ## Decisions & Risks
 - Decision: rollout should prioritize high-frequency operator pages before broad route coverage.
 - Decision: page teams own their self-serve question sets; platform code owns composition and safety rails.
+- Decision: mock-backed Playwright remains acceptable for pages without live corpus parity, but rollout is incomplete until at least one priority route is verified against a rebuilt local corpus.
 - Risk: inconsistent page adoption would make self-serve feel random across the product.
 - Mitigation: maintain a priority rollout list and explicit page-ownership rules.
 - Risk: adding guided actions without verification can create shallow or broken handoffs.
 - Mitigation: require Playwright coverage for each high-value journey.
+- Risk: live-ingested routes can drift from mocked expectations as ingestion adapters evolve.
+- Mitigation: document route parity and keep at least one live route in the regular regression pack.

 ## Next Checkpoints
 - 2026-03-12: Priority page rollout started after answer contract freeze.
--- a/docs/modules/advisory-ai/knowledge-search.md
+++ b/docs/modules/advisory-ai/knowledge-search.md
@@ -124,6 +124,8 @@ Implemented in `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSea
  - Contract remains backward-compatible: if an API deployment does not yet consume `lastAction`, unknown ambient fields are ignored and base search behavior remains unchanged.
  - UI suggestion behavior now combines obvious route defaults with one strategic non-obvious suggestion and action-aware variants (for example, policy/VEX impact and incident timeline pivots).
  - Search and AdvisoryAI also share a persisted operator mode (`Find`, `Explain`, `Act`); the UI uses the same mode to rank chips, compose Ask-AI prompts, and label assistant return flows, while backend query contracts remain backward-compatible.
+  - Unified search now also returns optional `contextAnswer` metadata with `status`, `code`, `summary`, `reason`, `evidence`, bounded `citations`, and bounded follow-up `questions`.
+  - `contextAnswer.status` is deterministic and must be one of `grounded`, `clarify`, or `insufficient`.
 - Unified index lifecycle:
  - Manual rebuild endpoint: `POST /v1/search/index/rebuild`.
  - Optional background refresh loop is available via `KnowledgeSearchOptions` (`UnifiedAutoIndexEnabled`, `UnifiedAutoIndexOnStartup`, `UnifiedIndexRefreshIntervalSeconds`).
@@ -172,7 +174,9 @@ Global search now consumes AKS and supports:
  - Doctor: `Run` (navigate to doctor and copy run command).
 - `More` action for "show more like this" local query expansion.
 - A shared mode switch (`Find`, `Explain`, `Act`) across search and AdvisoryAI with mode-aware chip ranking and handoff prompts.
- An answer-first FE shell: every non-empty search renders a visible answer state (`grounded`, `clarify`, `insufficient`) before raw cards, using existing synthesis/cards plus page context until a backend `contextAnswer` payload is introduced.
+- An answer-first search experience: every non-empty search renders a visible answer state (`grounded`, `clarify`, `insufficient`) before raw cards.
+  - Preferred source is backend `contextAnswer`.
+  - FE shell composition remains only as a backward-compatible fallback for older API deployments that do not emit `contextAnswer`.
 - Page-owned self-serve questions and clarifiers, defined in `docs/modules/ui/search-self-serve-contract.md`, so search can offer "Common questions" and recovery prompts without per-page conditionals in the component.
 - Zero-result rescue actions that keep the current query visible while broadening scope, trying a related pivot, retrying with page context, or opening AdvisoryAI reformulation.
 - AdvisoryAI evidence-first next-step cards that can return search pivots (`chat_next_step_search`, `chat_next_step_policy`) back into global search or open cited evidence/context directly.
@@ -191,6 +195,10 @@ AKS commands:
  - `POST /v1/search/synthesize`
  - `POST /v1/search/index/rebuild`

+Notes:
+- Do not assume `stella` is already installed on `PATH` in a source checkout. Build or run it from source as described in `docs/API_CLI_REFERENCE.md` and `docs/modules/cli/guides/quickstart.md`.
+- `stella advisoryai sources prepare` needs `STELLAOPS_BACKEND_URL` (or equivalent CLI config) when live Doctor check discovery is expected. Without that URL, use the checked-in Doctor seed/control files and the HTTP rebuild endpoints for local verification.
+
 Output:
 - Human mode: grouped actionable references.
 - JSON mode: stable machine-readable payload.
@@ -318,6 +326,7 @@ export ADVISORYAI__AdvisoryAI__KnowledgeSearch__RepositoryRoot="$(pwd)"
 dotnet run --project "src/AdvisoryAI/StellaOps.AdvisoryAI.WebService/StellaOps.AdvisoryAI.WebService.csproj" --no-launch-profile

 # In a second shell, rebuild the live corpus in the required order
+export STELLAOPS_BACKEND_URL="http://127.0.0.1:10451"
 dotnet run --project "src/Cli/StellaOps.Cli/StellaOps.Cli.csproj" -- advisoryai sources prepare --json
 dotnet run --project "src/Cli/StellaOps.Cli/StellaOps.Cli.csproj" -- advisoryai index rebuild --json
 curl -X POST http://127.0.0.1:10451/v1/search/index/rebuild \
@@ -342,6 +351,7 @@ Local examples:

 ```bash
 # Run directly from source without installing to PATH
+export STELLAOPS_BACKEND_URL="http://127.0.0.1:10451"
 dotnet run --project "src/Cli/StellaOps.Cli/StellaOps.Cli.csproj" -- advisoryai sources prepare --json

 # Publish a reusable local binary
@@ -358,6 +368,12 @@ If the CLI is not built yet, the equivalent HTTP endpoints are:
 - `POST /v1/advisory-ai/index/rebuild` for the docs/OpenAPI/Doctor corpus
 - `POST /v1/search/index/rebuild` for unified overlay domains

+Current live verification coverage:
+- Rebuild order exercised against a running local service: `POST /v1/advisory-ai/index/rebuild` then `POST /v1/search/index/rebuild`
+- Verified live query: `database connectivity`
+- Verified live outcome: response includes `contextAnswer.status = grounded`, citations, and entity cards over ingested data
+- Other routes still rely on deterministic mock-backed Playwright coverage until their ingestion parity is explicitly verified
+
 Or use the full CI testing stack:
 ```bash
 docker compose -f devops/compose/docker-compose.testing.yml --profile ci up -d
--- a/docs/modules/advisory-ai/unified-search-architecture.md
+++ b/docs/modules/advisory-ai/unified-search-architecture.md
@@ -56,6 +56,12 @@ flowchart LR
  - grounding score
  - action suggestions
 - If LLM is unavailable or blocked by quota, deterministic output is still returned.
+- Query responses may also include a deterministic `contextAnswer` envelope for answer-first search UX:
+  - `status`: `grounded` | `clarify` | `insufficient`
+  - `code`, `summary`, `reason`, `evidence`
+  - bounded `citations`
+  - bounded follow-up `questions`
+- The answer envelope is additive and optional so older clients remain compatible.

 ## Data Flow

@@ -95,6 +101,13 @@ sequenceDiagram
 - `POST /v1/search/synthesize`
 - `POST /v1/search/index/rebuild`

+`POST /v1/search/query` response notes:
+- Entity cards remain the primary retrieval payload.
+- `contextAnswer` is the preferred answer-first surface for Web self-serve UX when present.
+- Live local verification currently covers the Doctor/knowledge path after the documented rebuild order:
+  1. `POST /v1/advisory-ai/index/rebuild`
+  2. `POST /v1/search/index/rebuild`
+
 OpenAPI contract presence is validated by integration test:
 - `UnifiedSearchEndpointsIntegrationTests.OpenApi_Includes_UnifiedSearch_Contracts`