Restore Doctor search after AdvisoryAI cold-start race
@@ -0,0 +1,60 @@
# Sprint 20260311_011 - AdvisoryAI Knowledge Startup Lock And Doctor Search Restore
## Topic & Scope
- Restore Doctor unified search on the scratch-built `stella-ops.local` stack after a fresh-stack Playwright run exposed an empty knowledge corpus on `/ops/operations/doctor`.
- Fix the AdvisoryAI startup race so knowledge corpus rebuild and unified-search refresh can touch the same store during cold start without breaking first-run correctness.
- Keep the live mission-control sweep evidence truthful by removing the remaining `View all` selector false negative uncovered in the same pass.
- Working directory: `src/AdvisoryAI`.
- Expected evidence: focused AdvisoryAI integration coverage, rebuilt `advisory-ai-web` startup proof, and live Playwright artifacts for Doctor unified search plus mission-control actions.
## Dependencies & Concurrency
- Depends on `docs/implplan/SPRINT_20260311_010_Platform_scratch_setup_revalidation.md`.
- Allowed cross-module evidence touch: `src/Web/StellaOps.Web/scripts/live-mission-control-action-sweep.mjs`.
## Documentation Prerequisites
- `AGENTS.md`
- `docs/modules/advisory-ai/knowledge-search.md`
- `docs/qa/feature-checks/FLOW.md`
## Delivery Tracker
### TASK-01 - Make knowledge schema bootstrap concurrency-safe
Status: DONE
Dependency: none
Owners: QA, 3rd line support, Architect, Developer
Task description:
- Reproduce the Doctor search failure from the live scratch stack and trace it into the AdvisoryAI knowledge startup path.
- Fix `PostgresKnowledgeSearchStore.EnsureSchemaAsync()` so concurrent hosted services cannot race on schema creation and leave the Doctor/knowledge corpus empty on first boot.

Completion criteria:
- [x] Concurrent cold-start schema bootstrap no longer fails in the knowledge store.
- [x] Focused regression coverage exercises concurrent `EnsureSchemaAsync()` calls against PostgreSQL.
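
The race and its fix can be reproduced in miniature without a database. The sketch below is an in-memory simulation, not the actual regression test: `FakeCatalog`, `bootstrap`, and `run_concurrently` are hypothetical names, and an `asyncio.Lock` stands in for the PostgreSQL advisory lock. It assumes the failure mode is a non-atomic check-then-create, which is consistent with the `23505` reported in the Execution Log.

```python
import asyncio


class DuplicateSchemaError(Exception):
    """Stands in for PostgreSQL SQLSTATE 23505 (unique_violation)."""


class FakeCatalog:
    """In-memory catalog whose existence check and insert are not atomic."""

    def __init__(self):
        self.schemas = set()

    async def create_schema_if_not_exists(self, name):
        exists = name in self.schemas
        # Yield to the event loop so another caller can interleave here,
        # mimicking two sessions that both pass the existence check.
        await asyncio.sleep(0)
        if exists:
            return
        if name in self.schemas:
            raise DuplicateSchemaError(name)
        self.schemas.add(name)


async def bootstrap(catalog, lock=None):
    """One startup caller; serialized only when a lock is supplied."""
    if lock is None:
        await catalog.create_schema_if_not_exists("knowledge")
        return
    async with lock:
        await catalog.create_schema_if_not_exists("knowledge")


async def run_concurrently(callers, use_lock):
    """Start `callers` bootstraps at once and return how many failed."""
    catalog = FakeCatalog()
    lock = asyncio.Lock() if use_lock else None
    results = await asyncio.gather(
        *(bootstrap(catalog, lock) for _ in range(callers)),
        return_exceptions=True,
    )
    return sum(isinstance(r, DuplicateSchemaError) for r in results)


if __name__ == "__main__":
    print("failures without lock:", asyncio.run(run_concurrently(4, use_lock=False)))
    print("failures with lock:", asyncio.run(run_concurrently(4, use_lock=True)))
```

Without the lock at least one caller fails; with it, every caller completes. A real regression would issue the concurrent calls against a PostgreSQL instance rather than the fake catalog.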
### TASK-02 - Rebuild and prove Doctor unified search on the live scratch stack
Status: DONE
Dependency: TASK-01
Owners: QA, Developer
Task description:
- Rebuild and redeploy AdvisoryAI, then rerun the live Doctor unified-search matrix and direct starter-query probes.
- Recheck the mission-control action sweep after tightening the `View all` selector so the QA artifact reflects actual product behavior.

Completion criteria:
- [x] `advisory-ai-web` startup logs show a successful knowledge rebuild on the live stack.
- [x] Live Playwright Doctor unified-search evidence is clean on the scratch deployment.
- [x] Mission-control action sweep passes without the stale `View all` false negative.
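
The `View all` false negative can be illustrated without a browser. The sketch below is an assumption about the mechanism, not the harness code: the markup, the `/some/other/panel` route, and the helper names are hypothetical, and only `/releases/runs` comes from this sprint. It shows how a text-only match can bind to the wrong link when the label repeats, while a match on both `href` and text cannot.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element in a fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def match_by_text(links, text):
    """Generic text match: returns every link carrying the label."""
    return [href for href, label in links if label == text]


def match_by_href_and_text(links, href, text):
    """Tightened match: the label must sit on the expected anchor."""
    return [h for h, label in links if h == href and label == text]


# Hypothetical board markup with two identically labelled links.
FRAGMENT = """
<section><a href="/some/other/panel">View all</a></section>
<section><a href="/releases/runs">View all</a></section>
"""

collector = LinkCollector()
collector.feed(FRAGMENT)
print(match_by_text(collector.links, "View all"))
print(match_by_href_and_text(collector.links, "/releases/runs", "View all"))
```

In a Playwright script the tightened form would typically be a locator scoped to `a[href="/releases/runs"]` and filtered by the `View all` text, rather than a bare text lookup.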
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-11 | Sprint created after the fresh-stack unified-search matrix isolated Doctor failures to an empty knowledge scope and container logs showed the knowledge startup rebuild failing with PostgreSQL `23505` (unique violation) during schema bootstrap. | QA / 3rd line support |
| 2026-03-11 | Root cause traced to concurrent `EnsureSchemaAsync()` callers from AdvisoryAI hosted services. Applied a PostgreSQL advisory transaction lock to the knowledge store and added a focused concurrent startup regression. | Architect / Developer |
| 2026-03-11 | Tightened the mission-board Playwright harness so `View all` binds to the real `/releases/runs` anchor instead of a generic text match. | QA / Developer |
| 2026-03-11 | Rebuilt and redeployed `advisory-ai-web`; live startup logs now show a successful knowledge rebuild (`documents=470`, `chunks=9051`, `doctor_projections=8`). Reran the live unified-search matrix cleanly (`4 routes checked, 0 issues`), directly rechecked Doctor starter queries with grounded results, and confirmed the mission-control action sweep passes with zero failed actions/runtime issues. | QA / Developer |
## Decisions & Risks
- Decision: keep Doctor mapped to the knowledge scope. The live failure was caused by the knowledge corpus not rebuilding on startup, not by the Doctor route using the wrong search domain.
- Decision: fix concurrency inside the knowledge store rather than by trying to sequence hosted services manually. Multiple startup callers are valid and the store must stay safe under them.
- Decision: use a PostgreSQL advisory transaction lock inside the store bootstrap path so the first-run contract remains correct regardless of how many hosted services touch the knowledge store during startup.
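
A minimal sketch of that bootstrap shape, assuming a DB-API style cursor on an autocommit connection so the explicit `BEGIN`/`COMMIT` delimit the transaction, and psycopg-style `%s` placeholders. `pg_advisory_xact_lock` is a real PostgreSQL function whose lock is released automatically at commit or rollback; the lock key, the schema name `knowledge`, and `RecordingCursor` are hypothetical and only illustrate statement order. The actual store is .NET code and may differ.

```python
from typing import Iterable

# Hypothetical values: any stable 64-bit key shared by all callers works.
SCHEMA_LOCK_KEY = 20260311011
SCHEMA_NAME = "knowledge"


class RecordingCursor:
    """Stand-in cursor that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append((sql, params))


def ensure_schema(cursor, migrations: Iterable[str]) -> None:
    """Run schema bootstrap under a transaction-scoped advisory lock."""
    cursor.execute("BEGIN")
    try:
        # Blocks until no other transaction holds the same key. The lock is
        # released automatically when this transaction commits or rolls back.
        cursor.execute("SELECT pg_advisory_xact_lock(%s)", (SCHEMA_LOCK_KEY,))
        cursor.execute(f"CREATE SCHEMA IF NOT EXISTS {SCHEMA_NAME}")
        for statement in migrations:
            cursor.execute(statement)
        cursor.execute("COMMIT")
    except Exception:
        cursor.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    cursor = RecordingCursor()
    ensure_schema(cursor, ["CREATE TABLE IF NOT EXISTS knowledge.doc (id bigint)"])
    for sql, _ in cursor.statements:
        print(sql)
```

Taking the lock before `CREATE SCHEMA` means a second caller waits until the first transaction ends, then finds the schema and migrations already in place, so any number of startup callers is safe.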
## Next Checkpoints
- Archive on local commit; Doctor search is restored on the live scratch stack.
@@ -389,6 +389,7 @@ Notes:
- `stella advisoryai index rebuild` and `stella search index rebuild` invoke authenticated backend endpoints. For a local source-checkout verification lane without a signed-in CLI session, use `sources prepare` via CLI and the direct HTTP rebuild calls above with explicit `X-StellaOps-*` headers.
- Compose/runtime requirement: the published AdvisoryAI service image must carry a repo-shaped local corpus under its app content root so `POST /v1/advisory-ai/index/rebuild` can resolve `docs/**`, `devops/compose/openapi_current.json`, and `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/*.json` even when the source checkout is not mounted into the container. If those assets are absent, live search on `stella-ops.local` degrades to partial unified rows only and documentation/Doctor/API answers disappear.
- Fresh service startup now auto-runs the knowledge rebuild by default (`AdvisoryAI__KnowledgeSearch__KnowledgeAutoIndexOnStartup=true`). This is the scratch-setup convergence path for `stella-ops.local`: a wiped deployment must populate the documentation/API/Doctor corpus without requiring operators to call `POST /v1/advisory-ai/index/rebuild` manually. Keep the manual endpoint for explicit refreshes and local live-search lanes, but do not depend on it for first-run correctness.
- Startup schema bootstrap is protected by a PostgreSQL advisory transaction lock. AdvisoryAI cold start can trigger both the knowledge rebuild host and unified-search refresh paths against the same store, so `EnsureSchemaAsync()` must serialize `CREATE SCHEMA` and migration application instead of relying on `IF NOT EXISTS` alone: two sessions can both pass the existence check and then collide on the catalog insert, which surfaces as a `23505` unique violation.
- The published app content root must also carry the full unified snapshot corpus under `src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/*.json`; packaging only findings/VEX/policy snapshots leaves graph, OpsMemory, timeline, and scanner answer lanes permanently corpus-unready in the live shell.
### CLI setup in a source checkout