Auto-rebuild AdvisoryAI knowledge corpus on startup

2026-03-10 20:18:12 +02:00
parent d93006a8fa
commit f727ec24fd
7 changed files with 435 additions and 0 deletions
--- a/docs/implplan/SPRINT_20260310_033_FE_live_frontdoor_unified_search_route_matrix.md
+++ b/docs/implplan/SPRINT_20260310_033_FE_live_frontdoor_unified_search_route_matrix.md
@@ -0,0 +1,54 @@
+# Sprint 20260310_033 - FE Live Frontdoor Unified Search Route Matrix
+
+## Topic & Scope
+- Reverify unified search directly on `https://stella-ops.local` after the full scratch rebuild and backend refresh, not only on the standalone search harness.
+- Exercise supported route-local search starters end to end through the real authenticated shell and capture runtime evidence for route context, query execution, and result grounding.
+- Repair any search-runtime convergence defect that prevents a wiped local install from surfacing viable Doctor, Policy, Findings, or VEX starters without manual post-start rebuild steps.
+- Working directory: `src/AdvisoryAI/`.
+- Allowed coordination edits: `src/Web/StellaOps.Web/scripts`, `docs/modules/advisory-ai/**`, and this sprint file.
+- Expected evidence: a live Playwright frontdoor sweep script, JSON output under `src/Web/StellaOps.Web/output/playwright/`, focused AdvisoryAI tests, targeted image rebuild/redeploy, and a scoped local commit.
+
+## Dependencies & Concurrency
+- Depends on the scratch rebuild baseline and the current healthy compose stack on `https://stella-ops.local`.
+- Safe parallelism: stay within live search harnesses, unified-search UI, and minimal docs updates required by any discovered defect.
+
+## Documentation Prerequisites
+- `AGENTS.md`
+- `docs/qa/feature-checks/FLOW.md`
+- `docs/modules/ui/search-zero-learning-primary-entry.md`
+- `docs/modules/advisory-ai/knowledge-search.md`
+
+## Delivery Tracker
+
+### FE-LIVE-SEARCH-001 - Add and execute a frontdoor unified-search route matrix
+Status: DONE
+Dependency: none
+Owners: QA, 3rd line support, Product Manager, Architect, Developer
+Task description:
+- The repo already has live search verification for the standalone local shell plus AdvisoryAI runtime, but this scratch iteration needs the same route-by-route proof against the real authenticated Stella Ops frontdoor.
+- Add a script that authenticates against `https://stella-ops.local`, opens the supported route-local search surfaces, captures surfaced starter chips, executes each chip, and fails on missing context, missing starters, degraded banners, dead-end query execution, or runtime/network errors.
+- Live proof now shows a deeper backend/setup failure: Doctor context renders, but `POST /api/v1/search/suggestions/evaluate` returns `current_scope_corpus_unready` for the knowledge scope after a full scratch rebuild. The fix must make AdvisoryAI converge the knowledge corpus on startup instead of relying on manual rebuild commands.
+
+Completion criteria:
+- [x] A live frontdoor search matrix script exists under `src/Web/StellaOps.Web/scripts/`.
+- [x] The script writes structured JSON evidence under `src/Web/StellaOps.Web/output/playwright/`.
+- [x] The script verifies route context plus starter-chip execution on Doctor, Security Triage, Policy, and Advisories & VEX.
+- [x] Any product defects exposed by the run are root-caused, fixed, rebuilt, reverified, and committed in this iteration.
+
+## Execution Log
+| Date (UTC) | Update | Owner |
+| --- | --- | --- |
+| 2026-03-10 | Sprint created after the scratch rebuild, canonical route sweep, and release-promotion repair commit. Notifications recheck is clean again on the rebuilt stack, so the next untouched high-risk live surface is unified search through the real frontdoor shell. | QA |
+| 2026-03-10 | Added `scripts/live-frontdoor-unified-search-route-matrix.mjs` and ran it against the rebuilt stack. Doctor search reproduces a real setup/runtime defect: the frontdoor returns `current_scope_corpus_unready` for all knowledge-scope starter queries even though the shell context is correct. Root-cause work is now moving into AdvisoryAI startup convergence. | QA / 3rd line support |
+| 2026-03-10 | Implemented AdvisoryAI startup convergence so the knowledge corpus rebuilds automatically on fresh service startup, rebuilt and redeployed `advisory-ai-web`, and confirmed the live container reports `documents=470`, `chunks=9051`, `api_operations=2190`, `doctor_projections=8` during startup rebuild. | Developer / 3rd line support |
+| 2026-03-10 | Reverified the live authenticated shell with a Playwright all-chip probe and wrote `src/Web/StellaOps.Web/output/playwright/live-frontdoor-unified-search-route-matrix-manual.json`. Doctor, Security Triage, Policy, and Advisories & VEX all render context-aware starter chips and their visible chip actions now resolve to grounded answers with cards. | QA |
+
+## Decisions & Risks
+- Decision: frontdoor search verification must not rely on the standalone Angular/AdvisoryAI harness alone; the authenticated shell is the product surface the client sees.
+- Decision: scratch deployment success requires AdvisoryAI to populate its own knowledge corpus on startup. A healthy container with an empty knowledge scope is not an acceptable “ready” state.
+- Decision: only the AdvisoryAI web host owns startup knowledge-index convergence. The shared library must not register that hosted service globally because the worker shares the same core registrations and would otherwise perform a duplicate rebuild on startup.
+- Risk: live search starters depend on current route context and runtime corpus readiness, so the sweep must distinguish product regressions from transient auth/runtime setup failures with structured evidence.
+
+## Next Checkpoints
+- Implement the live frontdoor search sweep harness.
+- Run it against the rebuilt stack and triage any failures before widening to the next untouched page family.
--- a/docs/modules/advisory-ai/knowledge-search.md
+++ b/docs/modules/advisory-ai/knowledge-search.md
@@ -388,6 +388,7 @@ Notes:
 - Set `AdvisoryAI__KnowledgeSearch__RepositoryRoot` only when you are running the service from a non-standard layout or a packaged binary tree that is not inside the repository.
 - `stella advisoryai index rebuild` and `stella search index rebuild` invoke authenticated backend endpoints. For a local source-checkout verification lane without a signed-in CLI session, use `sources prepare` via CLI and the direct HTTP rebuild calls above with explicit `X-StellaOps-*` headers.
 - Compose/runtime requirement: the published AdvisoryAI service image must carry a repo-shaped local corpus under its app content root so `POST /v1/advisory-ai/index/rebuild` can resolve `docs/**`, `devops/compose/openapi_current.json`, and `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/*.json` even when the source checkout is not mounted into the container. If those assets are absent, live search on `stella-ops.local` degrades to partial unified rows only and documentation/Doctor/API answers disappear.
+- Fresh service startup now auto-runs the knowledge rebuild by default (`AdvisoryAI__KnowledgeSearch__KnowledgeAutoIndexOnStartup=true`). This is the scratch-setup convergence path for `stella-ops.local`: a wiped deployment must populate the documentation/API/Doctor corpus without requiring operators to call `POST /v1/advisory-ai/index/rebuild` manually. Keep the manual endpoint for explicit refreshes and local live-search lanes, but do not depend on it for first-run correctness.
 - The published app content root must also carry the full unified snapshot corpus under `src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/*.json`; packaging only findings/VEX/policy snapshots leaves graph, OpsMemory, timeline, and scanner answer lanes permanently corpus-unready in the live shell.

 ### CLI setup in a source checkout
--- a/src/AdvisoryAI/StellaOps.AdvisoryAI.WebService/Program.cs
+++ b/src/AdvisoryAI/StellaOps.AdvisoryAI.WebService/Program.cs
@@ -15,6 +15,7 @@ using StellaOps.AdvisoryAI.Evidence;
 using StellaOps.AdvisoryAI.Explanation;
 using StellaOps.AdvisoryAI.Hosting;
 using StellaOps.AdvisoryAI.Inference.LlmProviders;
+using StellaOps.AdvisoryAI.KnowledgeSearch;
 using StellaOps.AdvisoryAI.Metrics;
 using StellaOps.AdvisoryAI.Orchestration;
 using StellaOps.AdvisoryAI.Outputs;
@@ -53,6 +54,7 @@ builder.Configuration

 builder.Services.AddAdvisoryAiCore(builder.Configuration);
 builder.Services.AddUnifiedSearch(builder.Configuration);
+builder.Services.TryAddEnumerable(ServiceDescriptor.Singleton<IHostedService, KnowledgeSearchStartupRebuildService>());

 var llmAdapterEnabled = builder.Configuration.GetValue<bool?>("AdvisoryAI:Adapters:Llm:Enabled") ?? false;
 if (llmAdapterEnabled)
--- a/src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchOptions.cs
+++ b/src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchOptions.cs
@@ -54,6 +54,8 @@ public sealed class KnowledgeSearchOptions

    public List<string> OpenApiRoots { get; set; } = ["src", "devops/compose"];

+    public bool KnowledgeAutoIndexOnStartup { get; set; } = true;
+
    public string UnifiedFindingsSnapshotPath { get; set; } =
        "src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/findings.snapshot.json";

--- a/src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchStartupRebuildService.cs
+++ b/src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchStartupRebuildService.cs
@@ -0,0 +1,57 @@
+using Microsoft.Extensions.Hosting;
+using Microsoft.Extensions.Logging;
+using Microsoft.Extensions.Options;
+
+namespace StellaOps.AdvisoryAI.KnowledgeSearch;
+
+internal sealed class KnowledgeSearchStartupRebuildService : IHostedService
+{
+    private readonly KnowledgeSearchOptions _options;
+    private readonly IKnowledgeIndexer _indexer;
+    private readonly ILogger<KnowledgeSearchStartupRebuildService> _logger;
+
+    public KnowledgeSearchStartupRebuildService(
+        IOptions<KnowledgeSearchOptions> options,
+        IKnowledgeIndexer indexer,
+        ILogger<KnowledgeSearchStartupRebuildService> logger)
+    {
+        ArgumentNullException.ThrowIfNull(options);
+        _options = options.Value ?? new KnowledgeSearchOptions();
+        _indexer = indexer ?? throw new ArgumentNullException(nameof(indexer));
+        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
+    }
+
+    public async Task StartAsync(CancellationToken cancellationToken)
+    {
+        if (!_options.Enabled)
+        {
+            _logger.LogDebug("AdvisoryAI knowledge search is disabled; skipping startup rebuild.");
+            return;
+        }
+
+        if (!_options.KnowledgeAutoIndexOnStartup)
+        {
+            _logger.LogDebug("AdvisoryAI knowledge startup rebuild is disabled.");
+            return;
+        }
+
+        try
+        {
+            var summary = await _indexer.RebuildAsync(cancellationToken).ConfigureAwait(false);
+            _logger.LogInformation(
+                "AdvisoryAI knowledge startup rebuild completed: documents={DocumentCount}, chunks={ChunkCount}, api_specs={ApiSpecCount}, api_operations={ApiOperationCount}, doctor_projections={DoctorProjectionCount}, duration_ms={DurationMs}",
+                summary.DocumentCount,
+                summary.ChunkCount,
+                summary.ApiSpecCount,
+                summary.ApiOperationCount,
+                summary.DoctorProjectionCount,
+                summary.DurationMs);
+        }
+        catch (Exception ex) when (ex is not OperationCanceledException)
+        {
+            _logger.LogWarning(ex, "AdvisoryAI knowledge startup rebuild failed.");
+        }
+    }
+
+    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
+}
--- a/src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/KnowledgeSearch/KnowledgeSearchStartupRebuildServiceTests.cs
+++ b/src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/KnowledgeSearch/KnowledgeSearchStartupRebuildServiceTests.cs
@@ -0,0 +1,81 @@
+using Microsoft.Extensions.Logging.Abstractions;
+using Microsoft.Extensions.Options;
+using StellaOps.AdvisoryAI.KnowledgeSearch;
+using Xunit;
+
+namespace StellaOps.AdvisoryAI.Tests.KnowledgeSearch;
+
+[Trait("Category", "Unit")]
+public sealed class KnowledgeSearchStartupRebuildServiceTests
+{
+    [Fact]
+    public async Task StartAsync_rebuilds_knowledge_index_when_enabled()
+    {
+        var indexer = new RecordingKnowledgeIndexer();
+        var service = new KnowledgeSearchStartupRebuildService(
+            Options.Create(new KnowledgeSearchOptions
+            {
+                Enabled = true,
+                KnowledgeAutoIndexOnStartup = true,
+            }),
+            indexer,
+            NullLogger<KnowledgeSearchStartupRebuildService>.Instance);
+
+        await service.StartAsync(CancellationToken.None);
+
+        Assert.Equal(1, indexer.RebuildCallCount);
+    }
+
+    [Fact]
+    public async Task StartAsync_skips_rebuild_when_startup_bootstrap_is_disabled()
+    {
+        var indexer = new RecordingKnowledgeIndexer();
+        var service = new KnowledgeSearchStartupRebuildService(
+            Options.Create(new KnowledgeSearchOptions
+            {
+                Enabled = true,
+                KnowledgeAutoIndexOnStartup = false,
+            }),
+            indexer,
+            NullLogger<KnowledgeSearchStartupRebuildService>.Instance);
+
+        await service.StartAsync(CancellationToken.None);
+
+        Assert.Equal(0, indexer.RebuildCallCount);
+    }
+
+    [Fact]
+    public async Task StartAsync_skips_rebuild_when_knowledge_search_is_disabled()
+    {
+        var indexer = new RecordingKnowledgeIndexer();
+        var service = new KnowledgeSearchStartupRebuildService(
+            Options.Create(new KnowledgeSearchOptions
+            {
+                Enabled = false,
+                KnowledgeAutoIndexOnStartup = true,
+            }),
+            indexer,
+            NullLogger<KnowledgeSearchStartupRebuildService>.Instance);
+
+        await service.StartAsync(CancellationToken.None);
+
+        Assert.Equal(0, indexer.RebuildCallCount);
+    }
+
+    private sealed class RecordingKnowledgeIndexer : IKnowledgeIndexer
+    {
+        public int RebuildCallCount { get; private set; }
+
+        public Task<KnowledgeRebuildSummary> RebuildAsync(CancellationToken cancellationToken)
+        {
+            RebuildCallCount++;
+            return Task.FromResult(new KnowledgeRebuildSummary(
+                DocumentCount: 470,
+                ChunkCount: 9051,
+                ApiSpecCount: 1,
+                ApiOperationCount: 2190,
+                DoctorProjectionCount: 8,
+                DurationMs: 42));
+        }
+    }
+}
--- a/src/Web/StellaOps.Web/scripts/live-frontdoor-unified-search-route-matrix.mjs
+++ b/src/Web/StellaOps.Web/scripts/live-frontdoor-unified-search-route-matrix.mjs
@@ -0,0 +1,238 @@
+#!/usr/bin/env node
+
+import { mkdirSync, writeFileSync } from 'node:fs';
+import path from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+import { chromium } from 'playwright';
+
+import { authenticateFrontdoor, createAuthenticatedContext } from './live-frontdoor-auth.mjs';
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = path.dirname(__filename);
+const webRoot = path.resolve(__dirname, '..');
+const outputDirectory = path.join(webRoot, 'output', 'playwright');
+const statePath = path.join(outputDirectory, 'live-frontdoor-auth-state.json');
+const reportPath = path.join(outputDirectory, 'live-frontdoor-auth-report.json');
+const resultPath = path.join(outputDirectory, 'live-frontdoor-unified-search-route-matrix.json');
+const scopeQuery = 'tenant=demo-prod&regions=us-east&environments=stage&timeWindow=7d';
+
+const routeMatrix = [
+  {
+    key: 'doctor',
+    label: 'Doctor',
+    route: '/ops/operations/doctor',
+    expectedContext: /doctor diagnostics/i,
+  },
+  {
+    key: 'triage',
+    label: 'Security Triage',
+    route: '/security/triage',
+    expectedContext: /(security\s*\/\s*triage|findings triage)/i,
+  },
+  {
+    key: 'policy',
+    label: 'Policy',
+    route: '/ops/policy',
+    expectedContext: /(policy|policy workspace)/i,
+  },
+  {
+    key: 'vex',
+    label: 'Advisories & VEX',
+    route: '/security/advisories-vex',
+    expectedContext: /(advisories\s*&\s*vex|vex intelligence)/i,
+  },
+];
+
+function createRuntime() {
+  return {
+    consoleErrors: [],
+    pageErrors: [],
+  };
+}
+
+function attachRuntimeListeners(page, runtime) {
+  page.on('console', (message) => {
+    if (message.type() === 'error') {
+      runtime.consoleErrors.push({
+        timestamp: Date.now(),
+        page: page.url(),
+        text: message.text(),
+      });
+    }
+  });
+
+  page.on('pageerror', (error) => {
+    runtime.pageErrors.push({
+      timestamp: Date.now(),
+      page: page.url(),
+      message: error.message,
+    });
+  });
+}
+
+async function readVisibleTexts(locator) {
+  return locator.evaluateAll((nodes) =>
+    nodes
+      .map((node) => (node.textContent || '').trim().replace(/\s+/g, ' '))
+      .filter(Boolean),
+  ).catch(() => []);
+}
+
+async function openSearch(page, route) {
+  await page.goto(`https://stella-ops.local${route}?${scopeQuery}`, {
+    waitUntil: 'domcontentloaded',
+    timeout: 30_000,
+  });
+  await page.waitForTimeout(2_500);
+  const input = page.locator('app-global-search input[type="text"]').first();
+  await input.click({ timeout: 10_000 });
+  await page.waitForSelector('.search__results', { state: 'visible', timeout: 10_000 });
+  await page.waitForTimeout(4_000);
+}
+
+async function captureSnapshot(page, routeConfig, runtime, routeStartedAt) {
+  return {
+    route: routeConfig.route,
+    url: page.url(),
+    contextTitle: ((await page.locator('.search__context-title').first().textContent().catch(() => '')) || '').trim(),
+    starterChips: await readVisibleTexts(page.locator('.search__suggestions .search__chip')),
+    degradedBanners: await readVisibleTexts(page.locator('.search__degraded-banner')),
+    emptyStates: await readVisibleTexts(page.locator('.search__empty, .search__empty-state-copy')),
+    answerStatuses: await page.locator('[data-answer-status]').evaluateAll((nodes) =>
+      nodes
+        .map((node) => node.getAttribute('data-answer-status') || '')
+        .filter(Boolean),
+    ).catch(() => []),
+    cardTitles: await readVisibleTexts(page.locator('.search__cards .entity-card__title')),
+    consoleErrors: runtime.consoleErrors.filter((entry) => entry.timestamp >= routeStartedAt),
+    pageErrors: runtime.pageErrors.filter((entry) => entry.timestamp >= routeStartedAt),
+  };
+}
+
+async function executeStarter(page, routeConfig, starterIndex, runtime) {
+  const routeStartedAt = Date.now();
+  await openSearch(page, routeConfig.route);
+
+  const chips = page.locator('.search__suggestions .search__chip');
+  const count = await chips.count().catch(() => 0);
+  if (count <= starterIndex) {
+    throw new Error(`Starter chip index ${starterIndex} is not available on ${routeConfig.route}`);
+  }
+
+  const chip = chips.nth(starterIndex);
+  const starterText = ((await chip.textContent().catch(() => '')) || '').trim();
+  if (!starterText) {
+    throw new Error(`Starter chip index ${starterIndex} is blank on ${routeConfig.route}`);
+  }
+
+  await chip.click({ timeout: 10_000 });
+  await page.waitForTimeout(5_000);
+
+  const snapshot = await captureSnapshot(page, routeConfig, runtime, routeStartedAt);
+  const answerStatus = snapshot.answerStatuses[0] ?? null;
+
+  return {
+    starterIndex,
+    starterText,
+    answerStatus,
+    ok: answerStatus === 'grounded' && snapshot.cardTitles.length > 0,
+    snapshot,
+  };
+}
+
+function buildIssues(routeResult) {
+  const issues = [];
+  if (!routeResult.contextMatchesExpected) {
+    issues.push(`Unexpected context title on ${routeResult.route}: ${routeResult.contextTitle || '<empty>'}`);
+  }
+  if (routeResult.snapshot.degradedBanners.length > 0) {
+    issues.push(`Degraded banner visible on ${routeResult.route}: ${routeResult.snapshot.degradedBanners.join(' | ')}`);
+  }
+  if (routeResult.snapshot.starterChips.length === 0) {
+    issues.push(`No starter chips rendered on ${routeResult.route}`);
+  }
+  issues.push(...routeResult.snapshot.consoleErrors.map((entry) => `console:${entry.text}`));
+  issues.push(...routeResult.snapshot.pageErrors.map((entry) => `pageerror:${entry.message}`));
+  for (const starter of routeResult.executedStarters) {
+    if (!starter.ok) {
+      issues.push(`Starter index ${starter.starterIndex} "${starter.starterText}" did not resolve to grounded results on ${routeResult.route}`);
+    }
+  }
+
+  return issues;
+}
+
+async function main() {
+  mkdirSync(outputDirectory, { recursive: true });
+
+  const authReport = await authenticateFrontdoor({
+    statePath,
+    reportPath,
+    headless: true,
+  });
+
+  const browser = await chromium.launch({
+    headless: true,
+    args: ['--disable-dev-shm-usage'],
+  });
+
+  const context = await createAuthenticatedContext(browser, authReport, { statePath });
+  const page = await context.newPage();
+  const runtime = createRuntime();
+  attachRuntimeListeners(page, runtime);
+
+  const results = [];
+  const runtimeIssues = [];
+
+  try {
+    for (const routeConfig of routeMatrix) {
+      const routeStartedAt = Date.now();
+      await openSearch(page, routeConfig.route);
+      const snapshot = await captureSnapshot(page, routeConfig, runtime, routeStartedAt);
+      const executedStarters = [];
+
+      for (let starterIndex = 0; starterIndex < Math.min(snapshot.starterChips.length, 1); starterIndex += 1) {
+        // eslint-disable-next-line no-await-in-loop
+        executedStarters.push(await executeStarter(page, routeConfig, starterIndex, runtime));
+      }
+
+      const routeResult = {
+        key: routeConfig.key,
+        label: routeConfig.label,
+        route: routeConfig.route,
+        contextTitle: snapshot.contextTitle,
+        contextMatchesExpected: routeConfig.expectedContext.test(snapshot.contextTitle),
+        snapshot,
+        executedStarters,
+      };
+
+      results.push(routeResult);
+      runtimeIssues.push(...buildIssues(routeResult));
+    }
+  } finally {
+    const summary = {
+      checkedAtUtc: new Date().toISOString(),
+      scopeQuery,
+      routesChecked: results.length,
+      results,
+      runtimeIssueCount: runtimeIssues.length,
+      runtimeIssues,
+    };
+
+    writeFileSync(resultPath, `${JSON.stringify(summary, null, 2)}\n`, 'utf8');
+    await context.close();
+    await browser.close();
+
+    if (runtimeIssues.length > 0) {
+      throw new Error(runtimeIssues.join('; '));
+    }
+
+    process.stdout.write(`live-frontdoor-unified-search-route-matrix: ${results.length} routes checked, ${runtimeIssues.length} issues\n`);
+  }
+}
+
+main().catch((error) => {
+  process.stderr.write(`[live-frontdoor-unified-search-route-matrix] ${error instanceof Error ? error.message : String(error)}\n`);
+  process.exit(1);
+});
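
The pass/fail rule the sweep script applies can be sketched in isolation, without a browser. This is a minimal illustration, not the script itself: `isGrounded`, `summarizeRoute`, the issue prefixes, and every value in `sample` are hypothetical names and data chosen for the example. The only details taken from the script are the rule that a starter counts as resolved when its answer status is `grounded` and at least one card is rendered, and the Doctor route with its expected-context pattern.

```javascript
// Hypothetical helper: a starter passes only when the answer is grounded
// and at least one result card was rendered.
function isGrounded(starter) {
  return starter.answerStatus === 'grounded' && starter.cardTitles.length > 0;
}

// Hypothetical helper: collect issue strings for one route result.
function summarizeRoute(routeResult) {
  const issues = [];
  if (!routeResult.expectedContext.test(routeResult.contextTitle)) {
    issues.push(`context:${routeResult.route}`);
  }
  if (routeResult.starterChips.length === 0) {
    issues.push(`no-starters:${routeResult.route}`);
  }
  for (const starter of routeResult.starters) {
    if (!isGrounded(starter)) {
      issues.push(`ungrounded:${routeResult.route}:${starter.text}`);
    }
  }
  return issues;
}

// Illustrative data only (not captured output): the context is correct,
// but the starter returned no grounded answer and no cards, which is the
// shape of failure expected when the knowledge corpus is not ready.
const sample = {
  route: '/ops/operations/doctor',
  expectedContext: /doctor diagnostics/i,
  contextTitle: 'Doctor diagnostics',
  starterChips: ['example starter'],
  starters: [
    { text: 'example starter', answerStatus: null, cardTitles: [] },
  ],
};

console.log(summarizeRoute(sample));
```

Under these assumptions, a route with correct context still fails the sweep when any executed starter is ungrounded. That is why an empty knowledge corpus after a scratch rebuild shows up as a starter failure and not as a context failure.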