# AdvisoryAI Knowledge Search (AKS)

## Why retrieval-first

AKS is a deterministic retrieval system for operational problem solving across Stella Ops docs, OpenAPI contracts, and Doctor checks. It is designed to work offline and does not require GPU-backed or hosted LLM inference for correctness.

LLMs can still be used as optional formatters later, but AKS correctness is grounded in source retrieval and explicit references.

## Scope

- Module owner: `src/AdvisoryAI/**`.
- Search surfaces consuming AKS:
  - Web global search in `src/Web/StellaOps.Web/**`.
  - CLI commands in `src/Cli/**`.
- Doctor execution remains authoritative in the Doctor module. AKS only indexes metadata and remediation references.

## Architecture

1. Ingestion/indexing:
   - Markdown allow-list/manifest -> section chunks.
   - OpenAPI aggregate (`openapi_current.json` style artifact) -> per-operation chunks + normalized operation tables.
   - Doctor seed + controls metadata (including CLI-discovered Doctor check catalog projection) -> doctor projection chunks.
2. Storage:
   - PostgreSQL tables in schema `advisoryai` via migration `src/AdvisoryAI/StellaOps.AdvisoryAI/Storage/Migrations/002_knowledge_search.sql`.
3. Retrieval:
   - FTS (`tsvector` + `websearch_to_tsquery`) + optional vector stage.
   - Deterministic fusion and tie-breaking in `KnowledgeSearchService`.
4. Delivery:
   - API endpoint: `POST /v1/advisory-ai/search`.
   - Index rebuild endpoint: `POST /v1/advisory-ai/index/rebuild`.

Unified-search architecture reference:
- `docs/modules/advisory-ai/unified-search-architecture.md`

## Data model

AKS schema tables:
- `advisoryai.kb_doc`: canonical source docs with product/version/content hash metadata.
- `advisoryai.kb_chunk`: searchable units (`md_section`, `api_operation`, `doctor_check`) with anchors, spans, `tsvector`, and embeddings.
- `advisoryai.api_spec`: raw OpenAPI snapshot (`jsonb`) by service.
- `advisoryai.api_operation`: normalized operation records (`method`, `path`, `operation_id`, tags, request/response/security json).
- `advisoryai.doctor_search_projection`: searchable doctor metadata and remediation.

Vector support:
- Tries `CREATE EXTENSION vector`.
- If unavailable, AKS remains fully functional via FTS and deterministic array embeddings fallback.

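To illustrate what a deterministic, model-free embedding can look like, here is a minimal sketch of a hashing encoder. It is not the actual fallback or the `DeterministicHashVectorEncoder` used by AKS: the tokenization, the SHA-256 bucket scheme, and the function names are assumptions made for the example. Only the 384 dimensions come from this document (`vector(384)`).

```python
import hashlib
import math
import re

DIMENSIONS = 384  # matches the `vector(384)` column size named in this document


def hash_embed(text: str, dimensions: int = DIMENSIONS) -> list:
    """Hypothetical hashing encoder: same text in, same vector out, no model needed."""
    vector = [0.0] * dimensions
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimensions  # which dimension to touch
        sign = 1.0 if digest[4] % 2 == 0 else -1.0               # signed hashing
        vector[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    # L2-normalize so the dot product of two embeddings is their cosine similarity.
    return [v / norm for v in vector] if norm else vector


def cosine(a: list, b: list) -> float:
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because the output depends only on the input text, two rebuilds of the same corpus produce identical embeddings, which is the property the fallback needs.
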
## Deterministic ingestion rules

### Markdown

- Source order:
  1. Allow-list file: `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/knowledge-docs-allowlist.json`.
  2. Generated manifest (optional, from CLI tool): `knowledge-docs-manifest.json`.
  3. Fallback scan roots (`docs/**`) only if allow-list resolves no markdown files.
- Chunk by H2/H3 headings.
- Stable anchors using slug + duplicate suffix.
- Stable chunk IDs from source path + anchor + span.
- Metadata includes path, anchor, section path, tags.

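The chunking rules above can be sketched as follows. This is an illustration only: the slug rules, the `-N` duplicate suffix format, and the SHA-256 ID derivation are assumptions, and `chunk_markdown` is a hypothetical name rather than the indexer's API. For brevity the sketch does not skip headings inside fenced code blocks and drops non-ASCII characters from slugs.

```python
import hashlib
import re

HEADING = re.compile(r"^(#{2,3})\s+(.*\S)\s*$")  # H2 and H3 only


def slugify(title: str) -> str:
    """Lowercase, drop punctuation, join words with '-' (assumed slug rules)."""
    slug = re.sub(r"[^a-z0-9\s-]", "", title.lower())
    return re.sub(r"[\s-]+", "-", slug).strip("-")


def chunk_markdown(path: str, text: str) -> list:
    lines = text.splitlines()
    starts = [(i, m.group(2)) for i, line in enumerate(lines) if (m := HEADING.match(line))]
    seen = {}
    chunks = []
    for idx, (start, title) in enumerate(starts):
        # A chunk runs until the next H2/H3 heading or the end of the file.
        end = starts[idx + 1][0] if idx + 1 < len(starts) else len(lines)
        base = slugify(title)
        count = seen.get(base, 0)
        seen[base] = count + 1
        anchor = base if count == 0 else f"{base}-{count}"  # duplicate suffix
        # The ID depends only on path + anchor + 1-based line span.
        key = f"{path}#{anchor}:{start + 1}-{end}"
        chunk_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        chunks.append({"id": chunk_id, "anchor": anchor, "span": (start + 1, end), "title": title})
    return chunks
```

Re-running the function on unchanged input yields the same anchors and IDs, so an index rebuild does not churn identifiers.
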
### OpenAPI

- Source order:
  1. Aggregated OpenAPI file path (default `devops/compose/openapi_current.json`).
  2. Fallback repository scan for `openapi.json` when aggregate is missing.
- Parse deterministic JSON aggregate for MVP.
- Emit one searchable chunk per HTTP operation.
- Preserve structured operation payloads (`request_json`, `responses_json`, `security_json`).

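As a rough sketch of "one chunk per HTTP operation", the function below walks a parsed OpenAPI document in a fixed order. It assumes a single standard OpenAPI 3 object with a top-level `paths` map; the real aggregate layout may differ, and `operation_chunks` is a hypothetical helper, not the indexer's code.

```python
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def operation_chunks(service: str, spec: dict) -> list:
    """Emit one record per (path, method), in a stable order."""
    chunks = []
    for path in sorted(spec.get("paths", {})):   # sorted paths -> deterministic output
        item = spec["paths"][path]
        for method in HTTP_METHODS:              # fixed method order, ignores non-operation keys
            op = item.get(method)
            if not isinstance(op, dict):
                continue
            chunks.append({
                "service": service,
                "method": method.upper(),
                "path": path,
                "operation_id": op.get("operationId"),
                "tags": sorted(op.get("tags", [])),
                # Structured payloads are kept as-is for later display.
                "request_json": op.get("requestBody"),
                "responses_json": op.get("responses", {}),
                "security_json": op.get("security", spec.get("security", [])),
            })
    return chunks
```

Iterating sorted paths and a fixed method tuple keeps the output independent of JSON key order in the source file.
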
### Doctor

- Source order:
  1. Seed file `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-seed.json`.
  2. Controls file `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-controls.json` (contains control fields plus fallback metadata from `stella advisoryai sources prepare`).
  3. Optional Doctor endpoint metadata (`DoctorChecksEndpoint`) when configured.
- `stella advisoryai sources prepare` merges configured seed entries with `DoctorEngine.ListChecks()` (when available in CLI runtime) and writes enriched control projection metadata (`title`, `severity`, `description`, `remediation`, `runCommand`, `symptoms`, `tags`, `references`).
- Emit doctor chunk + projection record including:
  - `checkCode`, `title`, `severity`, `runCommand`, remediation, symptoms.
  - control metadata (`control`, `requiresConfirmation`, `isDestructive`, `inspectCommand`, `verificationCommand`).

## Ranking strategy

Implemented in `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchService.cs`:
- Candidate retrieval:
  - lexical set from FTS.
  - optional vector set from embedding candidates.
- Fusion:
  - reciprocal rank fusion style scoring.
- Deterministic boosts:
  - exact `checkCode` match.
  - exact `operationId` match.
  - `METHOD /path` match.
  - filter-aligned service/tag boosts.
- Deterministic ordering:
  - score desc -> kind asc -> chunk id asc.

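A minimal sketch of the fusion and ordering steps above, using textbook reciprocal rank fusion. The constant `k = 60`, the additive boost model, and the function name are assumptions; the actual weights in `KnowledgeSearchService` are not stated in this document.

```python
from typing import Optional


def rrf_fuse(lexical: list, vector: list, kinds: dict,
             boosts: Optional[dict] = None, k: int = 60) -> list:
    """Fuse two ranked lists of chunk IDs and return (chunk_id, score) pairs."""
    scores = {}
    for ranking in (lexical, vector):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it contains.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    for chunk_id, boost in (boosts or {}).items():
        if chunk_id in scores:
            scores[chunk_id] += boost  # e.g. exact checkCode / operationId match
    # Deterministic ordering: score desc -> kind asc -> chunk id asc.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], kinds[cid], cid))
    return [(cid, scores[cid]) for cid in ordered]
```

The explicit tie-breakers matter: two chunks with identical fused scores would otherwise come back in whatever order the store happened to return them.
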
## API contract

### Search

- `POST /v1/advisory-ai/search`
- Legacy notice: endpoint emits deprecation metadata and points to unified replacement `POST /v1/search/query`.
- Authorization: `advisory-ai:operate` (or `advisory-ai:admin`).
- Filter validation: `filters.type` allowlist is strictly enforced (`docs`, `api`, `doctor`); unsupported values return HTTP 400.
- Request:
  - `q` (required), `k`, `filters.type|product|version|service|tags`, `includeDebug`.
- Response:
  - typed results (`docs|api|doctor`) with snippet, score, and open action.

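The validation rules for this endpoint can be sketched as a pure function. The validation keys, the 512-character cap, and the tenant header names are taken from other sections of this document; the order of checks, the shape of `filters.type` (string or list), and the error text for unsupported types are assumptions.

```python
from typing import Optional

ALLOWED_TYPES = frozenset({"docs", "api", "doctor"})
MAX_QUERY_LENGTH = 512
TENANT_HEADERS = ("x-stellaops-tenant", "x-tenant-id")


def validate_search_request(body: dict, headers: dict) -> tuple:
    """Return (http_status, error) where error is None on success."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if not any(lowered.get(name, "").strip() for name in TENANT_HEADERS):
        return 400, "advisoryai.validation.tenant_required"
    query = body.get("q")
    if not isinstance(query, str) or not query.strip():
        return 400, "advisoryai.validation.q_required"
    if len(query.strip()) > MAX_QUERY_LENGTH:
        return 400, "advisoryai.validation.q_max_512"
    requested = (body.get("filters") or {}).get("type") or []
    if isinstance(requested, str):
        requested = [requested]
    unsupported = sorted(set(requested) - ALLOWED_TYPES)
    if unsupported:
        # Hypothetical message: the document only says such values return HTTP 400.
        return 400, "unsupported filters.type: " + ", ".join(unsupported)
    return 200, None
```

Rejecting an unsupported filter outright, rather than dropping it, avoids silently widening the query beyond what the caller asked for.
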
### Rebuild

- `POST /v1/advisory-ai/index/rebuild`
- Rebuilds AKS deterministically from local docs/specs/doctor metadata.
- Authorization: `advisory-ai:admin`.

## Localization runtime contract

- AdvisoryAI WebService localization is enabled through `AddStellaOpsLocalization(...)`, embedded service bundles (`Translations/*.advisoryai.json`), and `AddRemoteTranslationBundles()`.
- Locale behavior follows backend contract: `X-Locale` -> `Accept-Language` -> default locale.
- Supported service locales for this rollout slice: `en-US`, `de-DE`.
- Remote translation bundles are enabled when Platform base URL is configured via `STELLAOPS_PLATFORM_URL`, `Platform:BaseUrl`, or `StellaOps:Platform:BaseUrl`.
- Localized validation keys used by both `POST /v1/advisory-ai/search` and `POST /v1/search/query`:
  - `advisoryai.validation.q_required`
  - `advisoryai.validation.q_max_512`
  - `advisoryai.validation.tenant_required`

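The `X-Locale` -> `Accept-Language` -> default chain can be sketched as below. Several details are assumptions rather than the backend contract: `en-US` as the default locale, primary-language matching (`de` resolving to `de-DE`), and an unsupported `X-Locale` falling through to `Accept-Language`.

```python
SUPPORTED_LOCALES = ("en-US", "de-DE")
DEFAULT_LOCALE = "en-US"  # assumption: the document does not name the default


def _match_supported(tag: str):
    """Exact match first, then primary-language match (assumed behavior)."""
    tag = tag.strip().lower()
    if not tag or tag == "*":
        return None
    for locale in SUPPORTED_LOCALES:
        if locale.lower() == tag:
            return locale
    primary = tag.split("-")[0]
    for locale in SUPPORTED_LOCALES:
        if locale.lower().split("-")[0] == primary:
            return locale
    return None


def _accept_language_tags(value: str) -> list:
    """Language tags ordered by q-value (desc), then by header position."""
    weighted = []
    for position, part in enumerate(value.split(",")):
        pieces = part.strip().split(";")
        tag, quality = pieces[0].strip(), 1.0
        for param in pieces[1:]:
            name, _, raw = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(raw)
                except ValueError:
                    quality = 0.0
        if tag and quality > 0:
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def resolve_locale(headers: dict) -> str:
    lowered = {name.lower(): value for name, value in headers.items()}
    candidates = [lowered.get("x-locale", "")]
    candidates += _accept_language_tags(lowered.get("accept-language", ""))
    for tag in candidates:
        match = _match_supported(tag)
        if match:
            return match
    return DEFAULT_LOCALE
```
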
## Unified search interoperability

- Unified endpoint: `POST /v1/search/query`.
- Query validation: `q` is required and capped at 512 characters.
- Tenant validation: unified and AKS search endpoints now require tenant context (`X-StellaOps-Tenant` or `X-Tenant-Id`) and bind tenant into backend search filters.
- Unified filter allowlists are enforced server-side:
  - Supported `filters.domains`: `knowledge`, `findings`, `vex`, `policy`, `platform`.
  - Supported `filters.entityTypes`: `docs`, `api`, `doctor`, `finding`, `vex_statement`, `policy_rule`, `platform_entity`.
  - Unsupported domain/entity filter values are rejected with HTTP 400; they are not silently broadened to an unfiltered query.
- Web ambient contract:
  - Global search emits ambient context with each unified query: `currentRoute`, `visibleEntityKeys`, `recentSearches`, `sessionId`, and optional `lastAction` (`action`, `source`, `queryHint`, `domain`, `entityKey`, `route`, `occurredAt`).
  - Contract remains backward-compatible: if an API deployment does not yet consume `lastAction`, unknown ambient fields are ignored and base search behavior remains unchanged.
  - UI suggestion behavior now combines obvious route defaults with one strategic non-obvious suggestion and action-aware variants (for example, policy/VEX impact and incident timeline pivots).
- Unified index lifecycle:
  - Manual rebuild endpoint: `POST /v1/search/index/rebuild`.
  - Optional background refresh loop is available via `KnowledgeSearchOptions` (`UnifiedAutoIndexEnabled`, `UnifiedAutoIndexOnStartup`, `UnifiedIndexRefreshIntervalSeconds`).
- Unified ingestion adapters for findings/vex/policy now use live upstream service payloads as primary source, with deterministic snapshot fallback only when upstream endpoints are unavailable or unconfigured.
  - Live adapters: `FindingsSearchAdapter`, `VexSearchAdapter`, `PolicySearchAdapter`.
  - Platform catalog remains a deterministic snapshot projection via `PlatformCatalogIngestionAdapter`.
  - Default snapshot fallback paths:
    - `src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/findings.snapshot.json`
    - `src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/vex.snapshot.json`
    - `src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/policy.snapshot.json`
- Ranking determinism:
  - Freshness boost is disabled by default and only applies when `UnifiedFreshnessBoostEnabled` is explicitly enabled.
  - Ranking no longer depends on ambient wall-clock time unless that option is enabled.
- Query telemetry:
  - Unified search emits hashed query telemetry (`SHA-256` query hash, intent, domain weights, latency, top domains) via `IUnifiedSearchTelemetrySink`.
  - Search analytics persistence stores hashed query keys (`SHA-256`, normalized) and pseudonymous user keys (tenant+user hash) in analytics/feedback artifacts.
  - Free-form feedback comments are redacted at persistence time to avoid storing potential PII in analytics tables.
  - Server-side search history remains user-facing functionality (raw query for history UX) and is keyed by pseudonymous user hash.
- Web fallback behavior: when unified search fails, `UnifiedSearchClient` falls back to legacy AKS (`/v1/advisory-ai/search`) and maps grouped legacy results into unified cards (`diagnostics.mode = legacy-fallback`).
  - UI now shows an explicit degraded-mode banner for `legacy-fallback` / `fallback-empty` modes and clears it automatically on recovery.
  - Degraded-mode enter/exit transitions emit analytics markers (`__degraded_mode_enter__`, `__degraded_mode_exit__`); server-side search history intentionally ignores `__*` synthetic markers.
- Deprecation timeline and migration milestones are tracked in `docs/modules/advisory-ai/CHANGELOG.md`.

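The query-telemetry rules above (normalized SHA-256 query hashes, pseudonymous tenant+user keys, and skipping `__*` markers in history) can be sketched as follows. The normalization steps, the separator between tenant and user, and all function names are assumptions for illustration.

```python
import hashlib


def normalize_query(query: str) -> str:
    """Assumed normalization: lowercase, trim, collapse internal whitespace."""
    return " ".join(query.lower().split())


def query_hash(query: str) -> str:
    """SHA-256 hex digest of the normalized query, stored instead of raw text."""
    return hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()


def pseudonymous_user_key(tenant: str, user: str) -> str:
    """Hash tenant and user together so equal user names in two tenants differ."""
    return hashlib.sha256(f"{tenant}\x1f{user}".encode("utf-8")).hexdigest()


def should_record_history(query: str) -> bool:
    """Synthetic analytics markers (`__*`) never enter user-facing history."""
    return not query.strip().startswith("__")
```

Hashing after normalization means `"TLS  cert"` and `"tls cert"` aggregate under one key in the quality dashboard without the raw text being stored.
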
## Unified search threat model (USRCH-POL-005)

Primary attack vectors and implemented mitigations:
- Cross-tenant data leakage:
  - Risk: chunks from tenant A becoming visible in tenant B through weak filtering or identity collisions.
  - Mitigations: mandatory tenant context on AKS/unified endpoints; tenant-aware store filters (`metadata.tenant` + `global` allowance); tenant-scoped chunk/doc identity for findings/vex/policy live adapters to prevent cross-tenant upsert collisions.
- Prompt/content injection from indexed sources:
  - Risk: untrusted indexed text influencing synthesis or downstream operators.
  - Mitigations: deterministic retrieval-first pipeline; synthesis grounding enforcement; analytics stores hashed query identifiers only; prompt payloads are not persisted in raw form.
- UI/script injection via snippets:
  - Risk: malicious `<script>`/HTML in indexed body or highlighted snippets leading to XSS in search result cards.
  - Mitigations: backend snippet sanitization strips script and HTML tags before response mapping; web client normalizes and strips tags again as defense-in-depth.
- Query-amplification and expensive-query DoS:
  - Risk: oversized/invalid filters and high-rate query floods increasing DB and fusion cost.
  - Mitigations: `q` length cap (512), strict allowlist validation for domains/entity types, per-tenant rate limiting, bounded candidate limits/timeouts in retrieval stages.

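As an illustration of the snippet mitigation, the sketch below removes script/style blocks and then any remaining tags. It is a simplified regex approach written for this example, not the backend sanitizer; a production implementation would more likely use a real HTML parser, and the renderer should still treat the result as plain text.

```python
import re

# Whole <script>/<style> elements, including their bodies.
SCRIPT_BLOCK = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Any other tag, opening or closing.
TAG = re.compile(r"<[^>]+>")


def sanitize_snippet(text: str) -> str:
    text = SCRIPT_BLOCK.sub("", text)   # drop executable content first
    text = TAG.sub("", text)            # then strip remaining markup, keep inner text
    return " ".join(text.split())       # collapse whitespace left behind
```

Running the same stripping step again in the web client is cheap and covers the case where a backend deployment lags behind.
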
## Web behavior

Global search now consumes AKS and supports:
- Mixed grouped results (`Docs`, `API Endpoints`, `Doctor Checks`).
- Type filter chips.
- Result actions:
  - Docs: `Open`.
  - API: `Curl` (copy command).
  - Doctor: `Run` (navigate to doctor and copy run command).
- `More` action for "show more like this" local query expansion.
- Search-quality metrics taxonomy is standardized on `query`, `click`, and `zero_result` event types (no legacy `search` event dependency in quality SQL).
- Synthesis usage is tracked via dedicated `synthesis` analytics events, while quality aggregates continue to compute totals from `query` + `zero_result`.
- Quality dashboard query dimensions are exposed as query hashes (not raw query text) for privacy-preserving analytics.

## CLI behavior

AKS commands:
- `stella search "<query>" [--type docs|api|doctor] [--product ...] [--version ...] [--service ...] [--tag ...] [--k N] [--synthesize] [--json]`
- `stella doctor suggest "<symptom>" [--product ...] [--version ...] [--k N] [--json]`
- `stella advisoryai index rebuild [--json]`
- `stella advisoryai sources prepare [--repo-root ...] [--docs-allowlist ...] [--docs-manifest-output ...] [--openapi-output ...] [--doctor-seed ...] [--doctor-controls-output ...] [--overwrite] [--json]`
- Unified-search API operations:
  - `POST /v1/search/query`
  - `POST /v1/search/synthesize`
  - `POST /v1/search/index/rebuild`

Output:
- Human mode: grouped actionable references.
- JSON mode: stable machine-readable payload.

## Test/benchmark strategy

Implemented benchmark framework:
- Generator: `KnowledgeSearchBenchmarkDatasetGenerator` (deterministic synthetic set with explicit ground truth).
- Runner: `KnowledgeSearchBenchmarkRunner` (recall@k, p50/p95 latency, stability pass).
- Models/serialization:
  - `KnowledgeSearchBenchmarkModels.cs`
  - `KnowledgeSearchBenchmarkJson.cs`

Tests:
- `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/KnowledgeSearch/KnowledgeSearchBenchmarkTests.cs`
  - verifies deterministic dataset generation with >= 1000 queries.
  - verifies recall/latency metrics and top-k match behavior.

Unified-search quality benchmarks:
- Corpus: `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/TestData/unified-search-quality-corpus.json` (250 graded queries).
- Runner: `UnifiedSearchQualityBenchmarkRunner`.
- Fast PR gate: `UnifiedSearchQualityBenchmarkFastSubsetTests` (50 queries).
- Full suite: `UnifiedSearchQualityBenchmarkTests` and `UnifiedSearchPerformanceEnvelopeTests`.
- Reports:
  - `docs/modules/advisory-ai/unified-search-ranking-benchmark.md`
  - `docs/modules/advisory-ai/unified-search-release-readiness.md`
  - `docs/operations/unified-search-operations.md`

## Dedicated AKS test DB

Compose profile:
- `devops/compose/docker-compose.advisoryai-knowledge-test.yml`

Init script:
- `devops/compose/postgres-init/advisoryai-knowledge-test/01_extensions.sql`

Example workflow:
```bash
docker compose -f devops/compose/docker-compose.advisoryai-knowledge-test.yml up -d
stella advisoryai sources prepare --json
stella advisoryai index rebuild --json
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj
```

## Search improvement sprints (G1–G10): testing infrastructure guide

Ten search improvement sprints (SPRINT_20260224_101 through SPRINT_20260224_110) were implemented as a batch. This section documents how to set up infrastructure and run the full test suite.

### Sprint inventory

| Sprint | Gap | Topic | Module(s) |
| --- | --- | --- | --- |
| 101 | G5 | FTS English stemming + trigram fuzzy | AdvisoryAI (backend) |
| 102 | G1 | ONNX semantic vector encoder | AdvisoryAI (backend) |
| 103 | G2 | Cross-domain live-data adapters | AdvisoryAI (backend) |
| 104 | G3 | LLM-grounded synthesis engine | AdvisoryAI (backend) |
| 105 | G4 | Search onboarding + guided discovery + "Did you mean?" | FE + AdvisoryAI |
| 106 | G6 | Search personalization (popularity boost, role-based bias, history) | AdvisoryAI + FE |
| 107 | G7 | Search → Chat bridge ("Ask AI" button) | FE |
| 108 | G8 | Inline result previews (expandable entity cards) | AdvisoryAI + FE |
| 109 | G9 | Multilingual search (de/fr/es/ru FTS, language detection, localized doctor seeds) | AdvisoryAI + FE |
| 110 | G10 | Search feedback loop (thumbs up/down, quality dashboard, query refinements) | AdvisoryAI + FE |

### Test projects and files

All backend tests live in a single test project:
```
src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj
```

Key test files added by the search sprints:

| File | Coverage | Type |
| --- | --- | --- |
| `Integration/UnifiedSearchSprintIntegrationTests.cs` | All 10 sprints (87 tests): endpoint auth, domain filtering, synthesis, suggestions, role-based bias, multilingual detection, feedback validation | Integration (WebApplicationFactory) |
| `Integration/KnowledgeSearchEndpointsIntegrationTests.cs` | AKS endpoints: auth, search, localization, rebuild | Integration (WebApplicationFactory) |
| `KnowledgeSearch/FtsRecallBenchmarkTests.cs` | G5-005: FTS recall benchmark (12 tests, 34-query fixture) | Benchmark |
| `KnowledgeSearch/FtsRecallBenchmarkStore.cs` | In-memory FTS store simulating Simple vs English modes | Test harness |
| `KnowledgeSearch/SemanticRecallBenchmarkTests.cs` | G1-004: Semantic recall benchmark (13 tests, 48-query fixture) | Benchmark |
| `KnowledgeSearch/SemanticRecallBenchmarkStore.cs` | In-memory vector store with cosine similarity search | Test harness |
| `UnifiedSearch/UnifiedSearchServiceTests.cs` | G8: Preview generation (7 tests) | Unit |

Test data fixtures (auto-copied to output via `TestData/*.json` glob in .csproj):
- `TestData/fts-recall-benchmark.json`: 34 queries across exact/stemming/typos/short/natural categories
- `TestData/semantic-recall-benchmark.json`: 48 queries across synonym/paraphrase/conceptual/acronym/exact categories

### Prerequisites to run

**Detailed infrastructure setup guide**: `src/AdvisoryAI/__Tests/INFRASTRUCTURE.md`. It covers 4 tiers (in-process, live database, ONNX model, frontend E2E) with exact Docker commands, connection strings, extension requirements, and config examples.

**No external infrastructure needed for the in-process test suite.** All integration tests use `WebApplicationFactory<Program>` with stubbed services. Benchmarks use in-memory stores. No PostgreSQL, no Docker, no network access required.

Run the full suite:
```bash
dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" -v normal
```

Run only the search sprint integration tests:
```bash
dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" \
  --filter "FullyQualifiedName~UnifiedSearchSprintIntegrationTests" -v normal
```

Run only the FTS recall benchmark:
```bash
dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" \
  --filter "FullyQualifiedName~FtsRecallBenchmarkTests" -v normal
```

Run only the semantic recall benchmark:
```bash
dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" \
  --filter "FullyQualifiedName~SemanticRecallBenchmarkTests" -v normal
```

**For live database tests** (e.g., full AKS rebuild + query against real Postgres with pg_trgm/pgvector):
```bash
# Start the dedicated AKS test database
docker compose -f devops/compose/docker-compose.advisoryai-knowledge-test.yml up -d

# Wait for health check
docker compose -f devops/compose/docker-compose.advisoryai-knowledge-test.yml ps

# Prepare sources and rebuild index
stella advisoryai sources prepare --json
stella advisoryai index rebuild --json

# Run tests with the Live category (requires database)
dotnet test "src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj" \
  --filter "Category=Live" -v normal
```

Or use the full CI testing stack:
```bash
docker compose -f devops/compose/docker-compose.testing.yml --profile ci up -d
```

### Database extensions required for live tests

The AKS knowledge test database init script (`devops/compose/postgres-init/advisoryai-knowledge-test/01_extensions.sql`) must enable:
- `vector` (pgvector): for `embedding_vec vector(384)` columns and cosine similarity
- `pg_trgm`: for trigram fuzzy matching (`similarity()`, GIN trigram indexes)

These are already configured in the compose init scripts. If setting up a custom test database:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
```

### Migrations required for search sprints

The search sprints added several migrations under `src/AdvisoryAI/StellaOps.AdvisoryAI/Storage/Migrations/`:

| Migration | Gap (sprint) | Content |
| --- | --- | --- |
| `004_fts_english_config.sql` | G5 (101) | `body_tsv_en` tsvector column + GIN index, pg_trgm extension + trigram indexes |
| `005_search_feedback.sql` | G10 (110) | `search_feedback` + `search_quality_alerts` tables |
| `005_search_analytics.sql` | G6 (106) | `search_events` + `search_history` tables |
| `007_multilingual_fts.sql` | G9 (109) | `body_tsv_de`, `body_tsv_fr`, `body_tsv_es`, `body_tsv_ru` tsvector columns + GIN indexes |

All migrations are idempotent (`IF NOT EXISTS` guards). They run automatically via `EnsureSchemaAsync()` at service startup.

### Frontend tests

Frontend changes span `src/Web/StellaOps.Web/`. To run Angular unit tests:
```bash
cd src/Web/StellaOps.Web
npm install
npm run test:ci
```

For E2E tests (requires the full stack running):
```bash
cd src/Web/StellaOps.Web
npx playwright install
npm run test:e2e
```

Relevant E2E config: `src/Web/StellaOps.Web/playwright.e2e.config.ts`.

### InternalsVisibleTo

The production assembly `StellaOps.AdvisoryAI` grants `InternalsVisibleTo` to `StellaOps.AdvisoryAI.Tests` (see `src/AdvisoryAI/StellaOps.AdvisoryAI/Properties/AssemblyInfo.cs`). This allows tests to access `internal` types including:
- `IVectorEncoder`, `DeterministicHashVectorEncoder`, `OnnxVectorEncoder`
- `ISynthesisEngine`, `SynthesisTemplateEngine`, `CompositeSynthesisEngine`, `LlmSynthesisEngine`
- `IntentClassifier`, `QueryLanguageDetector`, `MultilingualIntentKeywords`, `DomainWeightCalculator`
- `SearchAnalyticsService`, `SearchQualityMonitor`
- `WeightedRrfFusion`, `UnifiedSearchService`
- `IKnowledgeSearchStore`, `KnowledgeChunkRow`

### Key interfaces to stub in integration tests

| Interface | Purpose | Typical stub behavior |
| --- | --- | --- |
| `IKnowledgeSearchService` | AKS search | Return hardcoded results per query |
| `IKnowledgeIndexer` | AKS index rebuild | Return fixed summary counts |
| `IUnifiedSearchService` | Unified search | Return entity cards with domain filtering |
| `IUnifiedSearchIndexer` | Unified index rebuild | Return fixed summary |
| `ISynthesisEngine` | AI synthesis | Return template-based synthesis |
| `IVectorEncoder` | Embedding generation | Use `DeterministicHashVectorEncoder` or `EmptyVectorEncoder` |
| `IKnowledgeSearchStore` | FTS/vector storage | Use `DeterministicBenchmarkStore` or `FtsRecallBenchmarkStore` |

### Test categories and filtering

Use `[Trait("Category", TestCategories.XXX)]` to categorize tests. Key categories:
- `Unit`: fast, in-memory, no external deps (default for most tests)
- `Integration`: uses `WebApplicationFactory` or test containers
- `Performance`: benchmarks (FTS recall, semantic recall)
- `Live`: requires running database (skip in standard CI)

Filter examples:
```bash
# All except Live
dotnet test ... --filter "Category!=Live"

# Only integration
dotnet test ... --filter "Category=Integration"

# Specific test class
dotnet test ... --filter "FullyQualifiedName~FtsRecallBenchmarkTests"
```

### Localized doctor seeds

Doctor check content is available in 3 locales:
- `doctor-search-seed.json`: English (base, 8 checks)
- `doctor-search-seed.de.json`: German (de-DE)
- `doctor-search-seed.fr.json`: French (fr-FR)

The `KnowledgeIndexer.IngestDoctorAsync()` method auto-discovers locale files via `DoctorSearchSeedLoader.LoadLocalized()` and ingests locale-tagged chunks alongside English. This enables German/French FTS queries to match doctor check content.

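The file-naming convention above lends itself to simple discovery by pattern. The sketch below is a guess at the idea, not the behavior of `DoctorSearchSeedLoader.LoadLocalized()`: the regular expression, the `en-US` tag for the base file, and the choice to ignore unmapped languages are all assumptions.

```python
import re

SEED_PATTERN = re.compile(r"^doctor-search-seed(?:\.([a-z]{2}))?\.json$")
LOCALE_BY_LANGUAGE = {"de": "de-DE", "fr": "fr-FR"}  # the locales named in this document


def discover_seed_files(filenames: list) -> dict:
    """Map locale tag -> seed file name; the un-suffixed file is the English base."""
    found = {}
    for name in sorted(filenames):  # sorted for a stable result
        match = SEED_PATTERN.match(name)
        if not match:
            continue
        language = match.group(1)
        if language is None:
            found["en-US"] = name
        elif language in LOCALE_BY_LANGUAGE:
            found[LOCALE_BY_LANGUAGE[language]] = name
    return found
```
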
### Configuration options added by search sprints

All in `KnowledgeSearchOptions` (`src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchOptions.cs`):

| Option | Default | Gap | Purpose |
| --- | --- | --- | --- |
| `FtsLanguageConfig` | `"english"` | G5 | Primary FTS text search config |
| `FuzzyFallbackEnabled` | `true` | G5 | Enable pg_trgm fuzzy fallback |
| `MinFtsResultsForFuzzyFallback` | `3` | G5 | Threshold for fuzzy activation |
| `FuzzySimilarityThreshold` | `0.3` | G5 | pg_trgm similarity cutoff |
| `VectorEncoderType` | `"hash"` | G1 | `"hash"` or `"onnx"` |
| `OnnxModelPath` | `"models/all-MiniLM-L6-v2.onnx"` | G1 | Path to ONNX model file |
| `LlmSynthesisEnabled` | `false` | G3 | Enable LLM-grounded synthesis |
| `SynthesisTimeoutMs` | `5000` | G3 | LLM synthesis timeout |
| `LlmAdapterBaseUrl` | `null` | G3 | LLM adapter service URL |
| `LlmProviderId` | `null` | G3 | LLM provider selection |
| `PopularityBoostEnabled` | `false` | G6 | Enable click-weighted ranking |
| `PopularityBoostWeight` | `0.05` | G6 | Popularity boost factor |
| `RoleBasedBiasEnabled` | `true` | G6 | Enable scope-based domain weighting |
| `SearchQualityMonitorEnabled` | `true` | G10 | Enable periodic quality-alert refresh |
| `SearchQualityMonitorIntervalSeconds` | `300` | G10 | Quality-alert refresh cadence |
| `SearchAnalyticsRetentionEnabled` | `true` | G10 | Enable automatic analytics/feedback/history pruning |
| `SearchAnalyticsRetentionDays` | `90` | G10 | Retention window for search analytics artifacts |
| `SearchAnalyticsRetentionIntervalSeconds` | `3600` | G10 | Retention pruning cadence |
| `FtsLanguageConfigs` | `{}` | G9 | Per-locale FTS config map |

Unified-search options (`UnifiedSearchOptions`, `src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/UnifiedSearchOptions.cs`):
- `Enabled`
- `BaseDomainWeights`
- `Weighting.*`
- `Federation.*`
- `GravityBoost.*`
- `Synthesis.*`
- `Ingestion.*`
- `Session.*`
- `TenantFeatureFlags.<tenant>.Enabled`
- `TenantFeatureFlags.<tenant>.FederationEnabled`
- `TenantFeatureFlags.<tenant>.SynthesisEnabled`

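To make the interaction of the three fuzzy options in `KnowledgeSearchOptions` concrete, here is a sketch of one plausible reading: fuzzy candidates are appended only when FTS returns fewer than `MinFtsResultsForFuzzyFallback` hits, and only if their trigram similarity reaches `FuzzySimilarityThreshold`. The strict "fewer than" comparison, the Python approximation of pg_trgm's `similarity()`, and the function names are assumptions; the real service runs this in PostgreSQL.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm: pad each word with two leading and one trailing space."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search_with_fuzzy_fallback(query: str, fts_hits: list, candidates: dict, *,
                               enabled: bool = True, min_fts_results: int = 3,
                               threshold: float = 0.3) -> list:
    """`candidates` maps chunk ID -> text to compare against the query."""
    if not enabled or len(fts_hits) >= min_fts_results:
        return list(fts_hits)  # enough lexical hits: no fuzzy stage
    seen = set(fts_hits)
    scored = sorted(
        ((similarity(query, text), cid) for cid, text in candidates.items() if cid not in seen),
        key=lambda pair: (-pair[0], pair[1]),  # similarity desc, then chunk ID asc
    )
    return list(fts_hits) + [cid for score, cid in scored if score >= threshold]
```

With the defaults from the table, a misspelled query such as `vulnerabilty` that yields fewer than three FTS hits would still surface chunks mentioning `vulnerability`.
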
## Known limitations and follow-ups

- YAML OpenAPI ingestion is not included in MVP.
- End-to-end benchmark against live Postgres-backed AKS service is planned as a follow-up CI lane.
- Optional external embedding providers can be added later without changing API contracts.
- ONNX model file (`all-MiniLM-L6-v2.onnx`, ~80 MB) must be provisioned separately for deployments opting into `VectorEncoderType=onnx`. Air-gap bundles must include the model.
- Doctor seed localization covers de-DE and fr-FR only. Other locales (es-ES, ru-RU, bg-BG, etc.) use English fallback.
- Search quality dashboard deferred items: low-quality results table, top queries table, 30-day trend chart (require additional backend aggregation queries).