# AdvisoryAI Knowledge Search (AKS)

## Why retrieval-first

AKS is a deterministic retrieval system for operational problem solving across Stella Ops docs, OpenAPI contracts, and Doctor checks.
It is designed to work offline and does not require GPU-backed or hosted LLM inference for correctness.
LLMs can still be used as optional formatters later, but AKS correctness is grounded in source retrieval and explicit references.

## Scope

- Module owner: `src/AdvisoryAI/**`.
- Search surfaces consuming AKS:
  - Web global search in `src/Web/StellaOps.Web/**`.
  - CLI commands in `src/Cli/**`.
- Doctor execution remains authoritative in the Doctor module. AKS only indexes metadata and remediation references.

## Architecture

1. Ingestion/indexing:
   - Markdown allow-list/manifest -> section chunks.
   - OpenAPI aggregate (`openapi_current.json` style artifact) -> per-operation chunks + normalized operation tables.
   - Doctor seed + controls metadata (including CLI-discovered Doctor check catalog projection) -> doctor projection chunks.
2. Storage:
   - PostgreSQL tables in schema `advisoryai` via migration `src/AdvisoryAI/StellaOps.AdvisoryAI/Storage/Migrations/002_knowledge_search.sql`.
3. Retrieval:
   - FTS (`tsvector` + `websearch_to_tsquery`) + optional vector stage.
   - Deterministic fusion and tie-breaking in `KnowledgeSearchService`.
4. Delivery:
   - API endpoint: `POST /v1/advisory-ai/search`.
   - Index rebuild endpoint: `POST /v1/advisory-ai/index/rebuild`.

## Data model

AKS schema tables:
- `advisoryai.kb_doc`: canonical source docs with product/version/content hash metadata.
- `advisoryai.kb_chunk`: searchable units (`md_section`, `api_operation`, `doctor_check`) with anchors, spans, `tsvector`, and embeddings.
- `advisoryai.api_spec`: raw OpenAPI snapshot (`jsonb`) by service.
- `advisoryai.api_operation`: normalized operation records (`method`, `path`, `operation_id`, tags, request/response/security json).
- `advisoryai.doctor_search_projection`: searchable doctor metadata and remediation.

Vector support:
- Tries `CREATE EXTENSION vector`.
- If unavailable, AKS remains fully functional via FTS and deterministic array embeddings fallback.

## Deterministic ingestion rules

### Markdown

- Source order:
  1. Allow-list file: `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/knowledge-docs-allowlist.json`.
  2. Generated manifest (optional, from CLI tool): `knowledge-docs-manifest.json`.
  3. Fallback scan roots (`docs/**`) only if allow-list resolves no markdown files.
- Chunk by H2/H3 headings.
- Stable anchors using slug + duplicate suffix.
- Stable chunk IDs from source path + anchor + span.
- Metadata includes path, anchor, section path, tags.

### OpenAPI

- Source order:
  1. Aggregated OpenAPI file path (default `devops/compose/openapi_current.json`).
  2. Fallback repository scan for `openapi.json` when aggregate is missing.
- Parse deterministic JSON aggregate for MVP.
- Emit one searchable chunk per HTTP operation.
- Preserve structured operation payloads (`request_json`, `responses_json`, `security_json`).

### Doctor

- Source order:
  1. Seed file `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-seed.json`.
  2. Controls file `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-controls.json` (contains control fields plus fallback metadata from `stella advisoryai sources prepare`).
  3. Optional Doctor endpoint metadata (`DoctorChecksEndpoint`) when configured.
- `stella advisoryai sources prepare` merges configured seed entries with `DoctorEngine.ListChecks()` (when available in CLI runtime) and writes enriched control projection metadata (`title`, `severity`, `description`, `remediation`, `runCommand`, `symptoms`, `tags`, `references`).
- Emit doctor chunk + projection record including:
  - `checkCode`, `title`, `severity`, `runCommand`, remediation, symptoms.
  - control metadata (`control`, `requiresConfirmation`, `isDestructive`, `inspectCommand`, `verificationCommand`).

## Ranking strategy

Implemented in `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchService.cs`:
- Candidate retrieval:
  - lexical set from FTS.
  - optional vector set from embedding candidates.
- Fusion:
  - reciprocal rank fusion style scoring.
- Deterministic boosts:
  - exact `checkCode` match.
  - exact `operationId` match.
  - `METHOD /path` match.
  - filter-aligned service/tag boosts.
- Deterministic ordering:
  - score desc -> kind asc -> chunk id asc.

## API contract

### Search

- `POST /v1/advisory-ai/search`
- Request:
  - `q` (required), `k`, `filters.type|product|version|service|tags`, `includeDebug`.
- Response:
  - typed results (`docs|api|doctor`) with snippet, score, and open action.

### Rebuild

- `POST /v1/advisory-ai/index/rebuild`
- Rebuilds AKS deterministically from local docs/specs/doctor metadata.

## Web behavior

Global search now consumes AKS and supports:
- Mixed grouped results (`Docs`, `API Endpoints`, `Doctor Checks`).
- Type filter chips.
- Result actions:
  - Docs: `Open`.
  - API: `Curl` (copy command).
  - Doctor: `Run` (navigate to doctor and copy run command).
- `More` action for "show more like this" local query expansion.

## CLI behavior

AKS commands:
- `stella search "" [--type docs|api|doctor] [--product ...] [--version ...] [--service ...] [--tag ...] [--k N] [--json]`
- `stella doctor suggest "" [--product ...] [--version ...] [--k N] [--json]`
- `stella advisoryai index rebuild [--json]`
- `stella advisoryai sources prepare [--repo-root ...] [--docs-allowlist ...] [--docs-manifest-output ...] [--openapi-output ...] [--doctor-seed ...] [--doctor-controls-output ...] [--overwrite] [--json]`

Output:
- Human mode: grouped actionable references.
- JSON mode: stable machine-readable payload.
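
The fusion and ordering rules listed under "Ranking strategy" can be illustrated with a minimal Python sketch. It is not the `KnowledgeSearchService` implementation: the `Candidate` type, the `fuse` function, the `boosts` mapping, and the smoothing constant `RRF_K = 60` are all assumptions made for illustration. It shows only how reciprocal-rank-fusion-style scores, additive boosts, and the `score desc -> kind asc -> chunk id asc` ordering combine into a repeatable result list.

```python
from dataclasses import dataclass

# Assumed smoothing constant (a common RRF default); not taken from AKS.
RRF_K = 60


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a retrieved chunk."""
    chunk_id: str
    kind: str  # e.g. "md_section", "api_operation", "doctor_check"


def fuse(lexical, vector, boosts=None):
    """Fuse two ranked candidate lists and order the result deterministically.

    Each list contributes 1 / (RRF_K + rank) per candidate. Optional boosts
    (keyed by chunk ID) are added afterwards. Ties are broken by kind, then
    by chunk ID, so the same inputs always yield the same order.
    """
    boosts = boosts or {}
    scores = {}
    for ranked in (lexical, vector):
        for rank, cand in enumerate(ranked, start=1):
            scores[cand] = scores.get(cand, 0.0) + 1.0 / (RRF_K + rank)
    for cand in scores:
        scores[cand] += boosts.get(cand.chunk_id, 0.0)
    # score desc -> kind asc -> chunk id asc
    return sorted(
        scores.items(),
        key=lambda item: (-item[1], item[0].kind, item[0].chunk_id),
    )


if __name__ == "__main__":
    doc = Candidate("doc:install#tls", "md_section")
    api = Candidate("api:GET /v1/health", "api_operation")
    check = Candidate("doctor:check.db.conn", "doctor_check")
    for cand, score in fuse([doc, api], [api, doc, check]):
        print(f"{score:.5f}  {cand.kind:14s}  {cand.chunk_id}")
```

In this example `doc` and `api` receive identical fused scores, so the tie is resolved by kind and then chunk ID rather than by retrieval order, which is what keeps repeated queries stable.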
## Test/benchmark strategy

Implemented benchmark framework:
- Generator: `KnowledgeSearchBenchmarkDatasetGenerator` (deterministic synthetic set with explicit ground truth).
- Runner: `KnowledgeSearchBenchmarkRunner` (recall@k, p50/p95 latency, stability pass).
- Models/serialization:
  - `KnowledgeSearchBenchmarkModels.cs`
  - `KnowledgeSearchBenchmarkJson.cs`

Tests:
- `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/KnowledgeSearch/KnowledgeSearchBenchmarkTests.cs`
  - verifies deterministic dataset generation with >= 1000 queries.
  - verifies recall/latency metrics and top-k match behavior.

## Dedicated AKS test DB

Compose profile:
- `devops/compose/docker-compose.advisoryai-knowledge-test.yml`

Init script:
- `devops/compose/postgres-init/advisoryai-knowledge-test/01_extensions.sql`

Example workflow:
```bash
docker compose -f devops/compose/docker-compose.advisoryai-knowledge-test.yml up -d
stella advisoryai sources prepare --json
stella advisoryai index rebuild --json
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj
```

## Known limitations and follow-ups

- YAML OpenAPI ingestion is not included in MVP.
- End-to-end benchmark against live Postgres-backed AKS service is planned as a follow-up CI lane.
- Optional external embedding providers can be added later without changing API contracts.
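
For readers unfamiliar with the metrics named under "Test/benchmark strategy", the following Python sketch shows one common way to compute recall@k and p50/p95 latency. It does not reproduce `KnowledgeSearchBenchmarkRunner`: the function names, the assumption of a single ground-truth chunk ID per query, and the nearest-rank percentile definition are illustrative choices.

```python
import math


def recall_at_k(results, expected, k):
    """Fraction of queries whose expected chunk ID appears in the top-k results.

    `results` maps query -> ranked list of chunk IDs; `expected` maps
    query -> the single ground-truth chunk ID (an assumption of this sketch).
    """
    if not expected:
        return 0.0
    hits = sum(
        1 for query, truth in expected.items()
        if truth in results.get(query, [])[:k]
    )
    return hits / len(expected)


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    results = {"q1": ["a", "b"], "q2": ["c", "d"]}
    expected = {"q1": "b", "q2": "x"}
    latencies_ms = [10, 20, 30, 40]
    print("recall@2:", recall_at_k(results, expected, 2))
    print("p50:", percentile(latencies_ms, 50), "p95:", percentile(latencies_ms, 95))
```

Because both functions are pure and sort their inputs before ranking, repeated runs over the same dataset give identical numbers, which is the property a stability pass would check.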