# AdvisoryAI Knowledge Search (AKS)
## Why retrieval-first
AKS is a deterministic retrieval system for operational problem solving across Stella Ops docs, OpenAPI contracts, and Doctor checks. It is designed to work offline and does not require GPU-backed or hosted LLM inference for correctness.
LLMs can still be used as optional formatters later, but AKS correctness is grounded in source retrieval and explicit references.
## Scope
- Module owner: `src/AdvisoryAI/**`.
- Search surfaces consuming AKS:
  - Web global search in `src/Web/StellaOps.Web/**`.
  - CLI commands in `src/Cli/**`.
- Doctor execution remains authoritative in the Doctor module. AKS only indexes metadata and remediation references.
## Architecture
1. Ingestion/indexing:
   - Markdown allow-list/manifest -> section chunks.
   - OpenAPI aggregate (`openapi_current.json` style artifact) -> per-operation chunks + normalized operation tables.
   - Doctor seed + controls metadata (including CLI-discovered Doctor check catalog projection) -> doctor projection chunks.
2. Storage:
   - PostgreSQL tables in schema `advisoryai` via migration `src/AdvisoryAI/StellaOps.AdvisoryAI/Storage/Migrations/002_knowledge_search.sql`.
3. Retrieval:
   - FTS (`tsvector` + `websearch_to_tsquery`) + optional vector stage.
   - Deterministic fusion and tie-breaking in `KnowledgeSearchService`.
4. Delivery:
   - API endpoint: `POST /v1/advisory-ai/search`.
   - Index rebuild endpoint: `POST /v1/advisory-ai/index/rebuild`.
## Data model
AKS schema tables:
- `advisoryai.kb_doc`: canonical source docs with product/version/content hash metadata.
- `advisoryai.kb_chunk`: searchable units (`md_section`, `api_operation`, `doctor_check`) with anchors, spans, `tsvector`, and embeddings.
- `advisoryai.api_spec`: raw OpenAPI snapshot (`jsonb`) by service.
- `advisoryai.api_operation`: normalized operation records (`method`, `path`, `operation_id`, tags, request/response/security json).
- `advisoryai.doctor_search_projection`: searchable doctor metadata and remediation.
Vector support:
- Tries `CREATE EXTENSION vector`.
- If the extension is unavailable, AKS remains fully functional via FTS and a deterministic array-embedding fallback.
## Deterministic ingestion rules
### Markdown
- Source order:
  1. Allow-list file: `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/knowledge-docs-allowlist.json`.
  2. Generated manifest (optional, from CLI tool): `knowledge-docs-manifest.json`.
  3. Fallback scan roots (`docs/**`) only if allow-list resolves no markdown files.
- Chunk by H2/H3 headings.
- Stable anchors using slug + duplicate suffix.
- Stable chunk IDs from source path + anchor + span.
- Metadata includes path, anchor, section path, tags.
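
The Markdown rules above can be sketched in a few lines of Python. This is a minimal illustration, not the AKS implementation: the function names, the `-N` duplicate suffix format, and the SHA-256-based ID are all assumptions made for the example, and the sketch does not skip headings that appear inside fenced code blocks.

```python
import hashlib
import re
from collections import Counter


def slugify(heading: str) -> str:
    """Lowercase, drop punctuation, and join words with hyphens."""
    slug = re.sub(r"[^a-z0-9\s-]", "", heading.lower())
    return re.sub(r"[\s-]+", "-", slug).strip("-")


def chunk_markdown(path: str, text: str) -> list:
    """Split on H2/H3 headings and emit chunks with stable anchors and IDs."""
    lines = text.splitlines()
    # Only "## " and "### " start a chunk; H1 and H4+ stay inside their parent.
    starts = [i for i, line in enumerate(lines) if re.match(r"#{2,3}\s", line)]
    seen = Counter()
    chunks = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        base = slugify(lines[start].lstrip("#"))
        # Duplicate headings get a numeric suffix (assumed format): usage, usage-1, ...
        anchor = base if seen[base] == 0 else f"{base}-{seen[base]}"
        seen[base] += 1
        # Hypothetical ID scheme: hash of source path + anchor + line span.
        key = f"{path}#{anchor}:{start}-{end}"
        chunk_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        chunks.append({
            "id": chunk_id,
            "path": path,
            "anchor": anchor,
            "span": (start, end),
            "body": "\n".join(lines[start:end]),
        })
    return chunks
```

Because the ID depends only on path, anchor, and span, re-ingesting an unchanged file yields identical chunk IDs, which is the property the rules above aim for.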
### OpenAPI
- Source order:
  1. Aggregated OpenAPI file path (default `devops/compose/openapi_current.json`).
  2. Fallback repository scan for `openapi.json` when aggregate is missing.
- Parse the JSON aggregate deterministically (JSON only for the MVP).
- Emit one searchable chunk per HTTP operation.
- Preserve structured operation payloads (`request_json`, `responses_json`, `security_json`).
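
As a rough illustration of "one searchable chunk per HTTP operation", the sketch below walks the `paths` object of an already-parsed OpenAPI document. The function name, the `text` field, and the choice to store payloads as sorted-key JSON strings are assumptions for the example; only the `request_json`, `responses_json`, and `security_json` field names come from this document.

```python
import json

# Operation keys allowed on an OpenAPI 3 path item.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def operation_chunks(service: str, spec: dict) -> list:
    """Emit one chunk per operation; paths are sorted, methods follow a fixed order."""
    chunks = []
    for path in sorted(spec.get("paths", {})):
        item = spec["paths"][path]
        for method in HTTP_METHODS:
            op = item.get(method)
            if not isinstance(op, dict):
                continue  # skips absent methods and non-operation keys
            chunks.append({
                "kind": "api_operation",
                "service": service,
                "method": method.upper(),
                "path": path,
                "operation_id": op.get("operationId"),
                "tags": sorted(op.get("tags", [])),
                # Structured payloads kept as canonical (sorted-key) JSON strings.
                "request_json": json.dumps(op.get("requestBody"), sort_keys=True),
                "responses_json": json.dumps(op.get("responses"), sort_keys=True),
                "security_json": json.dumps(op.get("security"), sort_keys=True),
                # Hypothetical searchable text: "METHOD /path summary".
                "text": f"{method.upper()} {path} {op.get('summary', '')}".strip(),
            })
    return chunks
```

Sorting paths and iterating methods in a fixed order means the output does not depend on key order in the source file.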
### Doctor
- Source order:
  1. Seed file `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-seed.json`.
  2. Controls file `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-controls.json` (contains control fields plus fallback metadata from `stella advisoryai sources prepare`).
  3. Optional Doctor endpoint metadata (`DoctorChecksEndpoint`) when configured.
- `stella advisoryai sources prepare` merges configured seed entries with `DoctorEngine.ListChecks()` (when available in CLI runtime) and writes enriched control projection metadata (`title`, `severity`, `description`, `remediation`, `runCommand`, `symptoms`, `tags`, `references`).
- Emit doctor chunk + projection record including:
  - `checkCode`, `title`, `severity`, `runCommand`, remediation, symptoms.
  - control metadata (`control`, `requiresConfirmation`, `isDestructive`, `inspectCommand`, `verificationCommand`).
## Ranking strategy
Implemented in `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchService.cs`:
- Candidate retrieval:
  - lexical set from FTS.
  - optional vector set from embedding candidates.
- Fusion:
  - reciprocal rank fusion style scoring.
- Deterministic boosts:
  - exact `checkCode` match.
  - exact `operationId` match.
  - `METHOD /path` match.
  - filter-aligned service/tag boosts.
- Deterministic ordering:
  - score desc -> kind asc -> chunk id asc.
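
The fusion and ordering steps can be sketched as follows. This is not the code in `KnowledgeSearchService.cs`: the function name, the additive form of the boosts, and the constant `k = 60` (a common default for reciprocal rank fusion) are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


def fuse(lexical: List[str], vector: List[str], kinds: Dict[str, str],
         boosts: Optional[Dict[str, float]] = None, k: int = 60) -> List[Tuple[str, float]]:
    """Fuse two ranked lists of chunk IDs and return them in deterministic order."""
    scores: Dict[str, float] = {}
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per chunk.
    for ranking in (lexical, vector):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Assumed additive boosts (e.g. exact checkCode or operationId match).
    for chunk_id, bonus in (boosts or {}).items():
        if chunk_id in scores:
            scores[chunk_id] += bonus
    # Tie-breaking: score desc -> kind asc -> chunk id asc.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], kinds[cid], cid))
    return [(cid, scores[cid]) for cid in ordered]
```

With an empty `vector` list the function degrades to a pure lexical ranking, which matches the "optional vector stage" described above. The explicit three-part sort key is what makes equal-score results come back in the same order on every run.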
## API contract
### Search
- `POST /v1/advisory-ai/search`
- Request:
  - `q` (required), `k`, `filters.type|product|version|service|tags`, `includeDebug`.
- Response:
  - typed results (`docs|api|doctor`) with snippet, score, and open action.
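
A minimal sketch of calling the search endpoint from Python, using only the standard library. The base URL, the helper name `build_search_request`, and the exact JSON shape of `filters` (a flat object with string values) are assumptions; any authentication headers the service may require are also omitted.

```python
import json
import urllib.request


def build_search_request(base_url: str, q: str, k: int = 10, **filters) -> urllib.request.Request:
    """Build (but do not send) a POST request for the search endpoint."""
    body = {"q": q, "k": k, "includeDebug": False}
    if filters:
        # Assumed shape: filters nested under a single "filters" object.
        body["filters"] = filters
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/advisory-ai/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local host and port; adjust for the actual deployment.
req = build_search_request("http://localhost:8080", "database connection refused", k=5, type="doctor")

# To send it against a running service:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)
```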
### Rebuild
- `POST /v1/advisory-ai/index/rebuild`
- Rebuilds AKS deterministically from local docs/specs/doctor metadata.
## Web behavior
Global search now consumes AKS and supports:
- Mixed grouped results (`Docs`, `API Endpoints`, `Doctor Checks`).
- Type filter chips.
- Result actions:
  - Docs: `Open`.
  - API: `Curl` (copy command).
  - Doctor: `Run` (navigate to doctor and copy run command).
- `More` action for "show more like this" local query expansion.
## CLI behavior
AKS commands:
- `stella search "<query>" [--type docs|api|doctor] [--product ...] [--version ...] [--service ...] [--tag ...] [--k N] [--json]`
- `stella doctor suggest "<symptom>" [--product ...] [--version ...] [--k N] [--json]`
- `stella advisoryai index rebuild [--json]`
- `stella advisoryai sources prepare [--repo-root ...] [--docs-allowlist ...] [--docs-manifest-output ...] [--openapi-output ...] [--doctor-seed ...] [--doctor-controls-output ...] [--overwrite] [--json]`
Output:
- Human mode: grouped actionable references.
- JSON mode: stable machine-readable payload.
## Test/benchmark strategy
Implemented benchmark framework:
- Generator: `KnowledgeSearchBenchmarkDatasetGenerator` (deterministic synthetic set with explicit ground truth).
- Runner: `KnowledgeSearchBenchmarkRunner` (recall@k, p50/p95 latency, stability pass).
- Models/serialization:
  - `KnowledgeSearchBenchmarkModels.cs`
  - `KnowledgeSearchBenchmarkJson.cs`
Tests:
- `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/KnowledgeSearch/KnowledgeSearchBenchmarkTests.cs`
  - verifies deterministic dataset generation with >= 1000 queries.
  - verifies recall/latency metrics and top-k match behavior.
## Dedicated AKS test DB
Compose profile:
- `devops/compose/docker-compose.advisoryai-knowledge-test.yml`
Init script:
- `devops/compose/postgres-init/advisoryai-knowledge-test/01_extensions.sql`
Example workflow:
```bash
docker compose -f devops/compose/docker-compose.advisoryai-knowledge-test.yml up -d
stella advisoryai sources prepare --json
stella advisoryai index rebuild --json
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj
```
## Known limitations and follow-ups
- YAML OpenAPI ingestion is not included in the MVP.
- An end-to-end benchmark against a live Postgres-backed AKS service is planned as a follow-up CI lane.
- Optional external embedding providers can be added later without changing API contracts.