master e746577380 wip: doctor/cli/docs/api to vector db consolidation; api hardening for descriptions, tenant, and scopes; migrations and conversions of all DALs to EF v10

2026-02-23 15:30:50 +02:00

AdvisoryAI Knowledge Search (AKS)

Why retrieval-first

AKS is a deterministic retrieval system for operational problem solving across Stella Ops docs, OpenAPI contracts, and Doctor checks. It is designed to work offline and does not require GPU-backed or hosted LLM inference for correctness.

LLMs can still be used as optional formatters later, but AKS correctness is grounded in source retrieval and explicit references.

Scope

Module owner: src/AdvisoryAI/**.
Search surfaces consuming AKS:
- Web global search in src/Web/StellaOps.Web/**.
- CLI commands in src/Cli/**.
Doctor execution remains authoritative in the Doctor module. AKS only indexes Doctor metadata and remediation references; it does not run checks.

Architecture

Ingestion/indexing:
- Markdown allow-list/manifest -> section chunks.
- OpenAPI aggregate (openapi_current.json style artifact) -> per-operation chunks + normalized operation tables.
- Doctor seed + controls metadata (including CLI-discovered Doctor check catalog projection) -> doctor projection chunks.
Storage:
- PostgreSQL tables in schema advisoryai via migration src/AdvisoryAI/StellaOps.AdvisoryAI/Storage/Migrations/002_knowledge_search.sql.
Retrieval:
- FTS (tsvector + websearch_to_tsquery) + optional vector stage.
- Deterministic fusion and tie-breaking in KnowledgeSearchService.
Delivery:
- API endpoint: POST /v1/advisory-ai/search.
- Index rebuild endpoint: POST /v1/advisory-ai/index/rebuild.

Data model

AKS schema tables:

advisoryai.kb_doc: canonical source docs with product/version/content hash metadata.
advisoryai.kb_chunk: searchable units (md_section, api_operation, doctor_check) with anchors, spans, tsvector, and embeddings.
advisoryai.api_spec: raw OpenAPI snapshot (jsonb) by service.
advisoryai.api_operation: normalized operation records (method, path, operation_id, tags, request/response/security json).
advisoryai.doctor_search_projection: searchable doctor metadata and remediation.

Vector support:

AKS tries CREATE EXTENSION vector.
If the extension is unavailable, AKS remains fully functional via FTS and a deterministic array-embedding fallback.
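
The document does not say how the fallback array embeddings are computed. As a rough illustration of what a deterministic embedding can look like, the sketch below uses the hashing trick: each token is hashed with SHA-256 into a fixed-size array, and the result is L2-normalized. The function names, the dimension count, and the tokenizer are all assumptions made for this example, not the AKS implementation.

```python
import hashlib
import math
import re


def hashed_embedding(text: str, dims: int = 64) -> list[float]:
    """Hypothetical fallback embedding: same input gives the same array on any machine."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dims  # which slot the token lands in
        sign = 1.0 if digest[4] % 2 == 0 else -1.0         # signed hashing reduces collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity because the vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Because no random seed or model weights are involved, rebuilding the index twice yields byte-identical arrays, which is the property the fallback needs.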

Deterministic ingestion rules

Markdown

Source order:
1. Allow-list file: src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/knowledge-docs-allowlist.json.
2. Generated manifest (optional, from CLI tool): knowledge-docs-manifest.json.
3. Fallback scan roots (docs/**) only if allow-list resolves no markdown files.
Chunk by H2/H3 headings.
Stable anchors using slug + duplicate suffix.
Stable chunk IDs from source path + anchor + span.
Metadata includes path, anchor, section path, tags.
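
The Markdown rules above can be sketched as follows. This is a minimal illustration in Python (the real ingestion lives in the AdvisoryAI module), and the slug format, the "-N" duplicate suffix, the line-based span, and the truncated SHA-256 ID are assumptions. It also ignores headings that appear inside fenced code blocks, which a real chunker would need to skip.

```python
import hashlib
import re

# Matches H2 and H3 headings only ("## Title" or "### Title").
HEADING = re.compile(r"^(#{2,3})\s+(.*\S)\s*$")


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumerics into single dashes."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def chunk_markdown(path: str, text: str) -> list[dict]:
    """Split a document into H2/H3 sections with stable anchors and IDs."""
    lines = text.splitlines()
    starts = [(i, m.group(2)) for i, line in enumerate(lines) if (m := HEADING.match(line))]
    seen: dict[str, int] = {}
    chunks = []
    for idx, (start, title) in enumerate(starts):
        end = starts[idx + 1][0] if idx + 1 < len(starts) else len(lines)
        slug = slugify(title)
        seen[slug] = seen.get(slug, 0) + 1
        # First occurrence keeps the bare slug; repeats get a numeric suffix.
        anchor = slug if seen[slug] == 1 else f"{slug}-{seen[slug] - 1}"
        key = f"{path}#{anchor}:{start}-{end}"  # source path + anchor + span
        chunks.append({
            "id": hashlib.sha256(key.encode("utf-8")).hexdigest()[:16],
            "anchor": anchor,
            "span": (start, end),
            "title": title,
        })
    return chunks
```

Deriving the ID purely from path, anchor, and span means an unchanged file produces the same chunk IDs on every rebuild.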

OpenAPI

Source order:
1. Aggregated OpenAPI file path (default devops/compose/openapi_current.json).
2. Fallback repository scan for openapi.json when aggregate is missing.
Parse the JSON aggregate deterministically (JSON only for the MVP).
Emit one searchable chunk per HTTP operation.
Preserve structured operation payloads (request_json, responses_json, security_json).
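
A minimal sketch of the per-operation step, assuming an OpenAPI 3 document already loaded as a Python dict. The output field names mirror the columns mentioned in this document (request_json, responses_json, security_json); the "text" field and the function name are hypothetical.

```python
# Fixed, sorted method list so iteration order never depends on the input file.
HTTP_METHODS = ("delete", "get", "head", "options", "patch", "post", "put", "trace")


def operation_chunks(service: str, spec: dict) -> list[dict]:
    """Emit one record per HTTP operation, ordered by path and then by method."""
    chunks = []
    for path in sorted(spec.get("paths", {})):
        item = spec["paths"][path]
        for method in HTTP_METHODS:
            op = item.get(method)
            if not isinstance(op, dict):
                continue  # skips absent methods and non-operation keys such as "parameters"
            chunks.append({
                "service": service,
                "method": method.upper(),
                "path": path,
                "operation_id": op.get("operationId"),
                "tags": sorted(op.get("tags", [])),
                "request_json": op.get("requestBody"),
                "responses_json": op.get("responses", {}),
                # Operation-level security overrides the document-level default.
                "security_json": op.get("security", spec.get("security")),
                "text": f"{method.upper()} {path} {op.get('summary', '')}".strip(),
            })
    return chunks
```

Sorting paths and walking a fixed method tuple keeps the emitted order independent of key order in the source JSON.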

Doctor

Source order:
1. Seed file src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-seed.json.
2. Controls file src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-controls.json (contains control fields plus fallback metadata from stella advisoryai sources prepare).
3. Optional Doctor endpoint metadata (DoctorChecksEndpoint) when configured.
stella advisoryai sources prepare merges configured seed entries with DoctorEngine.ListChecks() (when available in CLI runtime) and writes enriched control projection metadata (title, severity, description, remediation, runCommand, symptoms, tags, references).
Emit doctor chunk + projection record including:
- checkCode, title, severity, runCommand, remediation, symptoms.
- control metadata (control, requiresConfirmation, isDestructive, inspectCommand, verificationCommand).

Ranking strategy

Implemented in src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchService.cs:

Candidate retrieval:
- lexical set from FTS.
- optional vector set from embedding candidates.
Fusion:
- reciprocal-rank-fusion (RRF) style scoring: each candidate's score is accumulated from its rank in the lexical and vector lists.
Deterministic boosts:
- exact checkCode match.
- exact operationId match.
- METHOD /path match.
- filter-aligned service/tag boosts.
Deterministic ordering:
- score desc -> kind asc -> chunk id asc.
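
The fusion and ordering steps can be illustrated with a short Python sketch. It is not the code in KnowledgeSearchService.cs: the RRF constant of 60, the additive boost model, and the function signature are assumptions. Only the sort key (score descending, kind ascending, chunk id ascending) comes from this document.

```python
from collections import defaultdict

RRF_K = 60  # assumed value; a conventional choice for reciprocal rank fusion


def fuse(lexical, vector, boosts=None):
    """Fuse two ranked lists of (chunk_id, kind) into (chunk_id, kind, score) rows.

    boosts maps chunk_id -> bonus, e.g. for an exact checkCode or operationId match.
    """
    scores = defaultdict(float)
    kinds = {}
    for ranked in (lexical, vector):
        for rank, (chunk_id, kind) in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (RRF_K + rank)
            kinds[chunk_id] = kind
    for chunk_id, bonus in (boosts or {}).items():
        if chunk_id in scores:  # boosts only apply to retrieved candidates
            scores[chunk_id] += bonus
    # score desc -> kind asc -> chunk id asc, so ties never depend on input order
    return sorted(
        ((cid, kinds[cid], scores[cid]) for cid in scores),
        key=lambda row: (-row[2], row[1], row[0]),
    )
```

The explicit tie-breakers matter because RRF scores collide often: two chunks that swap positions between the lexical and vector lists receive exactly the same score.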

API contract

Search

POST /v1/advisory-ai/search
Request:
- q (required), k, filters.type|product|version|service|tags, includeDebug.
Response:
- typed results (docs|api|doctor) with snippet, score, and open action.
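
As an illustration, a client could build the search request as sketched below. The field names q, k, filters, and includeDebug come from the contract above. The base URL, the shape of the filters object, and the absence of authentication headers are assumptions for this example; a real deployment may require tenant or scope headers that are not described here.

```python
import json
import urllib.request


def build_search_request(base_url, q, k=None, filters=None, include_debug=False):
    """Build (but do not send) a POST request for /v1/advisory-ai/search."""
    if not q or not q.strip():
        raise ValueError("q is required")
    body = {"q": q, "includeDebug": include_debug}
    if k is not None:
        body["k"] = k
    if filters:
        body["filters"] = filters  # e.g. {"type": "doctor", "tags": ["postgres"]} (assumed shape)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/advisory-ai/search",
        data=json.dumps(body, sort_keys=True).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response body is expected to contain the typed results described above.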

Rebuild

POST /v1/advisory-ai/index/rebuild
Rebuilds AKS deterministically from local docs/specs/doctor metadata.

Web behavior

Global search now consumes AKS and supports:

Mixed grouped results (Docs, API Endpoints, Doctor Checks).
Type filter chips.
Result actions:
- Docs: Open.
- API: Curl (copy command).
- Doctor: Run (navigate to doctor and copy run command).
More action for "show more like this" local query expansion.

CLI behavior

AKS commands:

stella search "<query>" [--type docs|api|doctor] [--product ...] [--version ...] [--service ...] [--tag ...] [--k N] [--json]
stella doctor suggest "<symptom>" [--product ...] [--version ...] [--k N] [--json]
stella advisoryai index rebuild [--json]
stella advisoryai sources prepare [--repo-root ...] [--docs-allowlist ...] [--docs-manifest-output ...] [--openapi-output ...] [--doctor-seed ...] [--doctor-controls-output ...] [--overwrite] [--json]

Output:

Human mode: grouped actionable references.
JSON mode: stable machine-readable payload.

Test/benchmark strategy

Implemented benchmark framework:

Generator: KnowledgeSearchBenchmarkDatasetGenerator (deterministic synthetic set with explicit ground truth).
Runner: KnowledgeSearchBenchmarkRunner (recall@k, p50/p95 latency, stability pass).
Models/serialization:
- KnowledgeSearchBenchmarkModels.cs
- KnowledgeSearchBenchmarkJson.cs
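
The metrics named above can be computed as in the following sketch. It assumes one ground-truth chunk per query and a nearest-rank percentile; KnowledgeSearchBenchmarkRunner may define either differently, and the function names here are hypothetical.

```python
import math


def recall_at_k(results: dict, expected: dict, k: int) -> float:
    """Fraction of queries whose ground-truth chunk id appears in the top k results."""
    hits = sum(1 for q, truth in expected.items() if truth in results.get(q, [])[:k])
    return hits / len(expected) if expected else 0.0


def percentile(samples: list, p: int):
    """Nearest-rank percentile for an integer p in 1..100 (p50, p95, ...)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def is_stable(run_a: dict, run_b: dict) -> bool:
    """Stability pass: two runs over the same index return identical ranked lists."""
    return run_a == run_b
```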

Tests:

src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/KnowledgeSearch/KnowledgeSearchBenchmarkTests.cs
- verifies deterministic dataset generation with >= 1000 queries.
- verifies recall/latency metrics and top-k match behavior.

Dedicated AKS test DB

Compose profile:

devops/compose/docker-compose.advisoryai-knowledge-test.yml

Init script:

devops/compose/postgres-init/advisoryai-knowledge-test/01_extensions.sql

Example workflow:

docker compose -f devops/compose/docker-compose.advisoryai-knowledge-test.yml up -d
stella advisoryai sources prepare --json
stella advisoryai index rebuild --json
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj

Known limitations and follow-ups

YAML OpenAPI ingestion is not included in MVP.
An end-to-end benchmark against a live Postgres-backed AKS service is planned as a follow-up CI lane.
Optional external embedding providers can be added later without changing API contracts.
