AdvisoryAI Knowledge Search (AKS)

Why retrieval-first

AKS is a deterministic retrieval system for operational problem solving across Stella Ops docs, OpenAPI contracts, and Doctor checks. It is designed to work offline and does not require GPU-backed or hosted LLM inference for correctness.

LLMs can still be used as optional formatters later, but AKS correctness is grounded in source retrieval and explicit references.

Scope

  • Module owner: src/AdvisoryAI/**.
  • Search surfaces consuming AKS:
    • Web global search in src/Web/StellaOps.Web/**.
    • CLI commands in src/Cli/**.
  • Doctor execution remains authoritative in the Doctor module. AKS indexes only Doctor metadata and remediation references; it does not run checks.

Architecture

  1. Ingestion/indexing:
    • Markdown allow-list/manifest -> section chunks.
    • OpenAPI aggregate (openapi_current.json style artifact) -> per-operation chunks + normalized operation tables.
    • Doctor seed + controls metadata (including CLI-discovered Doctor check catalog projection) -> doctor projection chunks.
  2. Storage:
    • PostgreSQL tables in schema advisoryai via migration src/AdvisoryAI/StellaOps.AdvisoryAI/Storage/Migrations/002_knowledge_search.sql.
  3. Retrieval:
    • FTS (tsvector + websearch_to_tsquery) + optional vector stage.
    • Deterministic fusion and tie-breaking in KnowledgeSearchService.
  4. Delivery:
    • API endpoint: POST /v1/advisory-ai/search.
    • Index rebuild endpoint: POST /v1/advisory-ai/index/rebuild.

Data model

AKS schema tables:

  • advisoryai.kb_doc: canonical source docs with product/version/content hash metadata.
  • advisoryai.kb_chunk: searchable units (md_section, api_operation, doctor_check) with anchors, spans, tsvector, and embeddings.
  • advisoryai.api_spec: raw OpenAPI snapshot (jsonb) by service.
  • advisoryai.api_operation: normalized operation records (method, path, operation_id, tags, request/response/security json).
  • advisoryai.doctor_search_projection: searchable doctor metadata and remediation.

Vector support:

  • AKS attempts CREATE EXTENSION vector (the pgvector extension).
  • If the extension is unavailable, AKS remains fully functional through FTS and a deterministic array-embedding fallback.
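
The source does not specify how the fallback embeddings are computed. As one illustration of what "deterministic array embeddings" can mean, the sketch below uses a hashing trick: each token is hashed into a fixed-size float array, so the same text always yields the same vector without any model. The function names, the 64-dimension size, and the tokenizer are assumptions for illustration, not the AKS implementation.

```python
import hashlib
import math
import re


def hashed_embedding(text: str, dims: int = 64) -> list:
    """Hypothetical deterministic embedding: hash each token into one slot."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dims  # slot for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0        # signed hashing
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # L2-normalize so a dot product behaves like cosine similarity.
    return [v / norm for v in vec] if norm else vec


def cosine(a: list, b: list) -> float:
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because the output depends only on the input text, vectors like these can be stored in a plain float array column and compared in application code when the vector extension is missing.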

Deterministic ingestion rules

Markdown

  • Source order:
    1. Allow-list file: src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/knowledge-docs-allowlist.json.
    2. Generated manifest (optional, from CLI tool): knowledge-docs-manifest.json.
    3. Fallback scan roots (docs/**) only if allow-list resolves no markdown files.
  • Chunk by H2/H3 headings.
  • Stable anchors using slug + duplicate suffix.
  • Stable chunk IDs from source path + anchor + span.
  • Metadata includes path, anchor, section path, tags.
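
The chunking rules above can be sketched as follows. This is a minimal illustration, assuming a simple slug function, a numeric duplicate suffix (`-1`, `-2`, ...), line-index spans, and a truncated SHA-256 as the chunk ID; the actual AKS slug and ID formats are not given in this document. It also ignores headings that appear inside fenced code blocks.

```python
import hashlib
import re


def slugify(heading: str) -> str:
    """Lowercase the heading, drop punctuation, join words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", heading.lower()))


def chunk_markdown(path: str, text: str) -> list:
    """Split on H2/H3 headings and return chunks with stable anchors and IDs."""
    lines = text.splitlines()
    starts = [i for i, line in enumerate(lines) if re.match(r"^#{2,3} ", line)]
    seen = {}  # slug -> number of times already used in this file
    chunks = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        title = lines[start].lstrip("#").strip()
        slug = slugify(title)
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        # The first occurrence keeps the bare slug; repeats get a suffix.
        anchor = slug if count == 0 else f"{slug}-{count}"
        # The ID depends only on source path + anchor + span.
        key = f"{path}|{anchor}|{start}:{end}"
        chunk_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        chunks.append({
            "id": chunk_id,
            "anchor": anchor,
            "span": (start, end),
            "title": title,
            "body": "\n".join(lines[start:end]),
        })
    return chunks
```

Running the function twice on the same file yields identical anchors and IDs, which is the property that lets a rebuild replace rows in place rather than churn them.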

OpenAPI

  • Source order:
    1. Aggregated OpenAPI file path (default devops/compose/openapi_current.json).
    2. Fallback repository scan for openapi.json when aggregate is missing.
  • For the MVP, parse only the JSON aggregate, in a deterministic order.
  • Emit one searchable chunk per HTTP operation.
  • Preserve structured operation payloads (request_json, responses_json, security_json).
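
A rough sketch of the per-operation step, assuming a standard OpenAPI 3 document already loaded as a dictionary. The output keys mirror the column names mentioned in this document (`operation_id`, `request_json`, `responses_json`, `security_json`); the function name and the choice to sort paths and serialize with sorted keys are assumptions made here to keep the output deterministic.

```python
import json

# HTTP methods that may appear under an OpenAPI path item.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def operation_chunks(service: str, spec: dict) -> list:
    """Emit one record per HTTP operation, in a deterministic order."""
    chunks = []
    for path in sorted(spec.get("paths", {})):
        item = spec["paths"][path]
        for method in HTTP_METHODS:
            op = item.get(method)
            if not isinstance(op, dict):
                continue  # skip non-operation keys such as "parameters"
            chunks.append({
                "service": service,
                "method": method.upper(),
                "path": path,
                "operation_id": op.get("operationId"),
                "tags": sorted(op.get("tags", [])),
                # Structured payloads are kept as canonical JSON strings.
                "request_json": json.dumps(op.get("requestBody"), sort_keys=True),
                "responses_json": json.dumps(op.get("responses", {}), sort_keys=True),
                "security_json": json.dumps(op.get("security"), sort_keys=True),
            })
    return chunks
```

Sorting paths and serializing with `sort_keys=True` means two runs over the same aggregate produce byte-identical records, regardless of key order in the input file.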

Doctor

  • Source order:
    1. Seed file src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-seed.json.
    2. Controls file src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-controls.json (contains control fields plus fallback metadata from stella advisoryai sources prepare).
    3. Optional Doctor endpoint metadata (DoctorChecksEndpoint) when configured.
  • stella advisoryai sources prepare merges configured seed entries with DoctorEngine.ListChecks() (when available in CLI runtime) and writes enriched control projection metadata (title, severity, description, remediation, runCommand, symptoms, tags, references).
  • Emit doctor chunk + projection record including:
    • checkCode, title, severity, runCommand, remediation, symptoms.
    • control metadata (control, requiresConfirmation, isDestructive, inspectCommand, verificationCommand).

Ranking strategy

Implemented in src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchService.cs:

  • Candidate retrieval:
    • lexical set from FTS.
    • optional vector set from embedding candidates.
  • Fusion:
    • reciprocal rank fusion style scoring.
  • Deterministic boosts:
    • exact checkCode match.
    • exact operationId match.
    • METHOD /path match.
    • filter-aligned service/tag boosts.
  • Deterministic ordering:
    • score desc -> kind asc -> chunk id asc.
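
The fusion and ordering steps above can be sketched in a few lines. This is not the code in KnowledgeSearchService.cs; it assumes the conventional RRF constant k = 60 and additive boosts, neither of which is stated in this document, and the function and parameter names are hypothetical.

```python
from typing import Optional


def fuse(lexical: list, vector: list, kinds: dict,
         boosts: Optional[dict] = None, k: int = 60) -> list:
    """Reciprocal-rank-fusion style scoring with a deterministic tie-break."""
    scores = {}
    for ranking in (lexical, vector):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it contains.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    for chunk_id, boost in (boosts or {}).items():
        if chunk_id in scores:
            scores[chunk_id] += boost  # e.g. exact checkCode or operationId match
    # score desc -> kind asc -> chunk id asc
    ordered = sorted(scores, key=lambda cid: (-scores[cid], kinds[cid], cid))
    return [(cid, scores[cid]) for cid in ordered]
```

Because the sort key ends in the chunk ID, two chunks with identical scores and kinds always come back in the same order, which is what makes repeated queries return identical result lists.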

API contract

  • POST /v1/advisory-ai/search
  • Request:
    • q (required), k, filters.type|product|version|service|tags, includeDebug.
  • Response:
    • typed results (docs|api|doctor) with snippet, score, and open action.
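
As a hedged illustration of the request shape, the sketch below builds (but does not send) a search request using only the field names listed above. The base URL, the default k of 10, and the value types of the filters (for example whether type takes a single string or a list) are assumptions; check the OpenAPI contract for the real schema.

```python
import json
import urllib.request


def build_search_request(base_url: str, q: str, k: int = 10, **filters) -> urllib.request.Request:
    """Build a POST /v1/advisory-ai/search request without sending it."""
    body = {"q": q, "k": k, "includeDebug": False}
    # Only include filters that were actually provided.
    active = {name: value for name, value in filters.items() if value is not None}
    if active:
        body["filters"] = active
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/advisory-ai/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; authentication headers, if the deployment requires them, are outside this sketch.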

Rebuild

  • POST /v1/advisory-ai/index/rebuild
  • Rebuilds AKS deterministically from local docs/specs/doctor metadata.

Web behavior

Global search now consumes AKS and supports:

  • Mixed grouped results (Docs, API Endpoints, Doctor Checks).
  • Type filter chips.
  • Result actions:
    • Docs: Open.
    • API: Curl (copy command).
    • Doctor: Run (navigate to doctor and copy run command).
  • A More action that runs a local "show more like this" query expansion.

CLI behavior

AKS commands:

  • stella search "<query>" [--type docs|api|doctor] [--product ...] [--version ...] [--service ...] [--tag ...] [--k N] [--json]
  • stella doctor suggest "<symptom>" [--product ...] [--version ...] [--k N] [--json]
  • stella advisoryai index rebuild [--json]
  • stella advisoryai sources prepare [--repo-root ...] [--docs-allowlist ...] [--docs-manifest-output ...] [--openapi-output ...] [--doctor-seed ...] [--doctor-controls-output ...] [--overwrite] [--json]

Output:

  • Human mode: grouped actionable references.
  • JSON mode: stable machine-readable payload.

Test/benchmark strategy

Implemented benchmark framework:

  • Generator: KnowledgeSearchBenchmarkDatasetGenerator (deterministic synthetic set with explicit ground truth).
  • Runner: KnowledgeSearchBenchmarkRunner (recall@k, p50/p95 latency, stability pass).
  • Models/serialization:
    • KnowledgeSearchBenchmarkModels.cs
    • KnowledgeSearchBenchmarkJson.cs

Tests:

  • src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/KnowledgeSearch/KnowledgeSearchBenchmarkTests.cs
    • verifies deterministic dataset generation with >= 1000 queries.
    • verifies recall/latency metrics and top-k match behavior.

Dedicated AKS test DB

Compose profile:

  • devops/compose/docker-compose.advisoryai-knowledge-test.yml

Init script:

  • devops/compose/postgres-init/advisoryai-knowledge-test/01_extensions.sql

Example workflow:

docker compose -f devops/compose/docker-compose.advisoryai-knowledge-test.yml up -d
stella advisoryai sources prepare --json
stella advisoryai index rebuild --json
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj

Known limitations and follow-ups

  • YAML OpenAPI ingestion is not included in the MVP; only JSON specs are parsed.
  • An end-to-end benchmark against a live Postgres-backed AKS service is planned as a follow-up CI lane.
  • Optional external embedding providers can be added later without changing API contracts.