# AdvisoryAI Knowledge Search (AKS)
## Why retrieval-first

AKS is a deterministic retrieval system for operational problem solving across Stella Ops docs, OpenAPI contracts, and Doctor checks. It is designed to work offline and does not require GPU-backed or hosted LLM inference for correctness.

LLMs may later be used as optional formatters, but AKS correctness is grounded in source retrieval and explicit references.

## Scope

- Module owner: `src/AdvisoryAI/**`.
- Search surfaces consuming AKS:
  - Web global search in `src/Web/StellaOps.Web/**`.
  - CLI commands in `src/Cli/**`.
- Doctor execution remains authoritative in the Doctor module. AKS only indexes metadata and remediation references.

## Architecture

1. Ingestion/indexing:
   - Markdown allow-list/manifest -> section chunks.
   - OpenAPI aggregate (an `openapi_current.json`-style artifact) -> per-operation chunks + normalized operation tables.
   - Doctor seed + controls metadata (including the CLI-discovered Doctor check catalog projection) -> Doctor projection chunks.
2. Storage:
   - PostgreSQL tables in schema `advisoryai`, created by migration `src/AdvisoryAI/StellaOps.AdvisoryAI/Storage/Migrations/002_knowledge_search.sql`.
3. Retrieval:
   - Full-text search (FTS) using `tsvector` and `websearch_to_tsquery`, plus an optional vector stage.
   - Deterministic fusion and tie-breaking in `KnowledgeSearchService`.
4. Delivery:
   - Search endpoint: `POST /v1/advisory-ai/search`.
   - Index rebuild endpoint: `POST /v1/advisory-ai/index/rebuild`.

## Data model

AKS schema tables:

- `advisoryai.kb_doc`: canonical source documents with product, version, and content-hash metadata.
- `advisoryai.kb_chunk`: searchable units (`md_section`, `api_operation`, `doctor_check`) with anchors, spans, `tsvector`, and embeddings.
- `advisoryai.api_spec`: raw OpenAPI snapshot (`jsonb`) per service.
- `advisoryai.api_operation`: normalized operation records (`method`, `path`, `operation_id`, tags, request/response/security JSON).
- `advisoryai.doctor_search_projection`: searchable Doctor metadata and remediation.

Vector support:

- AKS attempts `CREATE EXTENSION vector` (pgvector).
- If the extension is unavailable, AKS remains fully functional via FTS plus a deterministic array-embedding fallback.

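
The fallback embedding algorithm is not specified here. One deterministic option is feature hashing: each token is hashed with a stable digest into a fixed-size array, and similarity is a cosine over the normalized arrays. The Python sketch below assumes that technique; `DIMS`, `deterministic_embedding`, and `cosine` are hypothetical names, not AKS APIs.

```python
import hashlib
import math
import re

DIMS = 64  # hypothetical dimension; the AKS fallback size is not documented


def deterministic_embedding(text: str, dims: int = DIMS) -> list[float]:
    """Feature-hash lowercase alphanumeric tokens into an L2-normalized float list."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # SHA-256 is stable across runs, unlike Python's per-process salted hash().
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dims
        sign = 1.0 if digest[4] & 1 else -1.0  # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))
```

Because the hash depends only on token content, rebuilding the index on another machine yields identical vectors, which keeps ranking reproducible without any model.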
## Deterministic ingestion rules

### Markdown

- Source order:
  1. Allow-list file: `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/knowledge-docs-allowlist.json`.
  2. Generated manifest (optional, produced by the CLI): `knowledge-docs-manifest.json`.
  3. Fallback scan roots (`docs/**`), used only if the allow-list resolves no Markdown files.
- Chunk by H2/H3 headings.
- Stable anchors: heading slug plus a suffix for duplicates.
- Stable chunk IDs derived from source path + anchor + span.
- Metadata includes path, anchor, section path, and tags.

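
The anchor and chunk-ID rules above can be sketched as follows. This is an illustration under assumptions, not the AKS implementation: the slug rules, the `-1`, `-2` duplicate suffixes, the `path#anchor@span` hash input, and the 16-hex-character ID length are all hypothetical. For brevity the sketch ignores headings inside fenced code blocks and any text before the first H2.

```python
import hashlib
import re

HEADING = re.compile(r"^(#{2,3}) (.+)$")  # H2 or H3 only


def slugify(heading: str) -> str:
    # Lowercase, drop punctuation, then join words with single hyphens.
    slug = re.sub(r"[^a-z0-9\s-]", "", heading.lower())
    return re.sub(r"[\s-]+", "-", slug).strip("-")


def chunk_markdown(path: str, text: str) -> list[dict]:
    """Split Markdown into H2/H3 sections with stable anchors and chunk IDs."""
    lines = text.splitlines()
    starts = [i for i, line in enumerate(lines) if HEADING.match(line)]
    seen: dict[str, int] = {}
    chunks = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        title = HEADING.match(lines[start]).group(2).strip()
        base = slugify(title)
        dup = seen.get(base, 0)
        seen[base] = dup + 1
        anchor = base if dup == 0 else f"{base}-{dup}"  # duplicate suffix
        span = f"{start + 1}-{end}"  # 1-based, inclusive line range
        digest = hashlib.sha256(f"{path}#{anchor}@{span}".encode("utf-8"))
        chunks.append({
            "id": digest.hexdigest()[:16],
            "path": path,
            "anchor": anchor,
            "span": span,
            "body": "\n".join(lines[start:end]),
        })
    return chunks
```

Re-running `chunk_markdown` on unchanged input returns the same IDs. An edit that changes line counts alters the spans, and therefore the IDs, of the affected section and every section after it.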
### OpenAPI

- Source order:
  1. Aggregated OpenAPI file (default `devops/compose/openapi_current.json`).
  2. Fallback repository scan for `openapi.json` files when the aggregate is missing.
- For the MVP, parse the aggregate as JSON only, deterministically.
- Emit one searchable chunk per HTTP operation.
- Preserve structured operation payloads (`request_json`, `responses_json`, `security_json`).

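
A minimal sketch of per-operation chunking over a parsed OpenAPI 3 document. It walks `paths` in sorted order and the Path Item methods in a fixed order, so output is stable across runs. The chunk field names mirror the `api_operation` columns named above but are otherwise assumptions, and path-level `parameters` are ignored for brevity.

```python
import json

# Operation keys of an OpenAPI 3 Path Item, visited in a fixed order.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def operation_chunks(spec: dict) -> list[dict]:
    """Emit one chunk per HTTP operation in a stable (path, method) order."""
    chunks = []
    paths = spec.get("paths", {})
    for path in sorted(paths):
        item = paths[path]
        for method in HTTP_METHODS:
            op = item.get(method)
            if op is None:  # method not defined on this path
                continue
            # Operation-level security overrides the document-level default.
            security = op.get("security", spec.get("security", []))
            chunks.append({
                "method": method.upper(),
                "path": path,
                "operation_id": op.get("operationId"),
                "tags": sorted(op.get("tags", [])),
                "title": f"{method.upper()} {path}",
                # sort_keys makes the stored JSON byte-for-byte reproducible.
                "request_json": json.dumps(op.get("requestBody", {}), sort_keys=True),
                "responses_json": json.dumps(op.get("responses", {}), sort_keys=True),
                "security_json": json.dumps(security, sort_keys=True),
            })
    return chunks
```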
### Doctor

- Source order:
  1. Seed file `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-seed.json`.
  2. Controls file `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/doctor-search-controls.json` (contains control fields plus fallback metadata from `stella advisoryai sources prepare`).
  3. Optional Doctor endpoint metadata (`DoctorChecksEndpoint`), when configured.
- `stella advisoryai sources prepare` merges the configured seed entries with `DoctorEngine.ListChecks()` (when available in the CLI runtime) and writes enriched control projection metadata (`title`, `severity`, `description`, `remediation`, `runCommand`, `symptoms`, `tags`, `references`).
- Emit a Doctor chunk plus a projection record that includes:
  - `checkCode`, `title`, `severity`, `runCommand`, remediation, and symptoms.
  - Control metadata (`control`, `requiresConfirmation`, `isDestructive`, `inspectCommand`, `verificationCommand`).

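
The merge performed by `stella advisoryai sources prepare` can be pictured as a keyed union on `checkCode`. The precedence rule in this sketch (non-empty seed values win, discovered catalog values fill gaps) is an assumption, as is the name `merge_doctor_metadata`; the actual CLI behavior may differ. `discovered` stands in for whatever `DoctorEngine.ListChecks()` returns, projected to plain dictionaries.

```python
PROJECTION_FIELDS = ("title", "severity", "description", "remediation",
                     "runCommand", "symptoms", "tags", "references")


def merge_doctor_metadata(seed: list[dict], discovered: list[dict]) -> list[dict]:
    """Merge seed entries with a discovered check catalog, keyed by checkCode.

    Assumption: non-empty seed values win; discovered values only fill gaps.
    """
    by_code: dict[str, dict] = {}
    for entry in discovered:
        by_code[entry["checkCode"]] = {k: entry.get(k) for k in PROJECTION_FIELDS}
    for entry in seed:
        merged = by_code.setdefault(entry["checkCode"], {k: None for k in PROJECTION_FIELDS})
        for key in PROJECTION_FIELDS:
            if entry.get(key):  # skip None, "", and empty lists
                merged[key] = entry[key]
    # Sorting by checkCode keeps the written projection file deterministic.
    return [{"checkCode": code, **fields} for code, fields in sorted(by_code.items())]
```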
## Ranking strategy

Implemented in `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchService.cs`:

- Candidate retrieval:
  - Lexical set from FTS.
  - Optional vector set from embedding candidates.
- Fusion:
  - Reciprocal rank fusion (RRF)-style scoring.
- Deterministic boosts:
  - Exact `checkCode` match.
  - Exact `operationId` match.
  - `METHOD /path` match.
  - Service/tag boosts aligned with the active filters.
- Deterministic ordering:
  - Score descending, then kind ascending, then chunk ID ascending.

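
A hedged sketch of the fusion and ordering rules, not the `KnowledgeSearchService` code. It applies reciprocal rank fusion with the commonly used constant k = 60 (the constant and boost weights AKS uses are not documented here), adds a flat hypothetical boost for exact `checkCode`, `operationId`, or `METHOD /path` hits, and omits the filter-aligned service/tag boosts. `Candidate`, `is_exact_hit`, and `fuse` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RRF_K = 60         # common RRF constant; the value AKS uses is not stated here
EXACT_BOOST = 1.0  # hypothetical weight, large enough to dominate RRF scores


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    kind: str                          # e.g. "md_section", "api_operation", "doctor_check"
    check_code: Optional[str] = None
    operation_id: Optional[str] = None
    method_path: Optional[str] = None  # e.g. "POST /v1/advisory-ai/search"


def is_exact_hit(query: str, cand: Candidate) -> bool:
    q = query.strip()
    if q and q in (cand.check_code, cand.operation_id):
        return True
    # Normalize "post /path" to "POST /path" before comparing.
    method, _, path = q.partition(" ")
    return cand.method_path is not None and f"{method.upper()} {path.strip()}" == cand.method_path


def fuse(query: str, lexical: List[Candidate], vector: List[Candidate]) -> List[Tuple[float, Candidate]]:
    """Reciprocal rank fusion plus exact-match boosts, with a fully deterministic order."""
    scores = {}
    by_id = {}
    for ranked in (lexical, vector):  # pass an empty vector list when that stage is off
        for rank, cand in enumerate(ranked, start=1):
            scores[cand.chunk_id] = scores.get(cand.chunk_id, 0.0) + 1.0 / (RRF_K + rank)
            by_id[cand.chunk_id] = cand
    for chunk_id, cand in by_id.items():
        if is_exact_hit(query, cand):
            scores[chunk_id] += EXACT_BOOST
    # Score descending, then kind ascending, then chunk ID ascending.
    return sorted(((scores[c], by_id[c]) for c in by_id),
                  key=lambda pair: (-pair[0], pair[1].kind, pair[1].chunk_id))
```

Sorting on the tuple `(-score, kind, chunk_id)` is what makes ties reproducible: two chunks with identical fused scores always come back in the same order.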
## API contract

### Search

- `POST /v1/advisory-ai/search`
- Request:
  - `q` (required), `k`, `filters.type|product|version|service|tags`, `includeDebug`.
- Response:
  - Typed results (`docs|api|doctor`), each with a snippet, a score, and an open action.

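
A sketch of building a search request with the Python standard library. The host in `BASE_URL`, the default `k`, the value types of the filter fields, and the absence of authentication headers are assumptions; `build_search_request` is a hypothetical helper, so check the deployed contract before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical host; substitute your AdvisoryAI instance
FILTER_KEYS = ("type", "product", "version", "service", "tags")


def build_search_request(q: str, k: int = 10, include_debug: bool = False,
                         **filters) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/advisory-ai/search request."""
    if not q.strip():
        raise ValueError("q is required")
    body = {"q": q, "k": k, "includeDebug": include_debug}
    # Send only the filter keys the contract lists, and only when they are set.
    chosen = {key: filters[key] for key in FILTER_KEYS if filters.get(key) is not None}
    if chosen:
        body["filters"] = chosen
    return urllib.request.Request(
        f"{BASE_URL}/v1/advisory-ai/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_search_request("certificate expired", k=5, type="doctor")` builds the request, and `urllib.request.urlopen(req)` sends it once `BASE_URL` points at a reachable instance.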
### Rebuild

- `POST /v1/advisory-ai/index/rebuild`
- Rebuilds the AKS index deterministically from local docs, OpenAPI specs, and Doctor metadata.

## Web behavior

Global search now consumes AKS and supports:

- Mixed grouped results (`Docs`, `API Endpoints`, `Doctor Checks`).
- Type filter chips.
- Result actions:
  - Docs: `Open`.
  - API: `Curl` (copies a curl command).
  - Doctor: `Run` (navigates to Doctor and copies the run command).
- A `More` action for "show more like this" local query expansion.

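
For API results, the copied command is presumably derived from the operation's method and path. The sketch below is a hypothetical rendering, not the Web client's actual output: the `-sS` flags, the placeholder JSON body, and the `base_url` parameter are assumptions. Shell quoting keeps path templates such as `{id}` intact when pasted.

```python
import shlex


def curl_command(method: str, path: str, base_url: str, json_body: bool = False) -> str:
    """Render a paste-safe curl command for an API search result (hypothetical helper)."""
    url = base_url.rstrip("/") + path
    parts = ["curl", "-sS", "-X", method.upper(), shlex.quote(url)]
    if json_body:
        # Placeholder empty body; the user would replace it with a real payload.
        parts += ["-H", shlex.quote("Content-Type: application/json"), "-d", shlex.quote("{}")]
    return " ".join(parts)
```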
## CLI behavior

AKS commands:

- `stella search "<query>" [--type docs|api|doctor] [--product ...] [--version ...] [--service ...] [--tag ...] [--k N] [--json]`
- `stella doctor suggest "<symptom>" [--product ...] [--version ...] [--k N] [--json]`
- `stella advisoryai index rebuild [--json]`
- `stella advisoryai sources prepare [--repo-root ...] [--docs-allowlist ...] [--docs-manifest-output ...] [--openapi-output ...] [--doctor-seed ...] [--doctor-controls-output ...] [--overwrite] [--json]`

Output:

- Human mode: grouped, actionable references.
- JSON mode: stable, machine-readable payload.

## Test/benchmark strategy

Implemented benchmark framework:

- Generator: `KnowledgeSearchBenchmarkDatasetGenerator` (deterministic synthetic set with explicit ground truth).
- Runner: `KnowledgeSearchBenchmarkRunner` (recall@k, p50/p95 latency, stability pass).
- Models/serialization:
  - `KnowledgeSearchBenchmarkModels.cs`
  - `KnowledgeSearchBenchmarkJson.cs`

Tests:

- `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/KnowledgeSearch/KnowledgeSearchBenchmarkTests.cs`:
  - verifies deterministic dataset generation with >= 1000 queries.
  - verifies recall/latency metrics and top-k match behavior.

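
The two headline metrics can be stated precisely in a few lines. This sketch assumes one expected chunk ID per query and uses the nearest-rank percentile definition; `KnowledgeSearchBenchmarkRunner` may use a different interpolation, and `recall_at_k` and `percentile` are hypothetical names.

```python
import math


def recall_at_k(results: dict, truth: dict, k: int) -> float:
    """Share of queries whose expected chunk ID appears in their top-k results."""
    if not truth:
        return 0.0
    hits = sum(1 for qid, expected in truth.items() if expected in results.get(qid, [])[:k])
    return hits / len(truth)


def percentile(samples_ms: list, p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100 (p50, p95, ...)."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]
```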
## Dedicated AKS test DB

Compose profile:

- `devops/compose/docker-compose.advisoryai-knowledge-test.yml`

Init script:

- `devops/compose/postgres-init/advisoryai-knowledge-test/01_extensions.sql`

Example workflow (run from the repository root):

```bash
# Start the dedicated AKS test database.
docker compose -f devops/compose/docker-compose.advisoryai-knowledge-test.yml up -d
# Prepare AKS source artifacts.
stella advisoryai sources prepare --json
# Rebuild the AKS index from the prepared sources.
stella advisoryai index rebuild --json
# Run the AdvisoryAI test suite.
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj
```

## Known limitations and follow-ups

- YAML OpenAPI ingestion is not included in the MVP.
- An end-to-end benchmark against a live Postgres-backed AKS service is planned as a follow-up CI lane.
- Optional external embedding providers can be added later without changing the API contracts.