Search/AdvisoryAI and DAL conversion to EF finishes up. Preparation for microservices consolidation.
143
docs/operations/unified-search-operations.md
Normal file
@@ -0,0 +1,143 @@
# Unified Search Operations Runbook

## Scope
Runbook for AdvisoryAI unified search setup, operations, troubleshooting, performance, and rollout control.

## Setup
1. Configure `AdvisoryAI:KnowledgeSearch:ConnectionString`.
2. Configure the `AdvisoryAI:UnifiedSearch` options.
3. Ensure the model artifact path exists when `VectorEncoderType=onnx`:
   - default: `models/all-MiniLM-L6-v2.onnx`
4. Rebuild the index:
   - `POST /v1/search/index/rebuild`
5. Verify the query endpoint:
   - `POST /v1/search/query` with the `X-StellaOps-Tenant` header and the `advisory-ai:operate` scope.
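
Steps 4 and 5 can be scripted. The following is a minimal sketch, not the service's documented client: the base URL, the bearer-token transport for the `advisory-ai:operate` scope, and the `{"q": ...}` query body are all assumptions made for illustration. Only the endpoint paths and the `X-StellaOps-Tenant` header come from this runbook. The requests are built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: replace with your AdvisoryAI host


def build_request(path, tenant, token, payload=None):
    """Build (but do not send) a POST request for a unified search endpoint."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-StellaOps-Tenant": tenant,
            # assumption: the advisory-ai:operate scope is carried by a bearer token
            "Authorization": f"Bearer {token}",
        },
    )


# Step 4: rebuild the index.
rebuild = build_request("/v1/search/index/rebuild", "tenant-alpha", "<token>")

# Step 5: verify the query endpoint.
# assumption: the request body shape is illustrative only.
query = build_request("/v1/search/query", "tenant-alpha", "<token>", {"q": "openssl"})

print(rebuild.get_method(), rebuild.full_url)
print(query.get_method(), query.full_url)
```

Once the placeholder values are real, each request can be sent with `urllib.request.urlopen(...)`.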

## Key Endpoints
- `POST /v1/search/query`
- `POST /v1/search/synthesize`
- `POST /v1/search/index/rebuild`
- `POST /v1/advisory-ai/search/analytics`
- `GET /v1/advisory-ai/search/quality/metrics`
- `GET /v1/advisory-ai/search/quality/alerts`

## Monitoring
Track per tenant and globally:
- Query throughput (`query`, `click`, `zero_result`, `synthesis` events)
- P50/P95/P99 latency for `/v1/search/query`
- Zero-result rate
- Synthesis quota denials
- Index size and rebuild duration
- Active encoder diagnostics (`diagnostics.activeEncoder`)
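
As a rough illustration of how the latency percentiles and zero-result rate above could be derived from raw analytics events, here is a minimal sketch. The event record shape (`tenant`, `type`, `latency_ms` keys) is hypothetical, and defining the zero-result rate as `zero_result` events divided by `query` events is an assumption; the service's own metrics endpoints may compute these differently.

```python
import math
from collections import Counter


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(events):
    """Summarize events; each is a dict with 'type' and optional 'latency_ms'."""
    events = list(events)
    counts = Counter(e["type"] for e in events)
    latencies = sorted(
        e["latency_ms"]
        for e in events
        if e["type"] == "query" and "latency_ms" in e
    )
    queries = counts["query"]
    return {
        "counts": dict(counts),
        # assumption: one zero_result event is emitted per empty query response
        "zero_result_rate": counts["zero_result"] / queries if queries else 0.0,
        "latency_ms": (
            {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}
            if latencies
            else {}
        ),
    }


def summarize_by_tenant(events):
    """Group events by their (hypothetical) 'tenant' key and summarize each group."""
    grouped = {}
    for event in events:
        grouped.setdefault(event["tenant"], []).append(event)
    return {tenant: summarize(group) for tenant, group in grouped.items()}
```

Calling `summarize(all_events)` gives the global view, and `summarize_by_tenant(all_events)` gives the per-tenant breakdown.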

## Performance Targets
- Instant results: P50 < 100ms, P95 < 200ms, P99 < 300ms
- Full results (federated): P50 < 200ms, P95 < 500ms, P99 < 800ms
- Deterministic synthesis: P50 < 30ms, P95 < 50ms
- LLM synthesis: TTFB P50 < 1s, total P95 < 5s
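
The targets above can be encoded as data and checked against measured values. This is a sketch only: the path names and metric keys (`instant`, `ttfb_p50`, and so on) are labels invented here for illustration, while the thresholds are copied from the list above and converted to milliseconds.

```python
# Thresholds in milliseconds, taken from the Performance Targets list.
# The dictionary keys are hypothetical labels, not identifiers used by the service.
TARGETS_MS = {
    "instant": {"p50": 100, "p95": 200, "p99": 300},
    "full_federated": {"p50": 200, "p95": 500, "p99": 800},
    "deterministic_synthesis": {"p50": 30, "p95": 50},
    "llm_synthesis": {"ttfb_p50": 1000, "total_p95": 5000},
}


def violations(path, measured):
    """Return the metric names whose measured value is not strictly below target."""
    return sorted(
        name
        for name, limit in TARGETS_MS[path].items()
        if name in measured and measured[name] >= limit
    )


# Example with made-up measurements.
print(violations("instant", {"p50": 80, "p95": 250, "p99": 300}))
```

Because the targets are written as strict inequalities, a measurement equal to the threshold counts as a violation in this sketch.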

## SQL Query Tuning and EXPLAIN Evidence
Unified search read paths rely on:
- FTS query over `advisoryai.kb_chunk.body_tsv*`
- Trigram fuzzy fallback (`%` / `similarity()`)
- Vector nearest-neighbor (`embedding_vec <=> query_vector`)

Recommended validation commands:
```sql
-- Full-text search path.
-- @query is a bind parameter; substitute a literal when running in psql.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.chunk_id
FROM advisoryai.kb_chunk c
WHERE c.body_tsv_en @@ websearch_to_tsquery('english', @query)
ORDER BY ts_rank_cd(c.body_tsv_en, websearch_to_tsquery('english', @query), 32) DESC, c.chunk_id
LIMIT 20;
```

```sql
-- Vector nearest-neighbor path (<=> is the pgvector cosine-distance operator).
-- @query_vector is a bind parameter; substitute a vector literal when running in psql.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.chunk_id
FROM advisoryai.kb_chunk c
WHERE c.embedding_vec IS NOT NULL
ORDER BY c.embedding_vec <=> CAST(@query_vector AS vector), c.chunk_id
LIMIT 20;
```

Index expectations:
- `idx_kb_chunk_body_tsv_en` (GIN over `body_tsv_en`)
- `idx_kb_chunk_body_trgm` (GIN trigram over `body`)
- `idx_kb_chunk_embedding_vec_hnsw` (HNSW over `embedding_vec`)

Automated EXPLAIN evidence is captured by:
- `UnifiedSearchLiveAdapterIntegrationTests.PostgresKnowledgeSearchStore_ExplainAnalyze_ShowsIndexedSearchPlans`
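
When reviewing plans by hand, a quick textual check can flag a plan that skips the expected index. The sketch below is not how the integration test above works (its implementation is not shown here); it simply searches PostgreSQL's text-format EXPLAIN output for the index names from the "Index expectations" list. The `fts`/`trigram`/`vector` keys and the sample plan excerpt are hypothetical.

```python
# Index names come from the "Index expectations" list; the keys are labels
# chosen for this sketch.
EXPECTED_INDEXES = {
    "fts": "idx_kb_chunk_body_tsv_en",
    "trigram": "idx_kb_chunk_body_trgm",
    "vector": "idx_kb_chunk_embedding_vec_hnsw",
}


def check_plan(plan_text, path):
    """Report whether a text-format EXPLAIN plan mentions the expected index."""
    index_name = EXPECTED_INDEXES[path]
    return {
        "index": index_name,
        "uses_expected_index": index_name in plan_text,
        # A sequential scan on kb_chunk usually means the index was not used.
        "has_seq_scan": "Seq Scan on kb_chunk" in plan_text,
    }


# Hypothetical plan excerpt for illustration; not captured from a real run.
sample_plan = """\
Limit
  ->  Sort
        ->  Bitmap Heap Scan on kb_chunk c
              ->  Bitmap Index Scan on idx_kb_chunk_body_tsv_en
"""

print(check_plan(sample_plan, "fts"))
```

A substring check like this is deliberately crude; on small tables the planner may legitimately choose a sequential scan, so treat a flagged plan as a prompt to look closer rather than as a failure.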

## Load and Capacity Envelope
Validated test envelope (in-process benchmark harness):
- 50 concurrent requests sustained
- P95 < 500ms, P99 < 800ms

Sizing guidance:
- Up to 100k chunks: 2 vCPU / 4 GB RAM
- 100k-500k chunks: 4 vCPU / 8 GB RAM
- >500k chunks or heavy synthesis: 8 vCPU / 16 GB RAM, split synthesis workers
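
The sizing tiers can be expressed as a small lookup, which may be handy in capacity-planning scripts. This is a sketch under assumptions: the function name and return shape are invented here, and the boundary handling (exactly 100k chunks falls in the smallest tier, exactly 500k in the middle tier) is one reading of the list above.

```python
def recommend_size(chunk_count, heavy_synthesis=False):
    """Map an index size to the sizing tier from the guidance above."""
    if heavy_synthesis or chunk_count > 500_000:
        return {"vcpu": 8, "ram_gb": 16, "split_synthesis_workers": True}
    if chunk_count > 100_000:
        return {"vcpu": 4, "ram_gb": 8, "split_synthesis_workers": False}
    return {"vcpu": 2, "ram_gb": 4, "split_synthesis_workers": False}


print(recommend_size(250_000))
```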

## Feature Flags and Rollout
Config path: `AdvisoryAI:UnifiedSearch:TenantFeatureFlags`
- `Enabled`
- `FederationEnabled`
- `SynthesisEnabled`

Example:
```json
{
  "AdvisoryAI": {
    "UnifiedSearch": {
      "TenantFeatureFlags": {
        "tenant-alpha": { "Enabled": true, "FederationEnabled": true, "SynthesisEnabled": false },
        "tenant-beta": { "Enabled": true, "FederationEnabled": false, "SynthesisEnabled": false }
      }
    }
  }
}
```
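
The troubleshooting notes in this runbook check both the global setting and the tenant flag, which suggests a feature is active only when both are on. The sketch below models that reading; it is not the service's actual resolution logic. Treating a missing global or tenant value as `true` is an assumption, as is the idea that every flag has a global counterpart.

```python
FLAG_NAMES = ("Enabled", "FederationEnabled", "SynthesisEnabled")


def effective_flags(unified_search, tenant):
    """Combine global and per-tenant flags; a feature is on only if both allow it.

    `unified_search` is the parsed `AdvisoryAI:UnifiedSearch` config section.
    Assumption: a flag missing at either level defaults to True.
    """
    tenant_flags = unified_search.get("TenantFeatureFlags", {}).get(tenant, {})
    return {
        name: bool(unified_search.get(name, True))
        and bool(tenant_flags.get(name, True))
        for name in FLAG_NAMES
    }


config = {
    "Enabled": True,
    "TenantFeatureFlags": {
        "tenant-alpha": {"Enabled": True, "FederationEnabled": True, "SynthesisEnabled": False},
        "tenant-beta": {"Enabled": True, "FederationEnabled": False, "SynthesisEnabled": False},
    },
}

print(effective_flags(config, "tenant-alpha"))
```

A staged rollout then amounts to enabling flags tenant by tenant while leaving the global switches on.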

## Troubleshooting
### Symptom: empty results
- Verify the `X-StellaOps-Tenant` header is present.
- Verify `UnifiedSearch.Enabled` and the tenant flag `Enabled`.
- Run an index rebuild and check the chunk count.

### Symptom: poor semantic recall
- Verify `VectorEncoderType` and the active encoder diagnostics.
- Confirm the ONNX model path is accessible and valid.
- Rebuild the index after an encoder switch.

### Symptom: synthesis unavailable
- Check `SynthesisEnabled` (global + tenant).
- Check quota counters and provider configuration.

### Symptom: high latency
- Check the federated backend timeout budget.
- Review `EXPLAIN (ANALYZE)` plans.
- Verify index health and cardinality growth by tenant.

## Backup and Recovery
- Unified index is derivable state.
- Recovery sequence:
  1. Restore primary domain systems (findings/vex/policy/docs sources).
  2. Restore AdvisoryAI DB schema.
  3. Trigger full index rebuild.
  4. Validate with quality benchmark fast subset.

## Validation Commands
```bash
# Fast PR-level quality gate
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj \
  -- --filter-class StellaOps.AdvisoryAI.Tests.UnifiedSearch.UnifiedSearchQualityBenchmarkFastSubsetTests

# Full benchmark + tuning evidence
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj \
  -- --filter-class StellaOps.AdvisoryAI.Tests.UnifiedSearch.UnifiedSearchQualityBenchmarkTests

# Performance envelope
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj \
  -- --filter-class StellaOps.AdvisoryAI.Tests.UnifiedSearch.UnifiedSearchPerformanceEnvelopeTests
```