Unified Search Operations Runbook
Scope
Runbook for AdvisoryAI unified search setup, operations, troubleshooting, performance, and rollout control.
Setup
- Configure AdvisoryAI:KnowledgeSearch:ConnectionString.
- Configure AdvisoryAI:UnifiedSearch options.
- For live compose/runtime, set AdvisoryAI:KnowledgeSearch:FindingsAdapterBaseUrl, ...:VexAdapterBaseUrl, and ...:PolicyAdapterBaseUrl together so findings, VEX, and policy ingest from live services instead of partial fallback snapshots.
- Ensure the published AdvisoryAI image carries the repo-shaped local corpus under /app, including src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/{findings,vex,policy,graph,opsmemory,timeline,scanner}.snapshot.json.
- Ensure model artifact path exists when VectorEncoderType=onnx:
  - default: models/all-MiniLM-L6-v2.onnx
- Rebuild indexes in order when verifying live search quality:
  1. POST /v1/advisory-ai/index/rebuild
  2. POST /v1/search/index/rebuild
- Verify query endpoint:
  - POST /v1/search/query with X-StellaOps-Tenant and advisory-ai:operate scope.
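The rebuild-then-verify order above can be sketched as a small call plan. This is a minimal illustration, not the project's tooling: `PlannedCall` and `plan_verification` are hypothetical names, the bearer-token `Authorization` header is an assumption about how the advisory-ai:operate scope is presented, and request bodies are left out because this runbook does not specify their schema.

```python
from dataclasses import dataclass

# Rebuild endpoints in the order given in the Setup list.
REBUILD_PATHS = (
    "/v1/advisory-ai/index/rebuild",
    "/v1/search/index/rebuild",
)
QUERY_PATH = "/v1/search/query"


@dataclass(frozen=True)
class PlannedCall:
    """One HTTP call to issue; a hypothetical helper type for this sketch."""
    method: str
    url: str
    headers: dict


def plan_verification(base_url: str, tenant: str, token: str) -> list[PlannedCall]:
    """Return the two rebuild calls followed by the query check, in order."""
    headers = {
        "X-StellaOps-Tenant": tenant,
        # Assumption: the token carries the advisory-ai:operate scope.
        "Authorization": f"Bearer {token}",
    }
    base = base_url.rstrip("/")
    paths = (*REBUILD_PATHS, QUERY_PATH)
    return [PlannedCall("POST", base + path, dict(headers)) for path in paths]
```

Feeding the plan to any HTTP client and stopping at the first non-success response keeps the second rebuild from running against a stale knowledge index.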
Key Endpoints
- POST /v1/search/query
- POST /v1/search/synthesize
- POST /v1/search/index/rebuild
- POST /v1/advisory-ai/search/analytics
- GET /v1/advisory-ai/search/quality/metrics
- GET /v1/advisory-ai/search/quality/alerts
Monitoring
Track per-tenant and global:
- Query throughput (query, click, zero_result, synthesis events)
- Self-serve journey signals (answer_frame, reformulation, rescue_action)
- P50/P95/P99 latency for /v1/search/query
- Zero-result rate
- Fallback answer rate, clarify rate, insufficient-evidence rate
- Reformulation count, rescue-action count, abandoned fallback count
- Synthesis quota denials
- Index size and rebuild duration
- Active encoder diagnostics (diagnostics.activeEncoder)
Performance Targets
- Instant results: P50 < 100ms, P95 < 200ms, P99 < 300ms
- Full results (federated): P50 < 200ms, P95 < 500ms, P99 < 800ms
- Deterministic synthesis: P50 < 30ms, P95 < 50ms
- LLM synthesis: TTFB P50 < 1s, total P95 < 5s
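A latency sample set can be checked against these targets with a few lines of code. The sketch below is illustrative only: the tier names, `percentile`, and `check_targets` are hypothetical, and it uses the nearest-rank percentile definition, which may differ from whatever the production metrics pipeline uses. LLM synthesis is left out because its targets mix TTFB and total time.

```python
# Targets from the list above, in milliseconds, keyed by percentile.
TARGETS_MS = {
    "instant": {50: 100, 95: 200, 99: 300},
    "full_federated": {50: 200, 95: 500, 99: 800},
    "deterministic_synthesis": {50: 30, 95: 50},
}


def percentile(samples_ms, pct: int):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def check_targets(samples_ms, tier: str) -> dict:
    """Return {pct: (observed_ms, limit_ms, within_target)} for one tier."""
    report = {}
    for pct, limit in TARGETS_MS[tier].items():
        observed = percentile(samples_ms, pct)
        report[pct] = (observed, limit, observed < limit)
    return report
```

For example, `check_targets(latencies, "instant")` flags which of P50, P95, and P99 exceed the instant-results budget for a given window of samples.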
SQL Query Tuning and EXPLAIN Evidence
Unified search read paths rely on:
- FTS query over advisoryai.kb_chunk.body_tsv*
- Trigram fuzzy fallback (% / similarity())
- Vector nearest-neighbor (embedding_vec <=> query_vector)
Recommended validation commands:
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.chunk_id
FROM advisoryai.kb_chunk c
WHERE c.body_tsv_en @@ websearch_to_tsquery('english', @query)
ORDER BY ts_rank_cd(c.body_tsv_en, websearch_to_tsquery('english', @query), 32) DESC, c.chunk_id
LIMIT 20;
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.chunk_id
FROM advisoryai.kb_chunk c
WHERE c.embedding_vec IS NOT NULL
ORDER BY c.embedding_vec <=> CAST(@query_vector AS vector), c.chunk_id
LIMIT 20;
Index expectations:
- idx_kb_chunk_body_tsv_en (GIN over body_tsv_en)
- idx_kb_chunk_body_trgm (GIN trigram over body)
- idx_kb_chunk_embedding_vec_hnsw (HNSW over embedding_vec)
Automated EXPLAIN evidence is captured by:
UnifiedSearchLiveAdapterIntegrationTests.PostgresKnowledgeSearchStore_ExplainAnalyze_ShowsIndexedSearchPlans
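When reviewing a plan by hand, the same expectation can be approximated with a text check over the EXPLAIN output. This is a rough sketch, not what the integration test does: `plan_uses_index` and the read-path keys are hypothetical, and it assumes PostgreSQL's default text plan format, where scan nodes read "... Scan on <name>" or "... Scan using <name> on <table>".

```python
# Index names from the expectations listed above, keyed by read path.
EXPECTED_INDEXES = {
    "fts": "idx_kb_chunk_body_tsv_en",
    "trigram": "idx_kb_chunk_body_trgm",
    "vector": "idx_kb_chunk_embedding_vec_hnsw",
}


def plan_uses_index(plan_text: str, read_path: str) -> bool:
    """True if the plan mentions the expected index and has no seq scan on kb_chunk."""
    index_name = EXPECTED_INDEXES[read_path]
    has_index = index_name in plan_text
    has_seq_scan = "Seq Scan on kb_chunk" in plan_text
    return has_index and not has_seq_scan
```

A plan that falls back to a sequential scan on a large kb_chunk table is usually the first thing to rule out when latency regresses.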
Load and Capacity Envelope
Validated test envelope (in-process benchmark harness):
- 50 concurrent requests sustained
- P95 < 500ms, P99 < 800ms
Sizing guidance:
- Up to 100k chunks: 2 vCPU / 4 GB RAM
- 100k-500k chunks: 4 vCPU / 8 GB RAM
- More than 500k chunks or heavy synthesis: 8 vCPU / 16 GB RAM, split synthesis workers
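The sizing tiers can be expressed as a lookup for capacity scripts. A minimal sketch: `recommend_sizing` is a hypothetical helper, and treating exactly 100k and exactly 500k chunks as belonging to the lower tier is an assumption, since the guidance does not say which side the boundaries fall on.

```python
def recommend_sizing(chunk_count: int, heavy_synthesis: bool = False) -> dict:
    """Map an index size (and synthesis load) to the sizing guidance tiers."""
    if chunk_count < 0:
        raise ValueError("chunk_count must be non-negative")
    if heavy_synthesis or chunk_count > 500_000:
        return {"vcpu": 8, "ram_gb": 16, "split_synthesis_workers": True}
    if chunk_count > 100_000:
        return {"vcpu": 4, "ram_gb": 8, "split_synthesis_workers": False}
    return {"vcpu": 2, "ram_gb": 4, "split_synthesis_workers": False}
```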
Feature Flags and Rollout
Config path: AdvisoryAI:UnifiedSearch:TenantFeatureFlags
- Enabled
- FederationEnabled
- SynthesisEnabled
Example:
{
  "AdvisoryAI": {
    "UnifiedSearch": {
      "TenantFeatureFlags": {
        "tenant-alpha": { "Enabled": true, "FederationEnabled": true, "SynthesisEnabled": false },
        "tenant-beta": { "Enabled": true, "FederationEnabled": false, "SynthesisEnabled": false }
      }
    }
  }
}
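To reason about what a tenant actually gets, it can help to model how global and tenant flags combine. The sketch below is an assumption-heavy illustration, not the service's resolution logic: it assumes global switches live under UnifiedSearch with the same key names, that a feature is on only when both the global and the tenant value are true, that missing global keys default to true, and that a tenant without an entry inherits the global values. `Flags` and `effective_flags` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    enabled: bool
    federation_enabled: bool
    synthesis_enabled: bool


def effective_flags(unified_search: dict, tenant: str) -> Flags:
    """Combine global and per-tenant flags (global AND tenant; see assumptions)."""
    tenant_flags = unified_search.get("TenantFeatureFlags", {}).get(tenant, {})

    def resolve(key: str) -> bool:
        global_value = bool(unified_search.get(key, True))  # assumed default
        tenant_value = bool(tenant_flags.get(key, global_value))  # assumed inheritance
        return global_value and tenant_value

    return Flags(
        enabled=resolve("Enabled"),
        federation_enabled=resolve("FederationEnabled"),
        synthesis_enabled=resolve("SynthesisEnabled"),
    )
```

Under these assumptions, a staged rollout means enabling the global switch first and then widening the tenant entries one at a time.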
Troubleshooting
Symptom: empty results
- Verify tenant header is present.
- Verify UnifiedSearch.Enabled and tenant flag Enabled.
- Run index rebuild and check chunk count.
- If suggestions also fail, verify both rebuild steps were run in order and re-check with a known live query such as "database connectivity".
- If only findings answer lanes work while VEX/policy/graph/OpsMemory remain corpus-unready, verify the published snapshot files exist under /app/src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/ and confirm the VEX/policy adapter base URLs are configured in runtime env.
Symptom: poor semantic recall
- Verify VectorEncoderType and active encoder diagnostics.
- Confirm ONNX model path is accessible and valid.
- Rebuild index after encoder switch.
Symptom: synthesis unavailable
- Check SynthesisEnabled (global + tenant).
- Check quota counters and provider configuration.
Symptom: self-serve search feels weak
- Inspect GET /v1/advisory-ai/search/quality/metrics?period=7d.
- Watch fallbackAnswerRate, clarifyRate, insufficientRate, reformulationCount, rescueActionCount, and abandonedFallbackCount.
- Inspect GET /v1/advisory-ai/search/quality/alerts for fallback_loop and abandoned_fallback.
GET /v1/advisory-ai/search/quality/alertsforfallback_loopandabandoned_fallback. - Treat repeated fallback loops as ranking/context gaps; treat abandoned fallback sessions as UX/product gaps.
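The triage rule for these two alert kinds can be sketched as a simple bucketing step. This is illustrative only: `triage_alerts` is a hypothetical helper, and the `alertType` field name is an assumption about the alerts payload, which this runbook does not describe.

```python
from collections import Counter

# Mapping taken from the triage guidance above.
ALERT_TRIAGE = {
    "fallback_loop": "ranking/context gap",
    "abandoned_fallback": "UX/product gap",
}


def triage_alerts(alerts: list[dict]) -> Counter:
    """Count alerts per triage bucket; unknown kinds go to 'unclassified'."""
    counts = Counter()
    for alert in alerts:
        kind = alert.get("alertType")  # assumed field name
        counts[ALERT_TRIAGE.get(kind, "unclassified")] += 1
    return counts
```

A week where the ranking/context bucket dominates points at tuning work; a week dominated by the UX/product bucket points at the answer surface instead.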
Symptom: high latency
- Check federated backend timeout budget.
- Review EXPLAIN (ANALYZE) plans.
- Verify index health and cardinality growth by tenant.
Backup and Recovery
- Unified index is derivable state.
- Recovery sequence:
  1. Restore primary domain systems (findings/vex/policy/docs sources).
  2. Restore AdvisoryAI DB schema.
  3. Trigger full index rebuild.
  4. Validate with quality benchmark fast subset.
Validation Commands
# Fast PR-level quality gate
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj \
-- --filter-class StellaOps.AdvisoryAI.Tests.UnifiedSearch.UnifiedSearchQualityBenchmarkFastSubsetTests
# Full benchmark + tuning evidence
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj \
-- --filter-class StellaOps.AdvisoryAI.Tests.UnifiedSearch.UnifiedSearchQualityBenchmarkTests
# Performance envelope
dotnet test src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj \
-- --filter-class StellaOps.AdvisoryAI.Tests.UnifiedSearch.UnifiedSearchPerformanceEnvelopeTests
# Self-serve telemetry and gap surfacing slice
dotnet build src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/StellaOps.AdvisoryAI.Tests.csproj -v minimal
src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/bin/Debug/net10.0/StellaOps.AdvisoryAI.Tests.exe \
-method "StellaOps.AdvisoryAI.Tests.Integration.UnifiedSearchSprintIntegrationTests.G10_SelfServeMetrics_IncludeFallbackReformulationAndRescueSignals" \
-method "StellaOps.AdvisoryAI.Tests.Integration.UnifiedSearchSprintIntegrationTests.G10_RecoveredFallbackSessions_DoNotCountAsAbandoned" \
-reporter verbose -noColor