
# Unified Search Architecture

This document defines the architecture for AdvisoryAI unified search (Sprint 100, Phase 4 hardening).

## Goals
- Help operators and users unfamiliar with Stella Ops terminology find relevant results quickly.
- Merge platform knowledge, findings, VEX, policy, graph, timeline, scanner, and OpsMemory signals into one deterministic ranking stream.
- Keep the system offline-capable and tenant-safe.

## Four-Layer Architecture

```mermaid
flowchart LR
  Q[Layer 1: Query Understanding]
  R[Layer 2: Federated Retrieval]
  F[Layer 3: Fusion and Entity Cards]
  S[Layer 4: Synthesis]

  Q --> R --> F --> S
```

### Layer 1: Query Understanding
- Input: `UnifiedSearchRequest` (`q`, filters, ambient context, session id).
- The ambient context envelope sent by Web clients carries route and session continuity fields plus optional last-action metadata (`action`, `source`, `queryHint`, `domain`, `entityKey`, `route`, `occurredAt`). These fields feed follow-up ranking and refinement.
- Components:
  - `EntityExtractor`
  - `IntentClassifier`
  - `DomainWeightCalculator`
  - `AmbientContextProcessor`
  - `SearchSessionContextService`
- Output: `QueryPlan` with intent, detected entities, domain weights, and context boosts.

### Layer 2: Federated Retrieval
- Sources queried in parallel:
  - Primary universal index (`IKnowledgeSearchStore` FTS + vector candidates)
  - Optional federated backends via `FederatedSearchDispatcher`
- Ingestion adapters keep index coverage aligned across domains:
  - Findings, VEX, Policy (live + snapshot fallback)
  - Graph, Timeline, Scanner, OpsMemory snapshots
  - Platform catalog
- Tenant isolation is enforced in request filters and chunk identities.

### Layer 3: Fusion and Entity Cards
- `WeightedRrfFusion` merges lexical + vector candidates with domain weights.
- Additional boosts:
  - Entity proximity
  - Ambient/session carry-forward
  - Graph gravity
  - Optional popularity and freshness controls
- Current-page scope is applied here as a ranking bias, not a hard filter. When outside-scope results remain materially stronger, or remain inside a narrow relative score band, the response surfaces them in bounded `overflow` metadata instead of hiding them.
- `EntityCardAssembler` groups facets into entity cards and resolves aliases.
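
As an illustration of the fusion step above, here is a minimal sketch of weighted reciprocal rank fusion. It is not the `WeightedRrfFusion` implementation: the function name, the input shape, the default weight of `1.0`, and the conventional smoothing constant `k = 60` are all assumptions made for the example.

```python
from collections import defaultdict


def weighted_rrf(rankings, domain_weights, k=60):
    """Fuse several ranked lists into one score per chunk.

    rankings: dict of source name -> list of (chunk_id, domain), best first.
    domain_weights: dict of domain -> multiplier (assumed default: 1.0).
    k: RRF smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranked in rankings.values():
        for rank, (chunk_id, domain) in enumerate(ranked, start=1):
            weight = domain_weights.get(domain, 1.0)
            # Each list contributes weight / (k + rank) for the chunk.
            scores[chunk_id] += weight / (k + rank)
    # Sort by descending score; chunk id breaks ties deterministically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical candidates from the lexical and vector retrievers.
lexical = [("doc-1", "knowledge"), ("find-7", "findings"), ("vex-3", "vex")]
vector = [("find-7", "findings"), ("doc-1", "knowledge")]
fused = weighted_rrf({"lexical": lexical, "vector": vector}, {"findings": 1.5})
```

In this example the `findings` weight lifts `find-7` above `doc-1` even though both appear in both lists. The additional boosts listed above (entity proximity, session carry-forward, graph gravity) would be applied on top of a base score like this one.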

### Layer 4: Synthesis
- Deterministic synthesis is always available and is built from the top entity cards.
- Optional LLM tier (`SearchSynthesisService`) streams over SSE with:
  - quota enforcement
  - grounding score
  - action suggestions
- If the LLM tier is unavailable or blocked by quota, the deterministic output is still returned.
- Query responses may also include a deterministic `contextAnswer` envelope for answer-first search UX:
  - `status`: `grounded` | `clarify` | `insufficient`
  - `code`, `summary`, `reason`, `evidence`
  - bounded `citations`
  - bounded follow-up `questions`
- The answer envelope is additive and optional so older clients remain compatible.
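
Because the envelope is additive and optional, a client has to tolerate its absence. The sketch below shows one way a client might read it. The field names come from the list above; the `ContextAnswer` class, the `parse_context_answer` helper, the string types, and the bound of three citations and questions are assumptions for illustration only.

```python
from dataclasses import dataclass, field

ALLOWED_STATUSES = ("grounded", "clarify", "insufficient")
MAX_ITEMS = 3  # assumed bound; the real limits are not stated in this document


@dataclass
class ContextAnswer:
    status: str
    code: str = ""
    summary: str = ""
    reason: str = ""
    evidence: str = ""
    citations: list = field(default_factory=list)
    questions: list = field(default_factory=list)


def parse_context_answer(response: dict):
    """Return a ContextAnswer, or None when the envelope is absent or unusable."""
    raw = response.get("contextAnswer")
    if not isinstance(raw, dict) or raw.get("status") not in ALLOWED_STATUSES:
        return None  # older servers or unknown statuses: fall back to entity cards
    return ContextAnswer(
        status=raw["status"],
        code=raw.get("code", ""),
        summary=raw.get("summary", ""),
        reason=raw.get("reason", ""),
        evidence=raw.get("evidence", ""),
        # Enforce the bounds client-side as well, in case the server sends more.
        citations=list(raw.get("citations", []))[:MAX_ITEMS],
        questions=list(raw.get("questions", []))[:MAX_ITEMS],
    )
```

Returning `None` rather than raising keeps the entity-card path usable when the envelope is missing or carries a status the client does not recognize.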

### Telemetry and gap surfacing
- Search analytics stays optional at the client layer; queries still work when analytics events are never emitted.
- `AdvisoryAI:KnowledgeSearch:SearchTelemetryEnabled=false` also disables backend analytics persistence, feedback persistence, popularity-map reads, and unified-search telemetry sink emission without disabling retrieval or history.
- When enabled, the self-serve lane records `answer_frame`, `reformulation`, and `rescue_action` with hashed query keys, hashed tenant-scoped session ids, and bounded answer metadata.
- Quality review surfaces:
  - `GET /v1/advisory-ai/search/quality/metrics`
  - `GET /v1/advisory-ai/search/quality/alerts`
- Current self-serve gap signals:
  - fallback answer rate
  - clarify rate
  - insufficient-evidence rate
  - reformulation count
  - rescue-action count
  - abandoned fallback count
  - `fallback_loop` and `abandoned_fallback` alerts
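
The hashed keys mentioned above could be derived along the following lines. This is a sketch under assumptions: the document does not state the hash algorithm, the normalization, or the key layout, so SHA-256, whitespace and case normalization, and the helper names are all hypothetical.

```python
import hashlib


def hashed_query_key(query: str) -> str:
    """Hash a normalized query so analytics never store the raw text."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hashed_session_id(tenant_id: str, session_id: str) -> str:
    """Hash a session id together with its tenant so ids never collide across tenants."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    material = f"{tenant_id}\x1f{session_id}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Including the tenant in the hashed material means the same session id under two tenants yields two unrelated keys, which is consistent with the tenant-safety goal.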

## Data Flow

```mermaid
sequenceDiagram
  participant UI as Web UI / API Client
  participant API as UnifiedSearchEndpoints
  participant PLAN as QueryUnderstanding
  participant IDX as KnowledgeSearchStore
  participant FED as FederatedDispatcher
  participant FUS as WeightedRrfFusion
  participant CARDS as EntityCardAssembler
  participant SYN as SearchSynthesisService
  participant ANA as SearchAnalyticsService

  UI->>API: POST /v1/search/query
  API->>PLAN: Build QueryPlan
  PLAN-->>API: intent + entities + domain weights
  API->>IDX: SearchFtsAsync + LoadVectorCandidatesAsync
  API->>FED: DispatchAsync (optional)
  IDX-->>API: lexical + vector rows
  FED-->>API: federated rows + diagnostics
  API->>FUS: Fuse rankings
  FUS-->>API: ranked rows
  API->>CARDS: Assemble entity cards
  CARDS-->>API: entity cards
  API->>ANA: Record query/click/zero_result
  API-->>UI: UnifiedSearchResponse

  UI->>API: POST /v1/search/synthesize
  API->>SYN: ExecuteAsync
  SYN-->>UI: SSE deterministic-first + optional LLM chunks
```

## Contracts and API Surface
- `POST /v1/search/query`
- `POST /v1/search/suggestions/evaluate`
- `POST /v1/search/synthesize`
- `POST /v1/search/index/rebuild`

`POST /v1/search/query` response notes:
- Entity cards remain the primary retrieval payload.
- `contextAnswer` is the preferred answer-first surface for Web self-serve UX when present.
- `contextAnswer` is query-driven rather than mode-driven: compare-like requests may blend close evidence clusters, while scoped troubleshoot requests prefer a single decisive answer when one result clearly leads.
- `overflow` is additive and bounded so the frontend can show "outside the current page, but likely relevant" results without reintroducing a scope toggle.
- `overflow` is intentionally narrow: it is suppressed when the current-scope winner has a clear lead, so the frontend can trust the primary section as the best local answer.
- `coverage` is additive and bounded so the frontend can suppress misleading suggestions when the active corpus has no sensible candidates for that domain.
- Live local verification currently covers the Doctor/knowledge path after the documented rebuild order:
  1. `POST /v1/advisory-ai/index/rebuild`
  2. `POST /v1/search/index/rebuild`
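
The `overflow` behavior in the notes above can be sketched as a threshold relative to the best in-scope score. The function name, the row shape, the 10% band, and the cap of three results are assumptions for the example, not documented values.

```python
def split_overflow(results, current_scope, band=0.10, max_overflow=3):
    """Split ranked rows into in-scope results and bounded overflow.

    results: list of dicts with "id", "score", and "domain" keys (assumed shape).
    band: relative score band below the local winner that still qualifies.
    """
    primary = [r for r in results if r["domain"] == current_scope]
    outside = [r for r in results if r["domain"] != current_scope]
    best_local = max((r["score"] for r in primary), default=0.0)
    # Outside rows qualify when they beat the local winner or sit within the band.
    threshold = best_local * (1.0 - band)
    overflow = [r for r in outside if r["score"] >= threshold]
    overflow.sort(key=lambda r: (-r["score"], r["id"]))
    return primary, overflow[:max_overflow]


rows = [
    {"id": "f1", "score": 1.00, "domain": "findings"},
    {"id": "d1", "score": 0.95, "domain": "knowledge"},
    {"id": "f2", "score": 0.80, "domain": "findings"},
    {"id": "d2", "score": 0.50, "domain": "knowledge"},
]
primary, overflow = split_overflow(rows, "findings")
```

Here `d1` lands in `overflow` because it sits inside the band, while `d2` is dropped. When no outside row reaches the threshold, the overflow list is empty, which corresponds to the "clear lead" suppression described above.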

`POST /v1/search/suggestions/evaluate` response notes:
- Intended for proactive suggestion chips and page-owned prompts before the user commits a search.
- Returns per-query viability plus bounded domain coverage.
- Additive fields:
  - `viabilityState`: `grounded` | `needs_clarification` | `no_match` | `scope_unready` | `corpus_unready`
  - `scopeReady`: `true` when the current route scope has indexed corpus behind it
- `viable=true` is reserved for suggestions that already have grounded evidence; clarify-only suggestions are not considered executable.
- Does not require telemetry and does not record answer-frame analytics.
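
A client might use the evaluation result to decide which suggestion chips to render. The sketch below assumes each evaluated item carries a `query` string alongside the documented `viable` and `viabilityState` fields; the item shape and the helper names are hypothetical.

```python
VIABILITY_STATES = {
    "grounded",
    "needs_clarification",
    "no_match",
    "scope_unready",
    "corpus_unready",
}


def is_executable(state: str) -> bool:
    """Only grounded suggestions count as executable; clarify-only ones do not."""
    if state not in VIABILITY_STATES:
        raise ValueError(f"unknown viabilityState: {state!r}")
    return state == "grounded"


def executable_chips(evaluations):
    """Keep the queries whose evaluation is both viable and grounded."""
    return [
        item["query"]
        for item in evaluations
        if item.get("viable") is True and is_executable(item["viabilityState"])
    ]
```

Checking both fields is deliberately redundant: it guards against a response where `viable` and `viabilityState` disagree.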

OpenAPI contract presence is validated by integration test:
- `UnifiedSearchEndpointsIntegrationTests.OpenApi_Includes_UnifiedSearch_Contracts`

## Determinism Rules
- Ordering is stable: rows with equal scores are tie-broken by `kind`, then by `chunkId`.
- Ranking benchmark includes a deterministic stability hash across top results.
- Session context is ephemeral and expires after an inactivity timeout.
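
The tie-break and stability-hash rules can be illustrated with a short sketch. The row shape, the `top_n` default, and the choice of SHA-256 are assumptions; the actual benchmark may compute its hash differently.

```python
import hashlib


def stable_order(rows):
    """Sort by descending score, then by kind, then by chunkId."""
    return sorted(rows, key=lambda r: (-r["score"], r["kind"], r["chunkId"]))


def stability_hash(rows, top_n=10):
    """Hash the identities of the top results in their stable order."""
    top = stable_order(rows)[:top_n]
    joined = "\n".join(f'{r["kind"]}:{r["chunkId"]}' for r in top)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


rows = [
    {"score": 0.5, "kind": "vex", "chunkId": "c2"},
    {"score": 0.5, "kind": "finding", "chunkId": "c9"},
    {"score": 0.5, "kind": "finding", "chunkId": "c1"},
    {"score": 0.9, "kind": "policy", "chunkId": "c5"},
]
ordered_ids = [r["chunkId"] for r in stable_order(rows)]
```

Because the sort key is total over `(score, kind, chunkId)`, the same set of rows yields the same order and the same hash regardless of the order in which retrieval returned them.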

## Configuration
Primary section: `AdvisoryAI:UnifiedSearch`
- `Enabled`
- `BaseDomainWeights`
- `Weighting.*` (domain/intent/entity/role boosts)
- `Federation.*`
- `GravityBoost.*`
- `Synthesis.*`
- `Ingestion.*`
- `Session.*`
- `TenantFeatureFlags.<tenant>.{Enabled,FederationEnabled,SynthesisEnabled}`

Detailed operator config and examples:
- `docs/operations/unified-search-operations.md`
- `docs/modules/advisory-ai/knowledge-search.md`