# Unified Search Ranking Benchmark and Tuning Report

## Corpus

- File: `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/TestData/unified-search-quality-corpus.json`
- Cases: 250 queries
- Archetypes: `cve_lookup`, `package_image`, `documentation`, `doctor_diagnostic`, `policy_search`, `audit_timeline`, `cross_domain`, `conversational_followup`
- Labels: relevance grades `0..3`

## Metrics

- Precision@1, @3, @5, @10
- Recall@10
- NDCG@10
- Entity-card top-hit accuracy
- Cross-domain recall
- Ranking stability hash (SHA-256)

## Quality Gates

- P@1 >= 0.80
- NDCG@10 >= 0.70
- Entity-card accuracy >= 0.85
- Cross-domain recall >= 0.60

## Tuning Method

- Deterministic grid search over the weighting parameters used by `DomainWeightCalculator`.
- Parameter ranges:
  - `CveBoostFindings`: {0.35, 0.45}
  - `CveBoostVex`: {0.30, 0.38}
  - `PackageBoostGraph`: {0.20, 0.36, 0.48}
  - `PackageBoostScanner`: {0.12, 0.28, 0.40}
  - `AuditBoostTimeline`: {0.10, 0.24, 0.34}
  - `PolicyBoostPolicy`: {0.30, 0.38}
- Tie-breakers: NDCG@10, then P@1, then stability hash.
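To make the ranking metrics and quality gates concrete, here is a minimal Python sketch of how they could be computed for a single query against graded labels (`0..3`). It is not the benchmark's own implementation. Assumptions: a result counts as relevant for precision and recall when its grade is at least 1, NDCG uses the common exponential gain `2^grade - 1` with a `log2(rank + 1)` discount, and the helper names (`precision_at_k`, `recall_at_k`, `ndcg_at_k`, `gates_passed`) are hypothetical. Entity-card accuracy and cross-domain recall are taken as precomputed inputs to the gate check.

```python
import math
from typing import Dict, Sequence

# Assumption: grade >= 1 counts as "relevant" for precision/recall.
RELEVANT_GRADE = 1

# Thresholds copied from the Quality Gates section.
GATES: Dict[str, float] = {
    "p_at_1": 0.80,
    "ndcg_at_10": 0.70,
    "entity_accuracy": 0.85,
    "cross_domain_recall": 0.60,
}


def precision_at_k(ranked: Sequence[str], grades: Dict[str, int], k: int) -> float:
    """Fraction of the top-k slots holding a relevant doc (empty slots count as misses)."""
    hits = sum(1 for doc in ranked[:k] if grades.get(doc, 0) >= RELEVANT_GRADE)
    return hits / k


def recall_at_k(ranked: Sequence[str], grades: Dict[str, int], k: int) -> float:
    """Share of all relevant docs that appear in the top k; 0.0 if nothing is relevant."""
    relevant = {doc for doc, grade in grades.items() if grade >= RELEVANT_GRADE}
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def dcg(gains: Sequence[int]) -> float:
    """Discounted cumulative gain with exponential gain (assumed convention)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked: Sequence[str], grades: Dict[str, int], k: int = 10) -> float:
    """DCG of the actual top k divided by DCG of the ideal (grade-sorted) top k."""
    actual = [grades.get(doc, 0) for doc in ranked[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


def gates_passed(metrics: Dict[str, float]) -> bool:
    """True only when every aggregate metric meets its gate threshold."""
    return all(metrics[name] >= threshold for name, threshold in GATES.items())


if __name__ == "__main__":
    grades = {"CVE-2024-0001": 3, "doc-a": 1, "doc-b": 0}
    ranked = ["CVE-2024-0001", "doc-b", "doc-a"]
    print(precision_at_k(ranked, grades, 1), ndcg_at_k(ranked, grades))
```

In the real benchmark these per-query values would be averaged over the 250 cases before the gate check; feeding the baseline row of the results table into `gates_passed` returns `True`.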
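The grid search and its tie-breakers can be sketched as follows. The parameter ranges are copied from the Tuning Method section (216 combinations). Everything else is an assumption: the report does not state a primary objective beyond the tie-breaker order, so this sketch simply orders candidates by higher NDCG@10, then higher P@1, then the lexicographically smaller stability hash (the hash direction is a guess). The `evaluate` callback stands in for running the benchmark with a given parameter set and is hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Tuple

# Ranges copied from the Tuning Method section.
GRID: Dict[str, List[float]] = {
    "CveBoostFindings": [0.35, 0.45],
    "CveBoostVex": [0.30, 0.38],
    "PackageBoostGraph": [0.20, 0.36, 0.48],
    "PackageBoostScanner": [0.12, 0.28, 0.40],
    "AuditBoostTimeline": [0.10, 0.24, 0.34],
    "PolicyBoostPolicy": [0.30, 0.38],
}

# (ndcg_at_10, p_at_1, stability_hash) produced by one benchmark run.
Evaluation = Tuple[float, float, str]


def grid_search(
    evaluate: Callable[[Dict[str, float]], Evaluation],
) -> Tuple[Dict[str, float], Evaluation]:
    """Exhaustively score every combination and keep the best by the assumed ordering."""
    names = sorted(GRID)  # fixed name order keeps the enumeration deterministic
    best = None
    for values in itertools.product(*(GRID[name] for name in names)):
        params = dict(zip(names, values))
        ndcg, p1, stability_hash = evaluate(params)
        # Smaller key wins: higher NDCG, then higher P@1, then smaller hash string.
        key = (-ndcg, -p1, stability_hash)
        if best is None or key < best[0]:
            best = (key, params, (ndcg, p1, stability_hash))
    assert best is not None
    return best[1], best[2]
```

Because the comparison is strict, exact ties on all three keys keep the first combination in enumeration order, so repeated runs always select the same winner.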
## Baseline vs Tuned

_Values populated from `UnifiedSearchQualityBenchmarkTests` output._

| Variant | P@1 | NDCG@10 | Entity Accuracy | Cross-domain Recall | Gates Passed |
| --- | --- | --- | --- | --- | --- |
| Baseline (legacy weighting) | 0.9560 | 0.9522 | 0.9560 | 1.0000 | Yes |
| Tuned defaults | 0.9600 | 0.9598 | 0.9600 | 1.0000 | Yes |

Reference hashes from benchmark output:

- Baseline: `FF32EBE1DF1705A524B20B5A114B0CF496F1CA05147FC9FD869312903B8F40E9`
- Tuned defaults: `B5A12ACFE304E6A4620BBB2E9280FEE2E29E952B3E832F92C69FFA10760DA957`

## Tuned Defaults Applied

- `UnifiedSearchOptions.BaseDomainWeights`
  - knowledge=1.05, findings=1.20, vex=1.15, policy=1.10, graph=1.15, timeline=1.05, scanner=1.10, opsmemory=1.05
- `UnifiedSearchOptions.Weighting`
  - cve/security/policy/troubleshoot/package/audit/role boosts set to the tuned values in `UnifiedSearchOptions.cs`

## Determinism

- Repeated runs produce an identical stability hash for a fixed corpus and fixed options.
- The fast subset (50 queries) and the full suite (250 queries) both run in CI lanes.
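The determinism check rests on a SHA-256 stability hash over the ranking output. The report does not document how that hash is serialized, so the Python sketch below is a hypothetical illustration and will not reproduce the reference hashes above. It assumes the hash covers each query ID followed by its ranked result IDs, visits queries in sorted order so that map iteration order cannot change the digest, and uses tab/newline separators (which assumes IDs never contain those characters). The function name `stability_hash` is hypothetical.

```python
import hashlib
from typing import Dict, Sequence


def stability_hash(rankings: Dict[str, Sequence[str]]) -> str:
    """Hypothetical stability hash: SHA-256 over query IDs and their ranked result IDs.

    Assumption: the benchmark's real serialization is undocumented; this only
    illustrates that any change in result order changes the digest.
    """
    digest = hashlib.sha256()
    for query_id in sorted(rankings):  # sorted so insertion order is irrelevant
        # Assumes IDs contain no tab or newline characters.
        line = query_id + "\t" + "\t".join(rankings[query_id]) + "\n"
        digest.update(line.encode("utf-8"))
    # Upper-case hex, matching the style of the reference hashes.
    return digest.hexdigest().upper()
```

Two runs with the same rankings yield the same 64-character digest regardless of the order queries were recorded in, while swapping any two results for any query yields a different one.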