Unified Search Ranking Benchmark and Tuning Report
Corpus
- File: src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/TestData/unified-search-quality-corpus.json
- Cases: 250 queries
- Archetypes: cve_lookup, package_image, documentation, doctor_diagnostic, policy_search, audit_timeline, cross_domain, conversational_followup
- Labels: relevance grades 0..3
Metrics
- Precision@1, @3, @5, @10
- Recall@10
- NDCG@10
- Entity-card top hit accuracy
- Cross-domain recall
- Ranking stability hash (SHA-256)
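As a rough illustration of the rank metrics listed above, the following sketch computes Precision@k, Recall@k, and NDCG@k from a list of relevance grades in ranked order. It is not the benchmark's implementation: the function names are hypothetical, a grade above 0 is assumed to count as relevant, NDCG is assumed to use linear gain with a log2 discount, and the ideal ordering is assumed to be derivable from the same graded list.

```python
import math


def precision_at_k(grades, k):
    """Fraction of the top-k slots holding a result with grade > 0 (assumed threshold)."""
    return sum(1 for g in grades[:k] if g > 0) / k


def recall_at_k(grades, k, total_relevant):
    """Share of all relevant items for the query that appear in the top k."""
    if total_relevant == 0:
        return 0.0
    return sum(1 for g in grades[:k] if g > 0) / total_relevant


def dcg_at_k(grades, k):
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount (assumed variant)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))


def ndcg_at_k(grades, k):
    """DCG divided by the DCG of the same grades sorted best-first."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


# Grades 0..3 for one query, in the order the results were returned.
example = [3, 0, 2, 1]
p_at_1 = precision_at_k(example, 1)
ndcg_at_10 = ndcg_at_k(example, 10)
```

Per-query values like these would then be averaged across the corpus to produce the reported figures; the averaging scheme is not described in this report.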
Quality Gates
- P@1 >= 0.80
- NDCG@10 >= 0.70
- Entity-card accuracy >= 0.85
- Cross-domain recall >= 0.60
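The gates above amount to a set of lower bounds on the aggregate metrics. A minimal sketch of such a check follows; the dictionary keys and the failed_gates helper are hypothetical names chosen for illustration, and only the thresholds come from this report.

```python
# Thresholds copied from the Quality Gates list; key names are illustrative.
QUALITY_GATES = {
    "p_at_1": 0.80,
    "ndcg_at_10": 0.70,
    "entity_card_accuracy": 0.85,
    "cross_domain_recall": 0.60,
}


def failed_gates(metrics):
    """Return the names of gates whose metric is missing or below its threshold."""
    return [
        name
        for name, threshold in QUALITY_GATES.items()
        if metrics.get(name, 0.0) < threshold
    ]


# Values from the Baseline row of the results table in this report.
baseline = {
    "p_at_1": 0.9560,
    "ndcg_at_10": 0.9522,
    "entity_card_accuracy": 0.9560,
    "cross_domain_recall": 1.0000,
}
gates_passed = not failed_gates(baseline)
```

Returning the list of failing gates, rather than a single boolean, makes it easier to report which threshold a regression crossed.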
Tuning Method
- Deterministic grid search over weighting parameters used by DomainWeightCalculator.
- Parameter ranges:
  - CveBoostFindings: {0.35, 0.45}
  - CveBoostVex: {0.30, 0.38}
  - PackageBoostGraph: {0.20, 0.36, 0.48}
  - PackageBoostScanner: {0.12, 0.28, 0.40}
  - AuditBoostTimeline: {0.10, 0.24, 0.34}
  - PolicyBoostPolicy: {0.30, 0.38}
- Tie-breakers: NDCG@10, then P@1, then stability hash.
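The parameter ranges above span 2 x 2 x 3 x 3 x 3 x 2 = 216 combinations. The sketch below enumerates them in a fixed order and picks a winner using the listed tie-breaker chain. It is an illustration under assumptions, not the tuning code itself: iter_candidates, select_best, and the evaluate callback are hypothetical, the report does not say whether another criterion is compared before the tie-breakers, and preferring the lexicographically smallest hash is an assumed convention.

```python
from itertools import product

# Ranges copied from the Parameter ranges list above.
PARAMETER_GRID = {
    "CveBoostFindings": [0.35, 0.45],
    "CveBoostVex": [0.30, 0.38],
    "PackageBoostGraph": [0.20, 0.36, 0.48],
    "PackageBoostScanner": [0.12, 0.28, 0.40],
    "AuditBoostTimeline": [0.10, 0.24, 0.34],
    "PolicyBoostPolicy": [0.30, 0.38],
}


def iter_candidates(grid):
    """Yield every parameter combination in a fixed, repeatable order."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def select_best(candidates, evaluate):
    """Pick the candidate with the highest NDCG@10, then P@1, then smallest hash.

    evaluate(candidate) is a caller-supplied function (hypothetical here)
    returning a tuple (ndcg_at_10, p_at_1, stability_hash).
    """
    best, best_key = None, None
    for candidate in candidates:
        ndcg, p1, stability_hash = evaluate(candidate)
        key = (-ndcg, -p1, stability_hash)  # smaller key means better candidate
        if best_key is None or key < best_key:
            best, best_key = candidate, key
    return best


candidate_count = sum(1 for _ in iter_candidates(PARAMETER_GRID))
```

Because the enumeration order and the comparison key are both fixed, two runs over the same corpus select the same candidate.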
Baseline vs Tuned
Values populated from UnifiedSearchQualityBenchmarkTests output.
| Variant | P@1 | NDCG@10 | Entity Accuracy | Cross-domain Recall | Gates Passed |
|---|---|---|---|---|---|
| Baseline (legacy weighting) | 0.9560 | 0.9522 | 0.9560 | 1.0000 | Yes |
| Tuned defaults | 0.9600 | 0.9598 | 0.9600 | 1.0000 | Yes |
Reference hashes from benchmark output:
- Baseline: FF32EBE1DF1705A524B20B5A114B0CF496F1CA05147FC9FD869312903B8F40E9
- Tuned defaults: B5A12ACFE304E6A4620BBB2E9280FEE2E29E952B3E832F92C69FFA10760DA957
Tuned Defaults Applied
- UnifiedSearchOptions.BaseDomainWeights: knowledge=1.05, findings=1.20, vex=1.15, policy=1.10, graph=1.15, timeline=1.05, scanner=1.10, opsmemory=1.05
- UnifiedSearchOptions.Weighting: cve/security/policy/troubleshoot/package/audit/role boosts aligned with tuned values in UnifiedSearchOptions.cs
Determinism
- Repeated runs produce an identical stability hash for a fixed corpus and fixed options.
- Fast subset (50 queries) and full suite (250 queries) both run in CI lanes.
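One way a ranking stability hash of this kind could be computed is to serialize every query's ranked result ids in a canonical order and take the SHA-256 of the bytes. The sketch below assumes that approach; the ranking_stability_hash function and its tab and comma serialization format are hypothetical and may differ from what the benchmark tests actually hash.

```python
import hashlib


def ranking_stability_hash(rankings):
    """Hash a mapping of query id -> ordered list of result ids.

    Queries are visited in sorted order so the digest does not depend on
    dictionary insertion order; result order within a query is preserved
    because it is exactly what the hash is meant to detect changes in.
    """
    digest = hashlib.sha256()
    for query_id in sorted(rankings):
        line = query_id + "\t" + ",".join(rankings[query_id]) + "\n"
        digest.update(line.encode("utf-8"))
    # Upper-case hex matches the style of the reference hashes in this report.
    return digest.hexdigest().upper()


run_a = ranking_stability_hash({"q1": ["doc-3", "doc-1"], "q2": ["doc-7"]})
run_b = ranking_stability_hash({"q2": ["doc-7"], "q1": ["doc-3", "doc-1"]})
reordered = ranking_stability_hash({"q1": ["doc-1", "doc-3"], "q2": ["doc-7"]})
```

Here run_a and run_b are equal, while reordered differs because two results swapped positions, which is the kind of drift a CI lane can catch by comparing against a stored reference hash.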