# Unified Search Ranking Benchmark and Tuning Report

## Corpus

- File: `src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/TestData/unified-search-quality-corpus.json`
- Cases: 250 queries
- Archetypes: `cve_lookup`, `package_image`, `documentation`, `doctor_diagnostic`, `policy_search`, `audit_timeline`, `cross_domain`, `conversational_followup`
- Labels: relevance grades `0..3`

## Metrics

- Precision@k (P@k) for k = 1, 3, 5, 10
- Recall@10
- NDCG@10
- Entity-card top hit accuracy
- Cross-domain recall
- Ranking stability hash (SHA-256)
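
As a rough illustration of how the rank-based metrics above can be computed from graded labels, here is a minimal Python sketch. It is not the project's implementation: the function names are hypothetical, a result is assumed to count as relevant when its grade is at least 1, and NDCG is assumed to use linear gain (the grade itself) with a log2 position discount. The benchmark may use different conventions.

```python
import math


def precision_at_k(grades, k, min_grade=1):
    """Share of the top-k slots holding a relevant result (assumed: grade >= min_grade).

    Divides by k even when fewer than k results were returned.
    """
    return sum(1 for g in grades[:k] if g >= min_grade) / k


def recall_at_k(grades, k, total_relevant, min_grade=1):
    """Share of all relevant labelled items that appear in the top k."""
    if total_relevant == 0:
        return 0.0
    return sum(1 for g in grades[:k] if g >= min_grade) / total_relevant


def dcg_at_k(grades, k):
    """Discounted cumulative gain with linear gain (an assumption, not 2**g - 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(grades, labelled_grades, k):
    """DCG of the actual ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(labelled_grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


# Grades of the returned results, in ranked order, for one hypothetical query.
ranked = [3, 0, 2, 1]
labelled = [3, 2, 1, 0]  # every grade labelled for that query
print(precision_at_k(ranked, 1), recall_at_k(ranked, 10, total_relevant=3))
print(round(ndcg_at_k(ranked, labelled, 10), 4))
```

Per-query values would then be averaged over the corpus to obtain figures comparable to those reported in this document.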

## Quality Gates

- P@1 >= 0.80
- NDCG@10 >= 0.70
- Entity-card accuracy >= 0.85
- Cross-domain recall >= 0.60
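
The gates above amount to a set of minimum thresholds that every aggregate metric must meet. A minimal Python sketch of such a check follows; the dictionary keys and the `failed_gates` helper are hypothetical names chosen for illustration, not identifiers from the test suite.

```python
# Minimum thresholds taken from the gate list; key names are illustrative.
GATES = {
    "p_at_1": 0.80,
    "ndcg_at_10": 0.70,
    "entity_card_accuracy": 0.85,
    "cross_domain_recall": 0.60,
}


def failed_gates(metrics, gates=GATES):
    """Return the names of gates whose metric is missing or below its minimum."""
    return [name for name, minimum in gates.items()
            if metrics.get(name, 0.0) < minimum]


# Aggregate metrics for one benchmark run (example values).
run = {
    "p_at_1": 0.9560,
    "ndcg_at_10": 0.9522,
    "entity_card_accuracy": 0.9560,
    "cross_domain_recall": 1.0000,
}
print("gates passed" if not failed_gates(run) else failed_gates(run))
```

Treating a missing metric as a failure is a design choice of this sketch: it keeps a silently dropped metric from passing the gate.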

## Tuning Method

- Deterministic grid search over weighting parameters used by `DomainWeightCalculator`.
- Parameter ranges:
  - `CveBoostFindings`: {0.35, 0.45}
  - `CveBoostVex`: {0.30, 0.38}
  - `PackageBoostGraph`: {0.20, 0.36, 0.48}
  - `PackageBoostScanner`: {0.12, 0.28, 0.40}
  - `AuditBoostTimeline`: {0.10, 0.24, 0.34}
  - `PolicyBoostPolicy`: {0.30, 0.38}
- Tie-breakers: NDCG@10, then P@1, then stability hash.
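
The parameter ranges above span 2 × 2 × 3 × 3 × 3 × 2 = 216 combinations. The Python sketch below shows one way to enumerate such a grid in a fixed order and to apply the tie-breakers. It is an illustration only: `candidates` and `pick_best` are hypothetical helpers, and preferring the lexicographically smallest hash as the last tie-breaker is an assumption, since the source does not state a direction.

```python
from itertools import product

# Parameter ranges copied from the list above.
GRID = {
    "CveBoostFindings": (0.35, 0.45),
    "CveBoostVex": (0.30, 0.38),
    "PackageBoostGraph": (0.20, 0.36, 0.48),
    "PackageBoostScanner": (0.12, 0.28, 0.40),
    "AuditBoostTimeline": (0.10, 0.24, 0.34),
    "PolicyBoostPolicy": (0.30, 0.38),
}


def candidates(grid=GRID):
    """Yield every parameter combination in a fixed, repeatable order."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def pick_best(results):
    """Pick the winner from (params, ndcg_at_10, p_at_1, stability_hash) tuples.

    Higher NDCG@10 wins, then higher P@1, then (assumed) the smallest hash.
    """
    return min(results, key=lambda r: (-r[1], -r[2], r[3]))


print(sum(1 for _ in candidates()))  # number of grid points
```

Because `itertools.product` iterates in a fixed order and the sort key is total, the same inputs always select the same candidate, which is what makes the search deterministic.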

## Baseline vs Tuned

_Values populated from `UnifiedSearchQualityBenchmarkTests` output._

| Variant | P@1 | NDCG@10 | Entity-card Accuracy | Cross-domain Recall | Gates Passed |
| --- | --- | --- | --- | --- | --- |
| Baseline (legacy weighting) | 0.9560 | 0.9522 | 0.9560 | 1.0000 | Yes |
| Tuned defaults | 0.9600 | 0.9598 | 0.9600 | 1.0000 | Yes |

Reference stability hashes (SHA-256) from benchmark output:
- Baseline: `FF32EBE1DF1705A524B20B5A114B0CF496F1CA05147FC9FD869312903B8F40E9`
- Tuned defaults: `B5A12ACFE304E6A4620BBB2E9280FEE2E29E952B3E832F92C69FFA10760DA957`

## Tuned Defaults Applied

- `UnifiedSearchOptions.BaseDomainWeights`
  - knowledge=1.05, findings=1.20, vex=1.15, policy=1.10, graph=1.15, timeline=1.05, scanner=1.10, opsmemory=1.05
- `UnifiedSearchOptions.Weighting`
  - The cve, security, policy, troubleshoot, package, audit, and role boosts are aligned with the tuned values in `UnifiedSearchOptions.cs`.

## Determinism

- Repeated runs produce an identical stability hash for a fixed corpus and fixed options.
- Both the fast subset (50 queries) and the full suite (250 queries) run in CI lanes.
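
A stability hash of this kind can be thought of as a SHA-256 digest over a canonical serialization of every query's ranked result list. The Python sketch below illustrates the idea under stated assumptions: the serialization format (queries sorted by ID, tab and comma separators, UTF-8) and the `stability_hash` name are invented for this example and are not taken from the benchmark code.

```python
import hashlib


def stability_hash(rankings):
    """Digest of all rankings; `rankings` maps query ID -> ordered result IDs.

    Queries are serialized in sorted ID order so that dictionary insertion
    order cannot change the hash; the order of results within a query does.
    """
    lines = [qid + "\t" + ",".join(rankings[qid]) for qid in sorted(rankings)]
    payload = "\n".join(lines).encode("utf-8")
    return hashlib.sha256(payload).hexdigest().upper()


run_a = {"q1": ["doc-7", "doc-2"], "q2": ["doc-9"]}
run_b = {"q2": ["doc-9"], "q1": ["doc-7", "doc-2"]}   # same rankings, other order
run_c = {"q1": ["doc-2", "doc-7"], "q2": ["doc-9"]}   # two results swapped

print(stability_hash(run_a) == stability_hash(run_b))
print(stability_hash(run_a) == stability_hash(run_c))
```

Any change in the rank order of any query yields a different digest, so comparing a single hash is enough to detect ranking drift between runs.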