Tiered Precision Curves for Scanner Accuracy
Advisory: 16-Dec-2025 - Measuring Progress with Tiered Precision Curves
Status: Processing
Related Sprints: SPRINT_3500_0003_0001 (Ground-Truth Corpus)
Executive Summary
This advisory introduces a tiered approach to measuring scanner accuracy that prevents metric gaming. By tracking precision/recall separately for three evidence tiers (Imported, Executed, Tainted→Sink), we ensure improvements in one tier don't hide regressions in another.
Key Concepts
Evidence Tiers
| Tier | Description | Risk Level | Typical Volume |
|---|---|---|---|
| Imported | Vuln exists in dependency | Lowest | High |
| Executed | Code/deps actually run | Medium | Medium |
| Tainted→Sink | User data reaches sink | Highest | Low |
Tier Precedence
Highest tier wins when a finding has multiple evidence types:
tainted_sink (highest) > executed > imported
Implementation Components
1. Evidence Schema (eval schema)
-- Ground truth samples
eval.sample(sample_id, name, repo_path, commit_sha, language, scenario, entrypoints)
-- Expected findings
eval.expected_finding(expected_id, sample_id, vuln_key, tier, rule_key, sink_class)
-- Evaluation runs
eval.run(eval_run_id, scanner_version, rules_hash, concelier_snapshot_hash)
-- Observed results
eval.observed_finding(observed_id, eval_run_id, sample_id, vuln_key, tier, score, rule_key, evidence)
-- Computed metrics
eval.metrics(eval_run_id, tier, op_point, precision, recall, f1, pr_auc, latency_p50_ms)
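As a rough illustration of how the rows in eval.expected_finding and eval.observed_finding could feed one row of eval.metrics, the sketch below computes precision, recall, and F1 for a single tier at a given score threshold. It assumes that an observed finding counts as a true positive only when sample_id, vuln_key, and tier all match an expected finding; the function name, the tuple layout, and the convention of reporting 0.0 when nothing is observed are assumptions for this example, not part of the advisory.

```python
def tier_metrics(expected, observed, tier, threshold):
    """Precision/recall/F1 for one tier at one score threshold.

    expected: iterable of (sample_id, vuln_key, tier) tuples (ground truth)
    observed: iterable of (sample_id, vuln_key, tier, score) tuples (scanner output)
    Matching rule (assumed): same sample_id, vuln_key, and tier.
    """
    want = {(s, v) for s, v, t in expected if t == tier}
    got = {(s, v) for s, v, t, score in observed if t == tier and score >= threshold}

    true_positives = len(want & got)
    # Convention chosen for this sketch: empty denominators yield 0.0.
    precision = true_positives / len(got) if got else 0.0
    recall = true_positives / len(want) if want else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Keeping the computation per tier is what prevents a large, easy tier such as imported from masking a regression in tainted_sink.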
2. Scanner Worker Changes
Workers emit evidence primitives:
- DependencyEvidence { purl, version, lockfile_path }
- ReachabilityEvidence { entrypoint, call_path[], confidence }
- TaintEvidence { source, sink, sanitizers[], dataflow_path[], confidence }
3. Scanner WebService Changes
WebService performs tiering:
- Merge evidence for same vuln_key
- Run reachability/taint algorithms
- Assign evidence_tier deterministically
- Persist normalized findings
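The merge and tier-assignment steps can be sketched as follows. This is a simplified illustration, not the WebService implementation: it assumes that the presence of an evidence primitive is enough to claim its tier (the real reachability and taint analysis is out of scope here), and the mapping table, function name, and input shape are hypothetical. Sorting the output by vuln_key is one way to keep the result independent of the order in which workers report evidence.

```python
from collections import defaultdict

# Assumed mapping from evidence primitive to tier (illustrative only).
KIND_TO_TIER = {
    "DependencyEvidence": "imported",
    "ReachabilityEvidence": "executed",
    "TaintEvidence": "tainted_sink",
}
# Precedence from the advisory: tainted_sink > executed > imported.
TIER_RANK = {"imported": 0, "executed": 1, "tainted_sink": 2}


def assign_tiers(evidence_items):
    """Merge (vuln_key, evidence_kind) pairs and pick the highest tier per vuln_key.

    Returns a list of (vuln_key, evidence_tier) sorted by vuln_key, so the same
    set of evidence always produces the same output.
    """
    tiers_by_key = defaultdict(set)
    for vuln_key, kind in evidence_items:
        tiers_by_key[vuln_key].add(KIND_TO_TIER[kind])

    return [
        (vuln_key, max(tiers_by_key[vuln_key], key=TIER_RANK.__getitem__))
        for vuln_key in sorted(tiers_by_key)
    ]
```

For example, a vulnerability reported with both DependencyEvidence and TaintEvidence resolves to tainted_sink, whichever primitive arrived first.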
4. Evaluator CLI
New tool StellaOps.Scanner.Evaluation.Cli:
- import-corpus - Load samples and expected findings
- run - Trigger scans using replay manifest
- compute - Calculate per-tier PR curves
- report - Generate markdown artifacts
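To make the compute step more concrete, here is a minimal sketch of a precision-recall curve for one tier, summarized as average precision. Average precision is one common estimator of PR-AUC; the advisory does not say which estimator the CLI uses, so treat that choice, the function names, and the input shape as assumptions. The sketch is written in Python for brevity and is unrelated to the actual StellaOps.Scanner.Evaluation.Cli code.

```python
def pr_points(observations, total_expected):
    """Build PR-curve points for one tier.

    observations: list of (score, matched) where matched is True when the
                  observed finding corresponds to an expected finding.
    total_expected: number of expected findings in this tier (recall denominator).
    Returns (threshold, precision, recall) tuples, highest threshold first.
    """
    if total_expected <= 0:
        raise ValueError("total_expected must be positive")

    ranked = sorted(observations, key=lambda o: o[0], reverse=True)
    points = []
    tp = fp = 0
    for i, (score, matched) in enumerate(ranked):
        if matched:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all findings sharing this score are counted.
        last_of_score = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_score:
            points.append((score, tp / (tp + fp), tp / total_expected))
    return points


def average_precision(points):
    """Step-wise area under the PR curve: sum of precision * recall increment."""
    area = 0.0
    prev_recall = 0.0
    for _threshold, precision, recall in points:
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area
```

Running both functions once per tier yields the per-tier curves and a pr_auc value for each.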
5. CI Gates
Fail builds when:
- PR-AUC(imported) drops > 2%
- PR-AUC(executed/tainted_sink) drops > 1%
- FP rate in tainted_sink > 5% at Recall ≥ 0.7
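The gate conditions above could be checked along these lines. The advisory does not state whether a "drop" is measured in absolute points or relative to the baseline; this sketch assumes absolute differences in PR-AUC (for example, 0.80 to 0.77 is a drop of 0.03) and takes the tainted_sink FP rate at Recall ≥ 0.7 as a precomputed input. The function and constant names are hypothetical.

```python
# Thresholds from the advisory, read as absolute PR-AUC differences (assumption).
MAX_PR_AUC_DROP = {"imported": 0.02, "executed": 0.01, "tainted_sink": 0.01}
MAX_TAINTED_SINK_FP_RATE = 0.05


def gate_failures(baseline_auc, current_auc, tainted_sink_fp_rate):
    """Return human-readable gate failures; an empty list means the build passes.

    baseline_auc / current_auc: dicts mapping tier name to PR-AUC.
    tainted_sink_fp_rate: FP rate measured at the Recall >= 0.7 operating point.
    """
    failures = []
    for tier, max_drop in MAX_PR_AUC_DROP.items():
        drop = baseline_auc[tier] - current_auc[tier]
        if drop > max_drop:
            failures.append(
                f"PR-AUC({tier}) dropped by {drop:.3f} (limit {max_drop:.2f})"
            )
    if tainted_sink_fp_rate > MAX_TAINTED_SINK_FP_RATE:
        failures.append(
            f"tainted_sink FP rate {tainted_sink_fp_rate:.3f} exceeds "
            f"{MAX_TAINTED_SINK_FP_RATE:.2f}"
        )
    return failures
```

A CI job would exit non-zero when the returned list is non-empty. Because the comparison uses floating-point subtraction, drops that land exactly on a limit may fall on either side of it; a small tolerance may be worth adding in practice.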
Operating Points
| Tier | Target Recall | Purpose |
|---|---|---|
| imported | ≥ 0.60 | Broad coverage |
| executed | ≥ 0.70 | Material risk |
| tainted_sink | ≥ 0.80 | Actionable findings |
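One plausible way to turn a target recall into a concrete operating point is to walk the tier's PR curve from the strictest score threshold downwards and stop at the first point that reaches the target. The selection rule, the function name, and the (threshold, precision, recall) tuple layout are assumptions for this sketch; the advisory only specifies the recall targets.

```python
# Target recall per tier, from the Operating Points table.
TARGET_RECALL = {"imported": 0.60, "executed": 0.70, "tainted_sink": 0.80}


def operating_point(tier, points):
    """Pick the highest score threshold whose recall meets the tier's target.

    points: (threshold, precision, recall) tuples sorted by descending threshold,
            so recall is non-decreasing along the list.
    Returns the chosen point, or None when the target recall is never reached.
    """
    target = TARGET_RECALL[tier]
    for threshold, precision, recall in points:
        if recall >= target:
            return threshold, precision, recall
    return None
```

Returning None rather than the closest point makes an unreachable target visible to the caller, which can then report it instead of silently using a weaker threshold.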
Integration with Existing Systems
Concelier
- Stores advisory data, does not tier
- Tag advisories with sink classes when available
Excititor (VEX)
- Include tier in VEX statements
- Allow per-tier policy thresholds
- Preserve pruning provenance
Notify
- Gate alerts on tiered thresholds
- Page only on tainted_sink at operating point
UI
- Show tier badge on findings
- Default sort: tainted_sink > executed > imported
- Display evidence summary (entrypoint, path length, sink class)
Success Criteria
- Can demonstrate a release where overall precision stayed flat but tainted→sink PR-AUC improved
- On-call noise reduced via tier-gated paging
- TTFS p95 for tainted→sink within budget
Related Documentation
Overlap Analysis
This advisory extends the ground-truth corpus work (SPRINT_3500_0003_0001) with:
- Tiered precision tracking (new)
- Per-tier operating points (new)
- CI gates based on tier-specific AUC (enhancement)
- Integration with Notify for tier-gated alerts (new)
No contradictions with existing implementations found.