
Tiered Precision Curves for Scanner Accuracy

Advisory: 16-Dec-2025 - Measuring Progress with Tiered Precision Curves
Status: Processing
Related Sprints: SPRINT_3500_0003_0001 (Ground-Truth Corpus)

Executive Summary

This advisory introduces a tiered approach to measuring scanner accuracy that is designed to resist metric gaming. It tracks precision and recall separately for each of three evidence tiers (Imported, Executed, Tainted→Sink), so an improvement in one tier cannot hide a regression in another.

Key Concepts

Evidence Tiers

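| Tier | Description | Risk Level | Typical Volume |
|------|-------------|------------|----------------|
| Imported | Vulnerability exists in a dependency | Lowest | High |
| Executed | Vulnerable code or dependency is actually executed | Medium | Medium |
| Tainted→Sink | User-controlled data reaches a sink | Highest | Low |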

Tier Precedence

When a finding carries more than one type of evidence, the highest tier wins:

  1. tainted_sink (highest)
  2. executed
  3. imported
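The precedence rule fits in a few lines. The following is a minimal sketch that represents tiers by the lowercase names used above. `TIER_RANK` and `resolve_tier` are hypothetical names and do not come from the scanner codebase.

```python
# Hypothetical ranking of the tier names used in this advisory (higher wins).
TIER_RANK = {"imported": 1, "executed": 2, "tainted_sink": 3}


def resolve_tier(evidence_tiers):
    """Return the highest-precedence tier among those a finding carries.

    Raises ValueError for an empty input or an unknown tier name.
    """
    tiers = list(evidence_tiers)
    if not tiers:
        raise ValueError("finding has no evidence tiers")
    for tier in tiers:
        if tier not in TIER_RANK:
            raise ValueError(f"unknown tier: {tier!r}")
    # max() over the rank gives the precedence order 1 < 2 < 3.
    return max(tiers, key=TIER_RANK.__getitem__)
```

For example, `resolve_tier(["imported", "tainted_sink", "executed"])` returns `"tainted_sink"`.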

Implementation Components

1. Evidence Schema (eval schema)

```sql
-- Ground truth samples
eval.sample(sample_id, name, repo_path, commit_sha, language, scenario, entrypoints)

-- Expected findings
eval.expected_finding(expected_id, sample_id, vuln_key, tier, rule_key, sink_class)

-- Evaluation runs
eval.run(eval_run_id, scanner_version, rules_hash, concelier_snapshot_hash)

-- Observed results
eval.observed_finding(observed_id, eval_run_id, sample_id, vuln_key, tier, score, rule_key, evidence)

-- Computed metrics
eval.metrics(eval_run_id, tier, op_point, precision, recall, f1, pr_auc, latency_p50_ms)
```
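The following sketch shows one way the precision, recall and f1 columns of `eval.metrics` could be derived from expected and observed findings for a single tier. It makes three assumptions that the advisory does not state: `op_point` is a score threshold; an observed finding counts as a true positive when an expected finding with the same `(sample_id, vuln_key)` exists in the same tier; and precision is 0.0 when nothing is predicted. The `Expected`, `Observed` and `tier_metrics` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    sample_id: str
    vuln_key: str
    tier: str


@dataclass(frozen=True)
class Observed:
    sample_id: str
    vuln_key: str
    tier: str
    score: float


def tier_metrics(expected, observed, tier, op_point):
    """Return (precision, recall, f1) for one tier at score threshold op_point."""
    # Ground truth and predictions are both keyed by (sample_id, vuln_key).
    truth = {(e.sample_id, e.vuln_key) for e in expected if e.tier == tier}
    predicted = {(o.sample_id, o.vuln_key) for o in observed
                 if o.tier == tier and o.score >= op_point}
    tp = len(truth & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Two expected tainted_sink findings, scored by three observations at 0.9 (correct), 0.8 (spurious) and 0.3 (correct), give precision, recall and F1 of 0.5 each at `op_point = 0.5`.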

2. Scanner Worker Changes

Workers emit evidence primitives:

  • DependencyEvidence { purl, version, lockfile_path }
  • ReachabilityEvidence { entrypoint, call_path[], confidence }
  • TaintEvidence { source, sink, sanitizers[], dataflow_path[], confidence }
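The three primitives can be modeled as immutable records. This sketch uses Python dataclasses and makes several assumptions. The field types are guesses, the `[]` arrays are rendered as tuples to keep the records frozen, and confidence is assumed to lie in [0, 1]. The scanner's actual types are not shown in this advisory.

```python
from dataclasses import dataclass


def _check_confidence(value: float) -> None:
    # Assumed convention: confidence is a probability-like score in [0, 1].
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {value}")


@dataclass(frozen=True)
class DependencyEvidence:
    purl: str           # package URL identifying the dependency
    version: str        # resolved version
    lockfile_path: str  # lockfile in which the dependency was found


@dataclass(frozen=True)
class ReachabilityEvidence:
    entrypoint: str
    call_path: tuple[str, ...]  # frames from the entrypoint to the vulnerable code
    confidence: float

    def __post_init__(self) -> None:
        _check_confidence(self.confidence)


@dataclass(frozen=True)
class TaintEvidence:
    source: str
    sink: str
    sanitizers: tuple[str, ...]     # sanitizers seen along the path
    dataflow_path: tuple[str, ...]  # steps from source to sink
    confidence: float

    def __post_init__(self) -> None:
        _check_confidence(self.confidence)
```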

3. Scanner WebService Changes

WebService performs tiering:

  • Merge evidence for same vuln_key
  • Run reachability/taint algorithms
  • Assign evidence_tier deterministically
  • Persist normalized findings
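The merge-and-tier step can be sketched as grouping evidence by `vuln_key` and keeping the highest tier. Sorting the output by key makes the result independent of the order in which evidence arrived. The following sketch is illustrative only. The mapping from primitive kind to tier (dependency → imported, reachability → executed, taint → tainted_sink) is an assumption, and the reachability and taint algorithms themselves are out of scope. `KIND_TO_TIER` and `tier_findings` are hypothetical names.

```python
from collections import defaultdict

# Assumed mapping from evidence primitive to the tier it supports.
KIND_TO_TIER = {"dependency": "imported", "reachability": "executed", "taint": "tainted_sink"}
TIER_RANK = {"imported": 1, "executed": 2, "tainted_sink": 3}


def tier_findings(evidence):
    """Group (vuln_key, kind) pairs by vuln_key and assign each key its highest tier.

    Returns (vuln_key, tier) pairs sorted by vuln_key, so the output is
    deterministic regardless of input order.
    """
    tiers_by_key = defaultdict(set)
    for vuln_key, kind in evidence:
        tiers_by_key[vuln_key].add(KIND_TO_TIER[kind])
    return [(key, max(tiers, key=TIER_RANK.__getitem__))
            for key, tiers in sorted(tiers_by_key.items())]
```

For example, dependency plus taint evidence for one CVE yields `tainted_sink`, and dependency plus reachability evidence for another yields `executed`.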

4. Evaluator CLI

A new tool, StellaOps.Scanner.Evaluation.Cli, provides the following commands:

  • import-corpus - Load samples and expected findings
  • run - Trigger scans using replay manifest
  • compute - Calculate per-tier PR curves
  • report - Generate markdown artifacts
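The `compute` step produces per-tier PR curves. The following sketch builds one curve from scored observations and summarizes it with average precision. Average precision is a common step-wise PR-AUC estimator. The advisory does not say which estimator the evaluator uses, so treat that choice as an assumption. The input format (`(score, is_true_positive)` pairs plus the count of expected findings) and the function names are also hypothetical.

```python
def pr_curve(scored, n_positives):
    """Return (precision, recall) points, one per distinct score threshold.

    scored: iterable of (score, is_true_positive) pairs for one tier.
    n_positives: number of expected findings in that tier (recall denominator).
    """
    if n_positives <= 0:
        raise ValueError("n_positives must be positive")
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)
    points, tp, fp = [], 0, 0
    for i, (score, is_tp) in enumerate(ordered):
        if is_tp:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item sharing this score, so
        # tied scores behave as a single threshold.
        if i + 1 == len(ordered) or ordered[i + 1][0] != score:
            points.append((tp / (tp + fp), tp / n_positives))
    return points


def average_precision(points):
    """Step-wise PR-AUC: sum of precision weighted by each recall increment."""
    ap, prev_recall = 0.0, 0.0
    for precision, recall in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

With scores `[(0.9, True), (0.8, False), (0.7, True), (0.1, False)]` and two expected findings, the curve is `[(1.0, 0.5), (0.5, 0.5), (2/3, 1.0), (0.5, 1.0)]` and the average precision is 5/6.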

5. CI Gates

Fail the build when any of the following holds, compared with the baseline:

  • PR-AUC(imported) drops by more than 2%
  • PR-AUC(executed) or PR-AUC(tainted_sink) drops by more than 1%
  • FP rate in tainted_sink exceeds 5% at Recall ≥ 0.7
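A gate check along these lines could run after `compute`. This sketch makes two interpretations the advisory leaves open, so treat both as assumptions. First, "drops by more than N%" is treated as a relative drop against the baseline PR-AUC. Second, "FP rate" is read as 1 − precision, taken at the best tainted_sink point that still reaches the recall floor. `MAX_RELATIVE_DROP` and `ci_gate_failures` are hypothetical names.

```python
# Assumed per-tier limits on the relative PR-AUC drop versus the baseline.
MAX_RELATIVE_DROP = {"imported": 0.02, "executed": 0.01, "tainted_sink": 0.01}


def ci_gate_failures(baseline_auc, current_auc, tainted_points,
                     recall_floor=0.7, max_fp_rate=0.05):
    """Return a list of reasons to fail the build; an empty list means pass.

    baseline_auc, current_auc: dicts mapping tier -> PR-AUC.
    tainted_points: (precision, recall) points for the tainted_sink tier.
    """
    failures = []
    for tier, limit in MAX_RELATIVE_DROP.items():
        base, cur = baseline_auc[tier], current_auc[tier]
        drop = (base - cur) / base if base > 0 else 0.0
        if drop > limit:
            failures.append(f"PR-AUC({tier}) dropped {drop:.1%} (limit {limit:.0%})")
    # FP rate read as 1 - precision; use the best point meeting the recall
    # floor. If no point reaches the floor, the gate also fails.
    fp_rates = [1.0 - p for p, r in tainted_points if r >= recall_floor]
    if not fp_rates or min(fp_rates) > max_fp_rate:
        failures.append(
            f"tainted_sink FP rate above {max_fp_rate:.0%} at recall >= {recall_floor}")
    return failures
```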

Operating Points

| Tier | Target Recall | Purpose |
|------|---------------|---------|
| imported | ≥ 0.60 | Broad coverage |
| executed | ≥ 0.70 | Material risk |
| tainted_sink | ≥ 0.80 | Actionable findings |
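One way to turn these target recalls into concrete score thresholds is to scan the thresholds in descending order. Among the thresholds whose recall meets the tier's target, the scan keeps the one with the best precision. The following sketch assumes findings carry a numeric score. The selection rule and the names `TARGET_RECALL` and `pick_operating_point` are illustrative assumptions, not the evaluator's documented behavior.

```python
TARGET_RECALL = {"imported": 0.60, "executed": 0.70, "tainted_sink": 0.80}


def pick_operating_point(scored, n_positives, tier):
    """Return (threshold, precision, recall) for the tier, or None.

    scored: (score, is_true_positive) pairs for the tier.
    Picks the best precision among thresholds meeting the target recall;
    on equal precision, the higher threshold (found first) is kept.
    """
    if n_positives <= 0:
        raise ValueError("n_positives must be positive")
    target = TARGET_RECALL[tier]
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)
    best, tp, fp = None, 0, 0
    for i, (score, is_tp) in enumerate(ordered):
        if is_tp:
            tp += 1
        else:
            fp += 1
        last_of_tie = i + 1 == len(ordered) or ordered[i + 1][0] != score
        recall = tp / n_positives
        if last_of_tie and recall >= target:
            candidate = (score, tp / (tp + fp), recall)
            if best is None or candidate[1] > best[1]:
                best = candidate
    return best
```

A `None` result means the tier cannot reach its target recall on this corpus at any threshold.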

Integration with Existing Systems

Concelier

  • Stores advisory data; does not perform tiering
  • Tag advisories with sink classes when available

Excititor (VEX)

  • Include tier in VEX statements
  • Allow policies to set per-tier thresholds
  • Preserve pruning provenance

Notify

  • Gate alerts on tiered thresholds
  • Page only on tainted_sink findings at the operating point
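A tier-gated alert decision could look like the following sketch. It assumes that each tier has an operating-point score threshold and that findings below their tier's threshold raise no alert. It also assumes that above-threshold imported and executed findings produce a non-paging notification. The action names `"page"` and `"notify"` and the function `alert_action` are hypothetical and do not come from Notify's API.

```python
from typing import Optional


def alert_action(tier: str, score: float, thresholds: dict) -> Optional[str]:
    """Return "page", "notify", or None for a finding.

    thresholds: dict mapping tier -> operating-point score threshold.
    """
    # Below the tier's operating point: no alert at all.
    if score < thresholds[tier]:
        return None
    # Only tainted_sink findings at the operating point page on-call.
    return "page" if tier == "tainted_sink" else "notify"
```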

UI

  • Show tier badge on findings
  • Default sort: tainted_sink > executed > imported
  • Display evidence summary (entrypoint, path length, sink class)

Success Criteria

  1. A release can be demonstrated in which overall precision stayed flat but tainted→sink PR-AUC improved
  2. On-call noise is reduced through tier-gated paging
  3. TTFS p95 for tainted→sink findings stays within budget

Overlap Analysis

This advisory extends the ground-truth corpus work (SPRINT_3500_0003_0001) with:

  • Tiered precision tracking (new)
  • Per-tier operating points (new)
  • CI gates based on tier-specific AUC (enhancement)
  • Integration with Notify for tier-gated alerts (new)

No contradictions with existing implementations were found.