Files
git.stella-ops.org/docs/benchmarks/tiered-precision-curves.md
master 8bbfe4d2d2 feat(rate-limiting): Implement core rate limiting functionality with configuration, decision-making, metrics, middleware, and service registration
- Add RateLimitConfig for configuration management with YAML binding support.
- Introduce RateLimitDecision to encapsulate the result of rate limit checks.
- Implement RateLimitMetrics for OpenTelemetry metrics tracking.
- Create RateLimitMiddleware for enforcing rate limits on incoming requests.
- Develop RateLimitService to orchestrate instance and environment rate limit checks.
- Add RateLimitServiceCollectionExtensions for dependency injection registration.
2025-12-17 18:02:37 +02:00

128 lines
4.0 KiB
Markdown

# Tiered Precision Curves for Scanner Accuracy
**Advisory:** 16-Dec-2025 - Measuring Progress with Tiered Precision Curves
**Status:** Processing
**Related Sprints:** SPRINT_3500_0003_0001 (Ground-Truth Corpus)
## Executive Summary
This advisory introduces a tiered approach to measuring scanner accuracy that prevents metric gaming. By tracking precision/recall separately for three evidence tiers (Imported, Executed, Tainted→Sink), we ensure improvements in one tier don't hide regressions in another.
## Key Concepts
### Evidence Tiers
| Tier | Description | Risk Level | Typical Volume |
|------|-------------|------------|----------------|
| **Imported** | Vuln exists in dependency | Lowest | High |
| **Executed** | Code/deps actually run | Medium | Medium |
| **Tainted→Sink** | User data reaches sink | Highest | Low |
### Tier Precedence
Highest tier wins when a finding has multiple evidence types:
1. `tainted_sink` (highest)
2. `executed`
3. `imported`
## Implementation Components
### 1. Evidence Schema (`eval` schema)
```sql
-- Ground truth samples
eval.sample(sample_id, name, repo_path, commit_sha, language, scenario, entrypoints)
-- Expected findings
eval.expected_finding(expected_id, sample_id, vuln_key, tier, rule_key, sink_class)
-- Evaluation runs
eval.run(eval_run_id, scanner_version, rules_hash, concelier_snapshot_hash)
-- Observed results
eval.observed_finding(observed_id, eval_run_id, sample_id, vuln_key, tier, score, rule_key, evidence)
-- Computed metrics
eval.metrics(eval_run_id, tier, op_point, precision, recall, f1, pr_auc, latency_p50_ms)
```
### 2. Scanner Worker Changes
Workers emit evidence primitives:
- `DependencyEvidence { purl, version, lockfile_path }`
- `ReachabilityEvidence { entrypoint, call_path[], confidence }`
- `TaintEvidence { source, sink, sanitizers[], dataflow_path[], confidence }`
### 3. Scanner WebService Changes
WebService performs tiering:
- Merge evidence for same `vuln_key`
- Run reachability/taint algorithms
- Assign `evidence_tier` deterministically
- Persist normalized findings
### 4. Evaluator CLI
New tool `StellaOps.Scanner.Evaluation.Cli`:
- `import-corpus` - Load samples and expected findings
- `run` - Trigger scans using replay manifest
- `compute` - Calculate per-tier PR curves
- `report` - Generate markdown artifacts
### 5. CI Gates
Fail builds when:
- PR-AUC(imported) drops > 2%
- PR-AUC(executed/tainted_sink) drops > 1%
- FP rate in `tainted_sink` > 5% at Recall ≥ 0.7
## Operating Points
| Tier | Target Recall | Purpose |
|------|--------------|---------|
| `imported` | ≥ 0.60 | Broad coverage |
| `executed` | ≥ 0.70 | Material risk |
| `tainted_sink` | ≥ 0.80 | Actionable findings |
## Integration with Existing Systems
### Concelier
- Stores advisory data, does not tier
- Tag advisories with sink classes when available
### Excititor (VEX)
- Include `tier` in VEX statements
- Allow policy per-tier thresholds
- Preserve pruning provenance
### Notify
- Gate alerts on tiered thresholds
- Page only on `tainted_sink` at operating point
### UI
- Show tier badge on findings
- Default sort: tainted_sink > executed > imported
- Display evidence summary (entrypoint, path length, sink class)
## Success Criteria
1. Can demonstrate release where overall precision stayed flat but tainted→sink PR-AUC improved
2. On-call noise reduced via tier-gated paging
3. TTFS p95 for tainted→sink within budget
## Related Documentation
- [Ground-Truth Corpus Sprint](../implplan/SPRINT_3500_0003_0001_ground_truth_corpus_ci_gates.md)
- [Scanner Architecture](../modules/scanner/architecture.md)
- [Reachability Analysis](./14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)
## Overlap Analysis
This advisory **extends** the ground-truth corpus work (SPRINT_3500_0003_0001) with:
- Tiered precision tracking (new)
- Per-tier operating points (new)
- CI gates based on tier-specific AUC (enhancement)
- Integration with Notify for tier-gated alerts (new)
No contradictions with existing implementations found.