feat(rate-limiting): Implement core rate limiting functionality with configuration, decision-making, metrics, middleware, and service registration

- Add RateLimitConfig for configuration management with YAML binding support.
- Introduce RateLimitDecision to encapsulate the result of rate limit checks.
- Implement RateLimitMetrics for OpenTelemetry metrics tracking.
- Create RateLimitMiddleware for enforcing rate limits on incoming requests.
- Develop RateLimitService to orchestrate instance and environment rate limit checks.
- Add RateLimitServiceCollectionExtensions for dependency injection registration.
251
docs/benchmarks/ground-truth-corpus.md
Normal file
@@ -0,0 +1,251 @@
# Ground-Truth Corpus Specification

> **Version**: 1.0.0
> **Last Updated**: 2025-12-17
> **Source Advisory**: 16-Dec-2025 - Building a Deeper Moat Beyond Reachability

This document specifies the ground-truth corpus for benchmarking StellaOps' binary-only reachability analysis and deterministic scoring.

---

## Overview

A ground-truth corpus is a curated set of binaries with **known** reachable and unreachable vulnerable sinks. It enables:
- Precision/recall measurement for reachability claims
- Regression detection in CI
- Deterministic replay validation

---

## Corpus Structure

### Sample Requirements

Each sample binary must include:
- **Manifest file**: `sample.manifest.json` with ground-truth annotations
- **Binary file**: The target executable (ELF/PE/Mach-O)
- **Source (optional)**: Original source for reproducibility verification

### Manifest Schema

```json
{
  "$schema": "https://stellaops.io/schemas/corpus-sample.v1.json",
  "sampleId": "gt-0001",
  "name": "vulnerable-sink-reachable-from-main",
  "format": "elf64",
  "arch": "x86_64",
  "compiler": "gcc-13.2",
  "compilerFlags": ["-O2", "-fPIE"],
  "stripped": false,
  "obfuscation": "none",
  "pie": true,
  "cfi": false,
  "sinks": [
    {
      "sinkId": "sink-001",
      "signature": "vulnerable_function(char*)",
      "address": "0x401234",
      "cveId": "CVE-2024-XXXXX",
      "expected": "reachable",
      "expectedPaths": [
        ["main", "process_input", "parse_data", "vulnerable_function"]
      ],
      "expectedUnreachableReasons": null
    },
    {
      "sinkId": "sink-002",
      "signature": "dead_code_vulnerable()",
      "address": "0x402000",
      "cveId": "CVE-2024-YYYYY",
      "expected": "unreachable",
      "expectedPaths": null,
      "expectedUnreachableReasons": ["no-caller", "dead-code-elimination"]
    }
  ],
  "entrypoints": [
    {"name": "main", "address": "0x401000"},
    {"name": "_start", "address": "0x400ff0"}
  ],
  "metadata": {
    "createdAt": "2025-12-17T00:00:00Z",
    "author": "StellaOps QA Guild",
    "notes": "Basic reachability test with one true positive and one true negative"
  }
}
```
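
The invariants implied by the manifest example can be checked mechanically before a sample enters the corpus. The following is a minimal Python sketch, not StellaOps code: `validate_manifest` is a hypothetical helper, and the rules it enforces (reachable sinks carry `expectedPaths`, unreachable sinks carry `expectedUnreachableReasons`, and every path starts at a declared entrypoint) are assumptions inferred from the example rather than taken from a published schema.

```python
def validate_manifest(manifest: dict) -> list[str]:
    """Return human-readable consistency problems; an empty list means OK.

    Hypothetical helper: the rules below are assumptions inferred from the
    example manifest, not an official StellaOps schema.
    """
    problems: list[str] = []
    entrypoints = {e["name"] for e in manifest.get("entrypoints", [])}

    for sink in manifest.get("sinks", []):
        sink_id = sink.get("sinkId", "<missing sinkId>")
        expected = sink.get("expected")
        paths = sink.get("expectedPaths") or []
        reasons = sink.get("expectedUnreachableReasons") or []

        if expected == "reachable":
            if not paths:
                problems.append(f"{sink_id}: reachable sink has no expectedPaths")
            # Assumed rule: each expected path begins at a declared entrypoint.
            if any(not path or path[0] not in entrypoints for path in paths):
                problems.append(f"{sink_id}: path does not start at a declared entrypoint")
            if reasons:
                problems.append(f"{sink_id}: reachable sink lists unreachable reasons")
        elif expected == "unreachable":
            if paths:
                problems.append(f"{sink_id}: unreachable sink lists expectedPaths")
            if not reasons:
                problems.append(f"{sink_id}: unreachable sink has no reasons")
        else:
            problems.append(f"{sink_id}: expected must be 'reachable' or 'unreachable'")

    return problems
```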

---

## Starter Corpus (20 Samples)

### Category A: Reachable Sinks (10 samples)

| ID | Description | Format | Stripped | Obfuscation | Expected |
|----|-------------|--------|----------|-------------|----------|
| gt-0001 | Direct call from main | ELF64 | No | None | Reachable |
| gt-0002 | Indirect call via function pointer | ELF64 | No | None | Reachable |
| gt-0003 | Reachable through PLT/GOT | ELF64 | No | None | Reachable |
| gt-0004 | Reachable via vtable dispatch | ELF64 | No | None | Reachable |
| gt-0005 | Reachable with stripped symbols | ELF64 | Yes | None | Reachable |
| gt-0006 | Reachable with partial obfuscation | ELF64 | No | Control-flow | Reachable |
| gt-0007 | Reachable in PIE binary | ELF64 | No | None | Reachable |
| gt-0008 | Reachable in ASLR context | ELF64 | No | None | Reachable |
| gt-0009 | Reachable through shared library | ELF64 | No | None | Reachable |
| gt-0010 | Reachable via callback registration | ELF64 | No | None | Reachable |

### Category B: Unreachable Sinks (10 samples)

| ID | Description | Format | Stripped | Obfuscation | Expected Reason |
|----|-------------|--------|----------|-------------|-----------------|
| gt-0011 | Dead code (never called) | ELF64 | No | None | no-caller |
| gt-0012 | Guarded by impossible condition | ELF64 | No | None | dead-branch |
| gt-0013 | Linked but not used | ELF64 | No | None | unused-import |
| gt-0014 | Behind disabled feature flag | ELF64 | No | None | config-disabled |
| gt-0015 | Requires privilege escalation | ELF64 | No | None | privilege-gate |
| gt-0016 | Behind authentication check | ELF64 | No | None | auth-gate |
| gt-0017 | Unreachable with CFI enabled | ELF64 | No | None | cfi-prevented |
| gt-0018 | Optimized away by compiler | ELF64 | No | None | dce-eliminated |
| gt-0019 | In unreachable exception handler | ELF64 | No | None | exception-only |
| gt-0020 | Test-only code not in production | ELF64 | No | None | test-code-only |

---

## Metrics

### Primary Metrics

| Metric | Definition | Target |
|--------|------------|--------|
| **Precision** | TP / (TP + FP) | ≥ 95% |
| **Recall** | TP / (TP + FN) | ≥ 90% |
| **F1 Score** | 2 × (Precision × Recall) / (Precision + Recall) | ≥ 92% |
| **TTFRP** | Time-to-First-Reachable-Path (ms) | p95 < 500 ms |
| **Deterministic Replay** | Identical proofs across runs | 100% |

### Regression Gates

CI gates that **fail the build**:
- Precision drops by more than 1.0 percentage point versus the baseline
- Recall drops by more than 1.0 percentage point versus the baseline
- Deterministic replay drops below 100%
- TTFRP p95 increases by more than 20% versus the baseline

---

## CI Integration

### Benchmark Job

```yaml
# .gitea/workflows/reachability-bench.yaml
name: Reachability Benchmark
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Nightly

jobs:
  benchmark:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4

      - name: Run corpus benchmark
        run: |
          stellaops bench run \
            --corpus datasets/reachability/ground-truth/ \
            --output bench/results/$(date +%Y%m%d).json \
            --baseline bench/baselines/current.json

      - name: Check regression gates
        run: |
          stellaops bench check \
            --results bench/results/$(date +%Y%m%d).json \
            --baseline bench/baselines/current.json \
            --precision-threshold 0.95 \
            --recall-threshold 0.90 \
            --determinism-threshold 1.0

      - name: Post results to PR
        if: github.event_name == 'pull_request'
        run: |
          stellaops bench report \
            --results bench/results/$(date +%Y%m%d).json \
            --baseline bench/baselines/current.json \
            --format markdown > bench-report.md
          # Post to PR via API
```

### Result Schema

```json
{
  "runId": "bench-20251217-001",
  "timestamp": "2025-12-17T02:00:00Z",
  "corpusVersion": "1.0.0",
  "scannerVersion": "1.3.0",
  "metrics": {
    "precision": 0.96,
    "recall": 0.91,
    "f1": 0.934,
    "ttfrp_p50_ms": 120,
    "ttfrp_p95_ms": 380,
    "deterministicReplay": 1.0
  },
  "samples": [
    {
      "sampleId": "gt-0001",
      "sinkId": "sink-001",
      "expected": "reachable",
      "actual": "reachable",
      "pathFound": ["main", "process_input", "parse_data", "vulnerable_function"],
      "proofHash": "sha256:abc123...",
      "ttfrpMs": 95
    }
  ],
  "regressions": [],
  "improvements": []
}
```
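
The primary metrics and regression gates can be recomputed from a result file of this shape. The Python sketch below is illustrative only: `primary_metrics` and `regression_failures` are hypothetical names, "reachable" is assumed to be the positive class, and the gate thresholds are the ones listed under Regression Gates. It does not describe how `stellaops bench check` is implemented.

```python
EPS = 1e-9  # tolerance so float noise does not trip a gate exactly at its limit


def primary_metrics(samples: list[dict]) -> dict:
    """Compute precision, recall and F1, treating 'reachable' as positive."""
    tp = fp = fn = 0
    for s in samples:
        expected = s["expected"] == "reachable"
        actual = s["actual"] == "reachable"
        if actual and expected:
            tp += 1
        elif actual and not expected:
            fp += 1
        elif expected and not actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def regression_failures(current: dict, baseline: dict) -> list[str]:
    """Return the names of the gates that would fail the build."""
    failures = []
    for name in ("precision", "recall"):
        drop_pp = (baseline[name] - current[name]) * 100  # percentage points
        if drop_pp > 1.0 + EPS:
            failures.append(name)
    if current["deterministicReplay"] < 1.0:
        failures.append("deterministicReplay")
    if current["ttfrp_p95_ms"] > baseline["ttfrp_p95_ms"] * 1.20 + EPS:
        failures.append("ttfrp_p95_ms")
    return failures
```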

---

## Corpus Maintenance

### Adding New Samples

1. Create sample binary with known sink reachability
2. Write `sample.manifest.json` with ground-truth annotations
3. Place in `datasets/reachability/ground-truth/{category}/`
4. Update corpus version in `datasets/reachability/corpus.json`
5. Run baseline update: `stellaops bench baseline update`

### Updating Baselines

When scanner improvements are validated:
```bash
stellaops bench baseline update \
  --results bench/results/latest.json \
  --output bench/baselines/current.json
```

### Sample Categories

- `basic/` - Simple direct call chains
- `indirect/` - Function pointers, vtables, callbacks
- `stripped/` - Symbol-stripped binaries
- `obfuscated/` - Control-flow obfuscation, packing
- `guarded/` - Config/auth/privilege guards
- `multiarch/` - ARM64, x86, RISC-V variants

---

## Related Documentation

- [Reachability Analysis Technical Reference](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)
- [Determinism and Reproducibility Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md)
- [Scanner Benchmark Submission Guide](submission-guide.md)
150
docs/benchmarks/smart-diff-wii.md
Normal file
@@ -0,0 +1,150 @@
# Smart-Diff Weighted Impact Index (WII)

**Source Advisory:** `docs/product-advisories/unprocessed/16-Dec-2025 - Smart‑Diff Meets Call‑Stack Reachability.md`
**Status:** Processed 2025-12-17

## Overview

The Weighted Impact Index (WII) is a composite score (0-100) that combines Smart-Diff semantic analysis with call-stack reachability to measure the runtime risk of code changes. It shows not just "what changed" but "how risky the change is in reachable code."

## Core Concepts

### Inputs

1. **Smart-Diff Output** - Semantic differences between artifact states
2. **Call Graph** - Symbol nodes with call edges
3. **Entrypoints** - HTTP routes, jobs, message handlers
4. **Runtime Heat** - pprof, APM, or eBPF execution frequency data
5. **Advisory Data** - CVSS v4, EPSS v4 scores

### WII Scoring Model

The WII uses 8 weighted features per diff unit:

| Feature | Weight | Description |
|---------|--------|-------------|
| `Δreach_len` | 0.25 | Change in shortest reachable path length |
| `Δlib_depth` | 0.10 | Change in library call depth |
| `exposure` | 0.15 | Public/external-facing API |
| `privilege` | 0.15 | Path crosses privileged sinks |
| `hot_path` | 0.15 | Frequently executed (runtime evidence) |
| `cvss_v4` | 0.10 | Normalized CVSS v4 severity |
| `epss_v4` | 0.10 | Exploit probability |
| `guard_coverage` | -0.10 | Sanitizers/validations reduce score |

### Determinism Bonus

When the unit is reachable (`reachable == true`) AND (`cvss_v4 > 0.7` OR `epss_v4 > 0.5`), add a +5 bonus for "evidence-linked determinism."

### Formula

```
WII = clamp(0, 1, Σ(w_i × feature_i_normalized)) × 100
```
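
As a worked illustration of the scoring model, the Python sketch below applies the weights from the table to features that are assumed to be already normalized to the range 0 to 1 (booleans as 0 or 1). The seven positive weights sum to 1.00, so `guard_coverage` can only lower the score. Two points are assumptions, because the document does not specify them: the +5 bonus is added after scaling to 0-100, and the final score is capped at 100. The names `WEIGHTS` and `wii_score` are hypothetical.

```python
# Weights copied from the feature table; ASCII keys stand in for the
# Δreach_len and Δlib_depth feature names.
WEIGHTS = {
    "delta_reach_len": 0.25,
    "delta_lib_depth": 0.10,
    "exposure": 0.15,
    "privilege": 0.15,
    "hot_path": 0.15,
    "cvss_v4": 0.10,
    "epss_v4": 0.10,
    "guard_coverage": -0.10,
}


def wii_score(features: dict[str, float], reachable: bool) -> float:
    """Score one diff unit; `features` must already be normalized to [0, 1]."""
    weighted = sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())
    score = min(1.0, max(0.0, weighted)) * 100

    # Determinism bonus. Assumption: applied after scaling, capped at 100.
    high_severity = features.get("cvss_v4", 0.0) > 0.7
    high_exploitability = features.get("epss_v4", 0.0) > 0.5
    if reachable and (high_severity or high_exploitability):
        score = min(100.0, score + 5.0)
    return score
```

Because the normalization of `Δreach_len` and `Δlib_depth` is not defined here, this sketch is not expected to reproduce the `wii` value shown in any example payload.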

## Data Structures

### DiffUnit

```json
{
  "unitId": "pkg:npm/lodash@4.17.21#function:merge",
  "change": "modified",
  "before": {"hash": "sha256:abc...", "attrs": {}},
  "after": {"hash": "sha256:def...", "attrs": {}},
  "features": {
    "reachable": true,
    "reachLen": 3,
    "libDepth": 2,
    "exposure": true,
    "privilege": false,
    "hotPath": true,
    "cvssV4": 0.75,
    "epssV4": 0.45,
    "guardCoverage": false
  },
  "wii": 68
}
```

### Artifact-Level WII

Two metrics for artifact-level impact:
- `max(WII_unit)` - Spike impact (single highest risk change)
- `p95(WII_unit)` - Broad impact (distribution of risk)

## DSSE Attestation

The WII is emitted as a DSSE-signed attestation:

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{"name": "ghcr.io/acme/app:1.9.3", "digest": {"sha256": "..."}}],
  "predicateType": "https://stella-ops.org/attestations/smart-diff-wii@v1",
  "predicate": {
    "artifactBefore": {"digest": {"sha256": "..."}},
    "artifactAfter": {"digest": {"sha256": "..."}},
    "evidence": {
      "sbomBefore": {"digest": {"sha256": "..."}},
      "sbomAfter": {"digest": {"sha256": "..."}},
      "callGraph": {"digest": {"sha256": "..."}},
      "runtimeHeat": {"optional": true, "digest": {"sha256": "..."}}
    },
    "units": [...],
    "aggregateWII": {
      "max": 85,
      "p95": 62,
      "mean": 45
    }
  }
}
```
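
The `aggregateWII` block can be derived from the per-unit scores. The Python sketch below is one possible way to do it. The document does not say which percentile definition is used, so the nearest-rank method here is an assumption, and `aggregate_wii` and `percentile_nearest_rank` are hypothetical names.

```python
from statistics import mean


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100 (assumed method)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def aggregate_wii(unit_scores: list[float]) -> dict:
    """Summarize per-unit WII scores into max (spike), p95 (broad) and mean."""
    if not unit_scores:
        raise ValueError("at least one unit score is required")
    return {
        "max": max(unit_scores),
        "p95": percentile_nearest_rank(unit_scores, 95),
        "mean": mean(unit_scores),
    }
```

With only a handful of units, nearest-rank p95 coincides with the maximum, so the two aggregates diverge only on larger diffs.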

## Pipeline Integration

1. **Collect** - Build call graph, import SBOMs, CVE/EPSS data
2. **Diff** - Run Smart-Diff to generate `DiffUnit[]`
3. **Enrich** - Query reachability engine per unit
4. **Score** - Compute per-unit and aggregate WII
5. **Attest** - Emit DSSE statement with evidence URIs
6. **Store** - Proof-Market Ledger (Rekor) + PostgreSQL

## Use Cases

### CI/CD Gates

```yaml
# .github/workflows/security.yml
- name: Smart-Diff WII Check
  run: |
    stellaops smart-diff \
      --base ${{ env.BASE_IMAGE }} \
      --target ${{ env.TARGET_IMAGE }} \
      --wii-threshold 70 \
      --fail-on-threshold
```

### Risk Prioritization

Sort changes by WII for review prioritization:

```bash
stellaops smart-diff show \
  --sort wii \
  --format table
```

### Attestation Verification

```bash
stellaops verify-attestation \
  --input smart-diff-wii.json \
  --predicate-type smart-diff-wii@v1
```

## Related Documentation

- [Smart-Diff CLI Reference](../cli/smart-diff-cli.md)
- [Reachability Analysis](./reachability-analysis.md)
- [DSSE Attestation Format](../api/dsse-format.md)
127
docs/benchmarks/tiered-precision-curves.md
Normal file
@@ -0,0 +1,127 @@
# Tiered Precision Curves for Scanner Accuracy

**Advisory:** 16-Dec-2025 - Measuring Progress with Tiered Precision Curves
**Status:** Processing
**Related Sprints:** SPRINT_3500_0003_0001 (Ground-Truth Corpus)

## Executive Summary

This advisory introduces a tiered approach to measuring scanner accuracy that prevents metric gaming. By tracking precision/recall separately for three evidence tiers (Imported, Executed, Tainted→Sink), we ensure improvements in one tier don't hide regressions in another.

## Key Concepts

### Evidence Tiers

| Tier | Description | Risk Level | Typical Volume |
|------|-------------|------------|----------------|
| **Imported** | Vuln exists in dependency | Lowest | High |
| **Executed** | Code/deps actually run | Medium | Medium |
| **Tainted→Sink** | User data reaches sink | Highest | Low |

### Tier Precedence

Highest tier wins when a finding has multiple evidence types:
1. `tainted_sink` (highest)
2. `executed`
3. `imported`
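
The precedence rule amounts to taking a maximum over a fixed ordering. A minimal Python sketch follows, assuming a finding's evidence is available as a collection of tier names; `TIER_RANK` and `effective_tier` are hypothetical names rather than part of the scanner.

```python
# Higher rank wins; the ordering follows the precedence list above.
TIER_RANK = {"imported": 1, "executed": 2, "tainted_sink": 3}


def effective_tier(evidence_tiers) -> str:
    """Pick the highest-precedence tier among a finding's evidence types."""
    tiers = list(evidence_tiers)
    if not tiers:
        raise ValueError("a finding needs at least one evidence tier")
    # An unknown tier name raises KeyError instead of being silently ranked.
    return max(tiers, key=lambda tier: TIER_RANK[tier])
```

Because the ranking is a fixed lookup, the same evidence set always yields the same tier regardless of the order in which evidence arrives.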

## Implementation Components

### 1. Evidence Schema (`eval` schema)

```sql
-- Ground truth samples
eval.sample(sample_id, name, repo_path, commit_sha, language, scenario, entrypoints)

-- Expected findings
eval.expected_finding(expected_id, sample_id, vuln_key, tier, rule_key, sink_class)

-- Evaluation runs
eval.run(eval_run_id, scanner_version, rules_hash, concelier_snapshot_hash)

-- Observed results
eval.observed_finding(observed_id, eval_run_id, sample_id, vuln_key, tier, score, rule_key, evidence)

-- Computed metrics
eval.metrics(eval_run_id, tier, op_point, precision, recall, f1, pr_auc, latency_p50_ms)
```

### 2. Scanner Worker Changes

Workers emit evidence primitives:
- `DependencyEvidence { purl, version, lockfile_path }`
- `ReachabilityEvidence { entrypoint, call_path[], confidence }`
- `TaintEvidence { source, sink, sanitizers[], dataflow_path[], confidence }`

### 3. Scanner WebService Changes

WebService performs tiering:
- Merge evidence for same `vuln_key`
- Run reachability/taint algorithms
- Assign `evidence_tier` deterministically
- Persist normalized findings

### 4. Evaluator CLI

New tool `StellaOps.Scanner.Evaluation.Cli`:
- `import-corpus` - Load samples and expected findings
- `run` - Trigger scans using replay manifest
- `compute` - Calculate per-tier PR curves
- `report` - Generate markdown artifacts

### 5. CI Gates

Fail builds when:
- PR-AUC(imported) drops > 2%
- PR-AUC(executed/tainted_sink) drops > 1%
- FP rate in `tainted_sink` > 5% at Recall ≥ 0.7
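
The three gates can be expressed as a single comparison against a baseline run. The Python sketch below is illustrative, and it makes two assumptions that the list does not settle: a "drop" is read as an absolute difference in PR-AUC (0.02 and 0.01), and the `tainted_sink` false-positive rate at recall 0.7 is computed elsewhere and passed in. `tier_gate_failures` is a hypothetical name.

```python
EPS = 1e-9  # tolerance for float noise at the exact limit

# Maximum allowed absolute PR-AUC drop per tier (assumed interpretation).
MAX_PR_AUC_DROP = {"imported": 0.02, "executed": 0.01, "tainted_sink": 0.01}
MAX_TAINTED_FP_RATE = 0.05


def tier_gate_failures(baseline_auc: dict, current_auc: dict,
                       tainted_fp_rate: float) -> list[str]:
    """Return identifiers of the tier gates that would fail the build."""
    failures = []
    for tier, limit in MAX_PR_AUC_DROP.items():
        drop = baseline_auc[tier] - current_auc[tier]
        if drop > limit + EPS:
            failures.append(f"pr_auc:{tier}")
    if tainted_fp_rate > MAX_TAINTED_FP_RATE + EPS:
        failures.append("fp_rate:tainted_sink")
    return failures
```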

## Operating Points

| Tier | Target Recall | Purpose |
|------|--------------|---------|
| `imported` | ≥ 0.60 | Broad coverage |
| `executed` | ≥ 0.70 | Material risk |
| `tainted_sink` | ≥ 0.80 | Actionable findings |
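
An operating point can be located by sweeping the score threshold downward until the tier's target recall is reached, then reading off precision at that threshold. The Python sketch below shows the idea under simplifying assumptions: each observed finding has already been matched against the expected findings (the boolean in each pair), tied scores are not grouped, and `precision_at_recall` is a hypothetical name rather than the evaluator's `compute` command.

```python
def precision_at_recall(scored, total_expected: int, target_recall: float):
    """Find the first threshold (highest score first) that meets target recall.

    scored: iterable of (score, is_correct) pairs for one tier.
    total_expected: number of expected findings for that tier.
    Returns a dict with threshold, precision and recall, or None when the
    target recall cannot be reached with the observed findings.
    """
    tp = fp = 0
    for score, is_correct in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if is_correct:
            tp += 1
        else:
            fp += 1
        if total_expected and tp / total_expected >= target_recall:
            return {
                "threshold": score,
                "precision": tp / (tp + fp),
                "recall": tp / total_expected,
            }
    return None
```

Returning `None` when the target is unreachable keeps a missed operating point distinct from a low-precision one, which matters for per-tier gating.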

## Integration with Existing Systems

### Concelier
- Stores advisory data, does not tier
- Tag advisories with sink classes when available

### Excititor (VEX)
- Include `tier` in VEX statements
- Allow policy per-tier thresholds
- Preserve pruning provenance

### Notify
- Gate alerts on tiered thresholds
- Page only on `tainted_sink` at operating point

### UI
- Show tier badge on findings
- Default sort: tainted_sink > executed > imported
- Display evidence summary (entrypoint, path length, sink class)

## Success Criteria

1. Can demonstrate a release where overall precision stayed flat but tainted→sink PR-AUC improved
2. On-call noise reduced via tier-gated paging
3. TTFS p95 for tainted→sink within budget

## Related Documentation

- [Ground-Truth Corpus Sprint](../implplan/SPRINT_3500_0003_0001_ground_truth_corpus_ci_gates.md)
- [Scanner Architecture](../modules/scanner/architecture.md)
- [Reachability Analysis](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)

## Overlap Analysis

This advisory **extends** the ground-truth corpus work (SPRINT_3500_0003_0001) with:
- Tiered precision tracking (new)
- Per-tier operating points (new)
- CI gates based on tier-specific AUC (enhancement)
- Integration with Notify for tier-gated alerts (new)

No contradictions with existing implementations found.