feat(rate-limiting): Implement core rate limiting functionality with configuration, decision-making, metrics, middleware, and service registration

- Add RateLimitConfig for configuration management with YAML binding support.
- Introduce RateLimitDecision to encapsulate the result of rate limit checks.
- Implement RateLimitMetrics for OpenTelemetry metrics tracking.
- Create RateLimitMiddleware for enforcing rate limits on incoming requests.
- Develop RateLimitService to orchestrate instance and environment rate limit checks.
- Add RateLimitServiceCollectionExtensions for dependency injection registration.
# Corpus Contribution Guide
**Sprint:** SPRINT_3500_0003_0001
**Task:** CORPUS-014 - Document corpus contribution guide
## Overview
The Ground-Truth Corpus is a collection of validated test samples used to measure scanner accuracy. Each sample has a known reachability status and a set of expected findings, enabling deterministic quality metrics.
## Corpus Structure
```
datasets/reachability/
├── corpus.json # Index of all samples
├── schemas/
│ └── corpus-sample.v1.json # JSON schema for samples
├── samples/
│ ├── gt-0001/ # Sample directory
│ │ ├── sample.json # Sample metadata
│ │ ├── expected.json # Expected findings
│ │ ├── sbom.json # Input SBOM
│ │ └── source/ # Optional source files
│ └── ...
└── baselines/
└── v1.0.0.json # Baseline metrics
```
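A quick structural check over this layout catches missing files before review. The following is a hypothetical helper (not part of the repository) that walks `samples/` and reports any directory missing one of the files shown above:
```python
# check_layout.py - hypothetical helper, not part of the repository
from pathlib import Path

REQUIRED = ("sample.json", "expected.json", "sbom.json")

def check_samples(root: str = "datasets/reachability/samples") -> list[str]:
    """Return a list of problems found in the sample directories."""
    problems = []
    for sample_dir in sorted(Path(root).iterdir()):
        if not sample_dir.is_dir():
            continue
        for name in REQUIRED:
            if not (sample_dir / name).is_file():
                problems.append(f"{sample_dir.name}: missing {name}")
    return problems

if __name__ == "__main__":
    for problem in check_samples():
        print(problem)
```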
## Sample Format
### sample.json
```json
{
"id": "gt-0001",
"name": "Python SQL Injection - Reachable",
"description": "Flask app with reachable SQL injection via user input",
"language": "python",
"ecosystem": "pypi",
"scenario": "webapi",
"entrypoints": ["app.py:main"],
"reachability_tier": "tainted_sink",
"created_at": "2025-01-15T00:00:00Z",
"author": "security-team",
"tags": ["sql-injection", "flask", "reachable"]
}
```
### expected.json
```json
{
"findings": [
{
"vuln_key": "CVE-2024-1234:pkg:pypi/sqlalchemy@1.4.0",
"tier": "tainted_sink",
"rule_key": "py.sql.injection.param_concat",
"sink_class": "sql",
"location_hint": "app.py:42"
}
]
}
```
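The `sample.json` shown above can be checked against the schema in `schemas/` before opening a PR. A minimal sketch, assuming the third-party `jsonschema` package is installed and that `corpus-sample.v1.json` describes `sample.json` documents:
```python
# validate_sample.py - hypothetical helper; assumes the `jsonschema` package
# is installed and that corpus-sample.v1.json targets sample.json documents.
import json
from jsonschema import validate

def validate_sample(sample_path: str, schema_path: str) -> None:
    with open(sample_path) as f:
        sample = json.load(f)
    with open(schema_path) as f:
        schema = json.load(f)
    # Raises jsonschema.ValidationError if the sample does not conform.
    validate(instance=sample, schema=schema)

if __name__ == "__main__":
    validate_sample(
        "datasets/reachability/samples/gt-0001/sample.json",
        "datasets/reachability/schemas/corpus-sample.v1.json",
    )
    print("sample.json conforms to corpus-sample.v1.json")
```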
## Contributing a Sample
### Step 1: Choose a Scenario
Select a scenario that is not well-covered in the corpus:
| Scenario | Description | Example |
|----------|-------------|---------|
| `webapi` | Web application endpoint | Flask, FastAPI, Express |
| `cli` | Command-line tool | argparse, click, commander |
| `job` | Background/scheduled job | Celery, cron script |
| `lib` | Library code | Reusable package |
### Step 2: Create Sample Directory
```bash
cd datasets/reachability/samples
mkdir gt-NNNN
cd gt-NNNN
```
Use the next available sample ID (check `corpus.json` for the highest).
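If you prefer not to scan `corpus.json` by hand, a small helper can pick the next free ID. This sketch assumes the index exposes a top-level `samples` array whose entries carry `id` values such as `gt-0001`; adjust it to the actual index schema:
```python
# next_id.py - hypothetical helper; assumes corpus.json has a top-level
# "samples" array whose entries carry "id" values such as "gt-0001".
import json

def next_sample_id(index_path: str = "datasets/reachability/corpus.json") -> str:
    with open(index_path) as f:
        index = json.load(f)
    numbers = [int(entry["id"].split("-")[1]) for entry in index["samples"]]
    # Zero-pad to four digits to match the existing gt-NNNN convention.
    return f"gt-{max(numbers) + 1:04d}"

if __name__ == "__main__":
    print(next_sample_id())  # e.g. gt-0042
```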
### Step 3: Create Minimal Reproducible Case
**Requirements:**
- Smallest possible code to demonstrate the vulnerability
- Real or realistic vulnerability (use CVE when possible)
- Clear entrypoint definition
- Deterministic behavior (no network, no randomness)
**Example Python Sample:**
```python
# app.py - gt-0001
from flask import Flask, request
import sqlite3

app = Flask(__name__)


@app.route("/user")
def get_user():
    user_id = request.args.get("id")  # Taint source
    conn = sqlite3.connect(":memory:")
    # SQL injection: user_id flows to query without sanitization
    result = conn.execute(f"SELECT * FROM users WHERE id = {user_id}")  # Taint sink
    return str(result.fetchall())


if __name__ == "__main__":
    app.run()
```
### Step 4: Define Expected Findings
Create `expected.json` with all expected findings:
```json
{
"findings": [
{
"vuln_key": "CWE-89:pkg:pypi/flask@2.0.0",
"tier": "tainted_sink",
"rule_key": "py.sql.injection",
"sink_class": "sql",
"location_hint": "app.py:13",
"notes": "User input from request.args flows to sqlite3.execute"
}
]
}
```
### Step 5: Create SBOM
Generate or create an SBOM for the sample. List only real package dependencies; standard-library modules such as `sqlite3` are part of the Python runtime and do not get a `pkg:pypi` purl:
```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "components": [
    {
      "type": "library",
      "name": "flask",
      "version": "2.0.0",
      "purl": "pkg:pypi/flask@2.0.0"
    }
  ]
}
```
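If you assemble the SBOM by hand, emitting it from a short script helps keep the file byte-stable across contributions (see the quality criteria below). A sketch that writes a document like the one above; the component list is illustrative:
```python
# write_sbom.py - hypothetical helper; the component list is illustrative.
import json

sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "version": 1,
    "components": [
        {
            "type": "library",
            "name": "flask",
            "version": "2.0.0",
            "purl": "pkg:pypi/flask@2.0.0",
        }
    ],
}

with open("sbom.json", "w") as f:
    # Dicts preserve insertion order, so repeated runs produce identical bytes.
    json.dump(sbom, f, indent=2)
    f.write("\n")
```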
### Step 6: Update Corpus Index
Add an entry to `corpus.json`:
```json
{
"id": "gt-0001",
"path": "samples/gt-0001",
"language": "python",
"tier": "tainted_sink",
"scenario": "webapi",
"expected_count": 1
}
```
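The `expected_count` field must stay in sync with `expected.json`. A quick cross-check sketch, run from `datasets/reachability/` and again assuming `corpus.json` exposes a top-level `samples` array:
```python
# check_counts.py - hypothetical helper; run from datasets/reachability/.
import json

def check_expected_count(sample_id: str = "gt-0001") -> None:
    with open(f"samples/{sample_id}/expected.json") as f:
        findings = json.load(f)["findings"]
    with open("corpus.json") as f:
        index = json.load(f)
    # Assumes a top-level "samples" array in corpus.json.
    entry = next(e for e in index["samples"] if e["id"] == sample_id)
    count = entry["expected_count"]
    assert count == len(findings), (
        f"{sample_id}: index says {count}, expected.json has {len(findings)}"
    )

if __name__ == "__main__":
    check_expected_count()
    print("expected_count matches expected.json")
```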
### Step 7: Validate Locally
```bash
# Run corpus validation
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
--filter "FullyQualifiedName~CorpusFixtureTests"
# Run benchmark
stellaops bench corpus run --sample gt-0001 --verbose
```
## Tier Guidelines
### Imported Tier Samples
For `imported` tier samples:
- Vulnerability in a dependency
- No execution path to vulnerable code
- Package is in lockfile but not called
**Example:** Unused dependency with known CVE.
### Executed Tier Samples
For `executed` tier samples:
- Vulnerable code is called from entrypoint
- No user-controlled data reaches the vulnerability
- Static or coverage analysis proves execution
**Example:** Hardcoded SQL query (no injection).
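For illustration, an executed-tier sample might call the database API on every request but only with hardcoded SQL, so the code path runs while no attacker-controlled data is involved. A hypothetical sketch (not an existing corpus entry):
```python
# report.py - hypothetical executed-tier sample (not an existing corpus entry)
import sqlite3

def active_user_count() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, active INTEGER)")
    # The query runs on every call, so the sink is executed,
    # but the SQL text is a hardcoded constant: nothing is tainted.
    row = conn.execute("SELECT COUNT(*) FROM users WHERE active = 1").fetchone()
    return row[0]

if __name__ == "__main__":
    print(active_user_count())
```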
### Tainted→Sink Tier Samples
For `tainted_sink` tier samples:
- User-controlled input reaches vulnerable code
- Clear source → sink data flow
- Include sink class taxonomy
**Example:** User input to SQL query, command execution, etc.
## Sink Classes
When contributing `tainted_sink` samples, specify the sink class:
| Sink Class | Description | Examples |
|------------|-------------|----------|
| `sql` | SQL injection | sqlite3.execute, cursor.execute |
| `command` | Command injection | os.system, subprocess.run |
| `ssrf` | Server-side request forgery | requests.get, urllib.urlopen |
| `path` | Path traversal | open(), os.path.join |
| `deser` | Deserialization | pickle.loads, yaml.load |
| `eval` | Code evaluation | eval(), exec() |
| `xxe` | XML external entity | lxml.parse, ET.parse |
| `xss` | Cross-site scripting | innerHTML, document.write |
## Quality Criteria
Samples must meet these criteria:
- [ ] **Deterministic**: Same input → same output
- [ ] **Minimal**: Smallest code to demonstrate
- [ ] **Documented**: Clear description and notes
- [ ] **Validated**: Passes local tests
- [ ] **Realistic**: Based on real vulnerability patterns
- [ ] **Self-contained**: No external network calls
## Negative Samples
Include "negative" samples where the scanner should NOT find vulnerabilities:
```json
{
"id": "gt-0050",
"name": "Python SQL - Properly Sanitized",
"tier": "imported",
"expected_count": 0,
"notes": "Uses parameterized queries, no injection possible"
}
```
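The source for such a negative sample could mirror the Step 3 app but bind user input with a parameterized query, so no tainted data is concatenated into SQL. A hypothetical sketch:
```python
# app.py - hypothetical source for gt-0050 (sanitized counterpart of gt-0001)
from flask import Flask, request
import sqlite3

app = Flask(__name__)


@app.route("/user")
def get_user():
    user_id = request.args.get("id")
    conn = sqlite3.connect(":memory:")
    # Parameterized query: user input is bound, never concatenated into SQL.
    result = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    return str(result.fetchall())


if __name__ == "__main__":
    app.run()
```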
## Review Process
1. Create PR with new sample(s)
2. CI runs validation tests
3. Security team reviews expected findings
4. QA team verifies determinism
5. Merge and update baseline
## Updating Baselines
After adding samples, update baseline metrics:
```bash
# Generate new baseline
stellaops bench corpus run --all --output baselines/v1.1.0.json
# Compare to previous
stellaops bench corpus compare baselines/v1.0.0.json baselines/v1.1.0.json
```
## FAQ
### How many samples should I contribute?
Start with 2-3 high-quality samples covering different aspects of the same vulnerability class.
### Can I use synthetic vulnerabilities?
Yes, but prefer real CVE patterns when possible. Synthetic samples should document the vulnerability pattern clearly.
### What if my sample has multiple findings?
Include all expected findings in `expected.json`. Multi-finding samples are valuable for testing.
### How do I test tier classification?
Run with verbose output:
```bash
stellaops bench corpus run --sample gt-NNNN --verbose --show-evidence
```
## Related Documentation
- [Tiered Precision Curves](../benchmarks/tiered-precision-curves.md)
- [Reachability Analysis](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)
- [Corpus Index Schema](../../datasets/reachability/schemas/corpus-sample.v1.json)