feat(rate-limiting): Implement core rate limiting functionality with configuration, decision-making, metrics, middleware, and service registration

- Add RateLimitConfig for configuration management with YAML binding support.
- Introduce RateLimitDecision to encapsulate the result of rate limit checks.
- Implement RateLimitMetrics for OpenTelemetry metrics tracking.
- Create RateLimitMiddleware for enforcing rate limits on incoming requests.
- Develop RateLimitService to orchestrate instance and environment rate limit checks.
- Add RateLimitServiceCollectionExtensions for dependency injection registration.
2025-12-17 18:02:37 +02:00
parent 394b57f6bf
commit 8bbfe4d2d2
211 changed files with 47179 additions and 1590 deletions
--- a/docs/benchmarks/ground-truth-corpus.md
+++ b/docs/benchmarks/ground-truth-corpus.md
@@ -0,0 +1,251 @@
+# Ground-Truth Corpus Specification
+
+> **Version**: 1.0.0  
+> **Last Updated**: 2025-12-17  
+> **Source Advisory**: 16-Dec-2025 - Building a Deeper Moat Beyond Reachability
+
+This document specifies the ground-truth corpus for benchmarking StellaOps' binary-only reachability analysis and deterministic scoring.
+
+---
+
+## Overview
+
+A ground-truth corpus is a curated set of binaries with **known** reachable and unreachable vulnerable sinks. It enables:
+- Precision/recall measurement for reachability claims
+- Regression detection in CI
+- Deterministic replay validation
+
+---
+
+## Corpus Structure
+
+### Sample Requirements
+
+Each sample binary must include:
+- **Manifest file**: `sample.manifest.json` with ground-truth annotations
+- **Binary file**: The target executable (ELF/PE/Mach-O)
+- **Source (optional)**: Original source for reproducibility verification
+
+### Manifest Schema
+
+```json
+{
+  "$schema": "https://stellaops.io/schemas/corpus-sample.v1.json",
+  "sampleId": "gt-0001",
+  "name": "vulnerable-sink-reachable-from-main",
+  "format": "elf64",
+  "arch": "x86_64",
+  "compiler": "gcc-13.2",
+  "compilerFlags": ["-O2", "-fPIE"],
+  "stripped": false,
+  "obfuscation": "none",
+  "pie": true,
+  "cfi": false,
+  "sinks": [
+    {
+      "sinkId": "sink-001",
+      "signature": "vulnerable_function(char*)",
+      "address": "0x401234",
+      "cveId": "CVE-2024-XXXXX",
+      "expected": "reachable",
+      "expectedPaths": [
+        ["main", "process_input", "parse_data", "vulnerable_function"]
+      ],
+      "expectedUnreachableReasons": null
+    },
+    {
+      "sinkId": "sink-002", 
+      "signature": "dead_code_vulnerable()",
+      "address": "0x402000",
+      "cveId": "CVE-2024-YYYYY",
+      "expected": "unreachable",
+      "expectedPaths": null,
+      "expectedUnreachableReasons": ["no-caller", "dead-code-elimination"]
+    }
+  ],
+  "entrypoints": [
+    {"name": "main", "address": "0x401000"},
+    {"name": "_start", "address": "0x400ff0"}
+  ],
+  "metadata": {
+    "createdAt": "2025-12-17T00:00:00Z",
+    "author": "StellaOps QA Guild",
+    "notes": "Basic reachability test with one true positive and one true negative"
+  }
+}
+```
+
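A manifest can be sanity-checked before it enters the corpus. The sketch below is illustrative only: the function name `check_sinks` and the rules it enforces (a reachable sink needs at least one expected path starting at a declared entrypoint; an unreachable sink needs at least one reason and no paths) are assumptions inferred from the example above, not part of the published schema.

```python
def check_sinks(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the manifest looks consistent."""
    problems = []
    entry_names = {e["name"] for e in manifest.get("entrypoints", [])}
    for sink in manifest.get("sinks", []):
        sid = sink.get("sinkId", "<missing sinkId>")
        expected = sink.get("expected")
        paths = sink.get("expectedPaths") or []
        reasons = sink.get("expectedUnreachableReasons") or []
        if expected == "reachable":
            if not paths:
                problems.append(f"{sid}: reachable sink has no expectedPaths")
            for path in paths:
                # Assumed rule: every expected path begins at a declared entrypoint.
                if not path or path[0] not in entry_names:
                    problems.append(f"{sid}: path does not start at a declared entrypoint")
        elif expected == "unreachable":
            if paths:
                problems.append(f"{sid}: unreachable sink lists expectedPaths")
            if not reasons:
                problems.append(f"{sid}: unreachable sink has no expectedUnreachableReasons")
        else:
            problems.append(f"{sid}: unknown expected value {expected!r}")
    return problems


# Trimmed-down version of the example manifest above.
manifest = {
    "entrypoints": [{"name": "main", "address": "0x401000"}],
    "sinks": [
        {
            "sinkId": "sink-001",
            "expected": "reachable",
            "expectedPaths": [["main", "process_input", "parse_data", "vulnerable_function"]],
            "expectedUnreachableReasons": None,
        },
        {
            "sinkId": "sink-002",
            "expected": "unreachable",
            "expectedPaths": None,
            "expectedUnreachableReasons": ["no-caller", "dead-code-elimination"],
        },
    ],
}
print(check_sinks(manifest))
```
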
+---
+
+## Starter Corpus (20 Samples)
+
+### Category A: Reachable Sinks (10 samples)
+
+| ID | Description | Format | Stripped | Obfuscation | Expected |
+|----|-------------|--------|----------|-------------|----------|
+| gt-0001 | Direct call from main | ELF64 | No | None | Reachable |
+| gt-0002 | Indirect call via function pointer | ELF64 | No | None | Reachable |
+| gt-0003 | Reachable through PLT/GOT | ELF64 | No | None | Reachable |
+| gt-0004 | Reachable via vtable dispatch | ELF64 | No | None | Reachable |
+| gt-0005 | Reachable with stripped symbols | ELF64 | Yes | None | Reachable |
+| gt-0006 | Reachable with partial obfuscation | ELF64 | No | Control-flow | Reachable |
+| gt-0007 | Reachable in PIE binary | ELF64 | No | None | Reachable |
+| gt-0008 | Reachable in ASLR context | ELF64 | No | None | Reachable |
+| gt-0009 | Reachable through shared library | ELF64 | No | None | Reachable |
+| gt-0010 | Reachable via callback registration | ELF64 | No | None | Reachable |
+
+### Category B: Unreachable Sinks (10 samples)
+
+| ID | Description | Format | Stripped | Obfuscation | Expected Reason |
+|----|-------------|--------|----------|-------------|-----------------|
+| gt-0011 | Dead code (never called) | ELF64 | No | None | no-caller |
+| gt-0012 | Guarded by impossible condition | ELF64 | No | None | dead-branch |
+| gt-0013 | Linked but not used | ELF64 | No | None | unused-import |
+| gt-0014 | Behind disabled feature flag | ELF64 | No | None | config-disabled |
+| gt-0015 | Requires privilege escalation | ELF64 | No | None | privilege-gate |
+| gt-0016 | Behind authentication check | ELF64 | No | None | auth-gate |
+| gt-0017 | Unreachable with CFI enabled | ELF64 | No | None | cfi-prevented |
+| gt-0018 | Optimized away by compiler | ELF64 | No | None | dce-eliminated |
+| gt-0019 | In unreachable exception handler | ELF64 | No | None | exception-only |
+| gt-0020 | Test-only code not in production | ELF64 | No | None | test-code-only |
+
+---
+
+## Metrics
+
+### Primary Metrics
+
+| Metric | Definition | Target |
+|--------|------------|--------|
+| **Precision** | TP / (TP + FP) | ≥ 95% |
+| **Recall** | TP / (TP + FN) | ≥ 90% |
+| **F1 Score** | 2 × (Precision × Recall) / (Precision + Recall) | ≥ 92% |
+| **TTFRP** | Time-to-First-Reachable-Path (ms) | p95 < 500ms |
+| **Deterministic Replay** | Identical proofs across runs | 100% |
+
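The first three metrics follow directly from per-sink (expected, actual) pairs. The sketch below assumes "reachable" is the positive class and that each sink contributes exactly one pair; the function name `score` is hypothetical and not part of the `stellaops` CLI.

```python
def score(pairs):
    """pairs: iterable of (expected, actual) labels, each 'reachable' or 'unreachable'."""
    tp = fp = fn = 0
    for expected, actual in pairs:
        if actual == "reachable" and expected == "reachable":
            tp += 1  # correctly reported reachable sink
        elif actual == "reachable":
            fp += 1  # reported reachable, ground truth says unreachable
        elif expected == "reachable":
            fn += 1  # missed a reachable sink
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# 9 true positives, 1 false negative, 1 false positive, 9 true negatives.
pairs = (
    [("reachable", "reachable")] * 9
    + [("reachable", "unreachable")]
    + [("unreachable", "reachable")]
    + [("unreachable", "unreachable")] * 9
)
result = score(pairs)
print(result)
```

True negatives do not enter any of the three formulas, which is why the Category B samples matter mainly as a source of potential false positives.
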
+### Regression Gates
+
+CI gates that **fail the build**:
+- Precision drops > 1.0 percentage point vs baseline
+- Recall drops > 1.0 percentage point vs baseline
+- Deterministic replay drops below 100%
+- TTFRP p95 increases > 20% vs baseline
+
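The four gates can be expressed as a single comparison between a run and its baseline. This is a sketch under stated assumptions: metrics are stored as fractions in [0, 1] using the field names from the result schema in this document, the function name `failed_gates` is hypothetical, and a small tolerance is added so that floating-point noise does not trip a gate exactly at the boundary.

```python
EPS = 1e-9  # tolerance for floating-point noise at the gate boundary


def failed_gates(current: dict, baseline: dict) -> list[str]:
    """Return the list of violated gates; an empty list means the build may pass."""
    failures = []
    for name in ("precision", "recall"):
        drop_pp = (baseline[name] - current[name]) * 100  # percentage points
        if drop_pp > 1.0 + EPS:
            failures.append(f"{name} dropped {drop_pp:.2f} pp")
    if current["deterministicReplay"] < 1.0:
        failures.append("deterministic replay below 100%")
    if current["ttfrp_p95_ms"] > baseline["ttfrp_p95_ms"] * 1.20 + EPS:
        failures.append("TTFRP p95 increased more than 20%")
    return failures


baseline = {"precision": 0.96, "recall": 0.91, "ttfrp_p95_ms": 380, "deterministicReplay": 1.0}
regressed = dict(baseline, precision=0.94, ttfrp_p95_ms=460)
print(failed_gates(regressed, baseline))
```
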
+---
+
+## CI Integration
+
+### Benchmark Job
+
+```yaml
+# .gitea/workflows/reachability-bench.yaml
+name: Reachability Benchmark
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  schedule:
+    - cron: '0 2 * * *'  # Nightly
+
+jobs:
+  benchmark:
+    runs-on: self-hosted
+    steps:
+      - uses: actions/checkout@v4
+      
+      - name: Run corpus benchmark
+        run: |
+          stellaops bench run \
+            --corpus datasets/reachability/ground-truth/ \
+            --output bench/results/$(date +%Y%m%d).json \
+            --baseline bench/baselines/current.json
+      
+      - name: Check regression gates
+        run: |
+          stellaops bench check \
+            --results bench/results/$(date +%Y%m%d).json \
+            --baseline bench/baselines/current.json \
+            --precision-threshold 0.95 \
+            --recall-threshold 0.90 \
+            --determinism-threshold 1.0
+      
+      - name: Post results to PR
+        if: github.event_name == 'pull_request'
+        run: |
+          stellaops bench report \
+            --results bench/results/$(date +%Y%m%d).json \
+            --baseline bench/baselines/current.json \
+            --format markdown > bench-report.md
+          # Post to PR via API
+```
+
+### Result Schema
+
+```json
+{
+  "runId": "bench-20251217-001",
+  "timestamp": "2025-12-17T02:00:00Z",
+  "corpusVersion": "1.0.0",
+  "scannerVersion": "1.3.0",
+  "metrics": {
+    "precision": 0.96,
+    "recall": 0.91,
+    "f1": 0.934,
+    "ttfrp_p50_ms": 120,
+    "ttfrp_p95_ms": 380,
+    "deterministicReplay": 1.0
+  },
+  "samples": [
+    {
+      "sampleId": "gt-0001",
+      "sinkId": "sink-001",
+      "expected": "reachable",
+      "actual": "reachable",
+      "pathFound": ["main", "process_input", "parse_data", "vulnerable_function"],
+      "proofHash": "sha256:abc123...",
+      "ttfrpMs": 95
+    }
+  ],
+  "regressions": [],
+  "improvements": []
+}
+```
+
+---
+
+## Corpus Maintenance
+
+### Adding New Samples
+
+1. Create sample binary with known sink reachability
+2. Write `sample.manifest.json` with ground-truth annotations
+3. Place in `datasets/reachability/ground-truth/{category}/`
+4. Update corpus version in `datasets/reachability/corpus.json`
+5. Run baseline update: `stellaops bench baseline update`
+
+### Updating Baselines
+
+When scanner improvements are validated:
+```bash
+stellaops bench baseline update \
+  --results bench/results/latest.json \
+  --output bench/baselines/current.json
+```
+
+### Sample Categories
+
+- `basic/` — Simple direct call chains
+- `indirect/` — Function pointers, vtables, callbacks
+- `stripped/` — Symbol-stripped binaries
+- `obfuscated/` — Control-flow obfuscation, packing
+- `guarded/` — Config/auth/privilege guards
+- `multiarch/` — ARM64, x86, RISC-V variants
+
+---
+
+## Related Documentation
+
+- [Reachability Analysis Technical Reference](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)
+- [Determinism and Reproducibility Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md)
+- [Scanner Benchmark Submission Guide](submission-guide.md)
--- a/docs/benchmarks/smart-diff-wii.md
+++ b/docs/benchmarks/smart-diff-wii.md
@@ -0,0 +1,150 @@
+# Smart-Diff Weighted Impact Index (WII)
+
+**Source Advisory:** `docs/product-advisories/unprocessed/16-Dec-2025 - Smart‑Diff Meets Call‑Stack Reachability.md`  
+**Status:** Processed 2025-12-17
+
+## Overview
+
+The Weighted Impact Index (WII) is a composite score (0-100) that combines Smart-Diff semantic analysis with call-stack reachability to measure the runtime risk of code changes. It captures not only what changed but also how risky the change is in reachable code.
+
+## Core Concepts
+
+### Inputs
+
+1. **Smart-Diff Output** - Semantic differences between artifact states
+2. **Call Graph** - Symbol nodes with call edges
+3. **Entrypoints** - HTTP routes, jobs, message handlers
+4. **Runtime Heat** - pprof, APM, or eBPF execution frequency data
+5. **Advisory Data** - CVSS v4, EPSS v4 scores
+
+### WII Scoring Model
+
+The WII uses 8 weighted features per diff unit:
+
+| Feature | Weight | Description |
+|---------|--------|-------------|
+| `Δreach_len` | 0.25 | Change in shortest reachable path length |
+| `Δlib_depth` | 0.10 | Change in library call depth |
+| `exposure` | 0.15 | Public/external-facing API |
+| `privilege` | 0.15 | Path crosses privileged sinks |
+| `hot_path` | 0.15 | Frequently executed (runtime evidence) |
+| `cvss_v4` | 0.10 | Normalized CVSS v4 severity |
+| `epss_v4` | 0.10 | Exploit probability |
+| `guard_coverage` | -0.10 | Sanitizers/validations reduce score |
+
+### Determinism Bonus
+
+When `reachability == true` AND (`cvss_v4 > 0.7` OR `epss_v4 > 0.5`), add +5 bonus for "evidence-linked determinism."
+
+### Formula
+
+```
+WII = clamp(0, 1, Σ(w_i × feature_i_normalized)) × 100
+```
+
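A minimal sketch of the scoring step, using the weights from the table above. Several details are assumptions rather than documented behavior: every feature is taken to be already normalized to [0, 1], and the determinism bonus is added after scaling and capped at 100, since the text does not say how the bonus interacts with the clamp. The function name `wii` is hypothetical.

```python
WEIGHTS = {
    "Δreach_len": 0.25,
    "Δlib_depth": 0.10,
    "exposure": 0.15,
    "privilege": 0.15,
    "hot_path": 0.15,
    "cvss_v4": 0.10,
    "epss_v4": 0.10,
    "guard_coverage": -0.10,  # guards reduce the score
}


def wii(features: dict, reachable: bool) -> float:
    """features: feature name -> value in [0, 1]; missing features count as 0."""
    raw = sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())
    score = min(1.0, max(0.0, raw)) * 100
    high_severity = features.get("cvss_v4", 0.0) > 0.7 or features.get("epss_v4", 0.0) > 0.5
    if reachable and high_severity:
        score = min(100.0, score + 5)  # assumed: bonus applied after scaling, capped at 100
    return score


unit = {"exposure": 1.0, "hot_path": 1.0, "cvss_v4": 0.75, "epss_v4": 0.45}
print(wii(unit, reachable=True))
```

For this unit the weighted sum is 0.15 + 0.15 + 0.075 + 0.045 = 0.42, which scales to 42 and, because the unit is reachable with `cvss_v4 > 0.7`, receives the +5 bonus under the assumption above.
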
+## Data Structures
+
+### DiffUnit
+
+```json
+{
+  "unitId": "pkg:npm/lodash@4.17.21#function:merge",
+  "change": "modified",
+  "before": {"hash": "sha256:abc...", "attrs": {}},
+  "after": {"hash": "sha256:def...", "attrs": {}},
+  "features": {
+    "reachable": true,
+    "reachLen": 3,
+    "libDepth": 2,
+    "exposure": true,
+    "privilege": false,
+    "hotPath": true,
+    "cvssV4": 0.75,
+    "epssV4": 0.45,
+    "guardCoverage": false
+  },
+  "wii": 68
+}
+```
+
+### Artifact-Level WII
+
+Two metrics for artifact-level impact:
+- `max(WII_unit)` - Spike impact (single highest risk change)
+- `p95(WII_unit)` - Broad impact (distribution of risk)
+
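Both aggregates can be computed from the list of per-unit scores. The sketch below uses the nearest-rank definition of the 95th percentile, which is an assumption; the document does not specify an interpolation method, and the name `aggregate_wii` is hypothetical.

```python
def aggregate_wii(unit_scores):
    """Return the spike (max) and broad (p95, nearest-rank) impact for one artifact."""
    ordered = sorted(unit_scores)
    if not ordered:
        raise ValueError("at least one diff unit is required")
    # Nearest-rank percentile: 1-based rank = ceil(0.95 * n), in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {"max": ordered[-1], "p95": ordered[rank - 1]}


print(aggregate_wii([12, 68, 85, 40, 62, 33]))
```
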
+## DSSE Attestation
+
+The WII is emitted as a DSSE-signed attestation:
+
+```json
+{
+  "_type": "https://in-toto.io/Statement/v1",
+  "subject": [{"name": "ghcr.io/acme/app:1.9.3", "digest": {"sha256": "..."}}],
+  "predicateType": "https://stella-ops.org/attestations/smart-diff-wii@v1",
+  "predicate": {
+    "artifactBefore": {"digest": {"sha256": "..."}},
+    "artifactAfter": {"digest": {"sha256": "..."}},
+    "evidence": {
+      "sbomBefore": {"digest": {"sha256": "..."}},
+      "sbomAfter": {"digest": {"sha256": "..."}},
+      "callGraph": {"digest": {"sha256": "..."}},
+      "runtimeHeat": {"optional": true, "digest": {"sha256": "..."}}
+    },
+    "units": [...],
+    "aggregateWII": {
+      "max": 85,
+      "p95": 62,
+      "mean": 45
+    }
+  }
+}
+```
+
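For readers unfamiliar with DSSE, the signature covers the Pre-Authentication Encoding (PAE) of the payload type and payload bytes rather than the JSON itself. The sketch below shows that encoding and the resulting envelope shape. The `sign` callable is a placeholder for whatever key management is in use, and serializing the statement with sorted keys is an illustrative choice, not a DSSE requirement.

```python
import base64
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE Pre-Authentication Encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def envelope(statement: dict, sign) -> dict:
    """Wrap an in-toto statement in a DSSE envelope; `sign` maps bytes -> signature bytes."""
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = sign(pae(PAYLOAD_TYPE, payload))
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"sig": base64.b64encode(signature).decode("ascii")}],
    }


# Placeholder signer for demonstration only; a real signer would use a private key.
demo = envelope({"_type": "https://in-toto.io/Statement/v1"}, sign=lambda data: b"not-a-real-signature")
print(demo["payloadType"])
```
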
+## Pipeline Integration
+
+1. **Collect** - Build call graph, import SBOMs, CVE/EPSS data
+2. **Diff** - Run Smart-Diff to generate `DiffUnit[]`
+3. **Enrich** - Query reachability engine per unit
+4. **Score** - Compute per-unit and aggregate WII
+5. **Attest** - Emit DSSE statement with evidence URIs
+6. **Store** - Proof-Market Ledger (Rekor) + PostgreSQL
+
+## Use Cases
+
+### CI/CD Gates
+
+```yaml
+# .github/workflows/security.yml
+- name: Smart-Diff WII Check
+  run: |
+    stellaops smart-diff \
+      --base ${{ env.BASE_IMAGE }} \
+      --target ${{ env.TARGET_IMAGE }} \
+      --wii-threshold 70 \
+      --fail-on-threshold
+```
+
+### Risk Prioritization
+
+Sort changes by WII for review prioritization:
+
+```bash
+stellaops smart-diff show \
+  --sort wii \
+  --format table
+```
+
+### Attestation Verification
+
+```bash
+stellaops verify-attestation \
+  --input smart-diff-wii.json \
+  --predicate-type smart-diff-wii@v1
+```
+
+## Related Documentation
+
+- [Smart-Diff CLI Reference](../cli/smart-diff-cli.md)
+- [Reachability Analysis](./reachability-analysis.md)
+- [DSSE Attestation Format](../api/dsse-format.md)
--- a/docs/benchmarks/tiered-precision-curves.md
+++ b/docs/benchmarks/tiered-precision-curves.md
@@ -0,0 +1,127 @@
+# Tiered Precision Curves for Scanner Accuracy
+
+**Advisory:** 16-Dec-2025 - Measuring Progress with Tiered Precision Curves  
+**Status:** Processing  
+**Related Sprints:** SPRINT_3500_0003_0001 (Ground-Truth Corpus)
+
+## Executive Summary
+
+This advisory introduces a tiered approach to measuring scanner accuracy that prevents metric gaming. By tracking precision/recall separately for three evidence tiers (Imported, Executed, Tainted→Sink), we ensure improvements in one tier don't hide regressions in another.
+
+## Key Concepts
+
+### Evidence Tiers
+
+| Tier | Description | Risk Level | Typical Volume |
+|------|-------------|------------|----------------|
+| **Imported** | Vuln exists in dependency | Lowest | High |
+| **Executed** | Code/deps actually run | Medium | Medium |
+| **Tainted→Sink** | User data reaches sink | Highest | Low |
+
+### Tier Precedence
+
+Highest tier wins when a finding has multiple evidence types:
+1. `tainted_sink` (highest)
+2. `executed`
+3. `imported`
+
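The precedence rule amounts to taking the maximum over a fixed ranking. A minimal sketch, with the hypothetical helper name `effective_tier`:

```python
TIER_RANK = {"imported": 1, "executed": 2, "tainted_sink": 3}


def effective_tier(evidence_tiers):
    """Return the highest-precedence tier among the evidence attached to one finding."""
    tiers = list(evidence_tiers)
    if not tiers:
        raise ValueError("a finding needs at least one piece of evidence")
    return max(tiers, key=TIER_RANK.__getitem__)


print(effective_tier(["imported", "tainted_sink", "executed"]))
```

Because the ranking is a fixed table rather than something derived at runtime, the same evidence set always yields the same tier regardless of the order in which evidence arrives.
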
+## Implementation Components
+
+### 1. Evidence Schema (`eval` schema)
+
+```sql
+-- Ground truth samples
+eval.sample(sample_id, name, repo_path, commit_sha, language, scenario, entrypoints)
+
+-- Expected findings
+eval.expected_finding(expected_id, sample_id, vuln_key, tier, rule_key, sink_class)
+
+-- Evaluation runs
+eval.run(eval_run_id, scanner_version, rules_hash, concelier_snapshot_hash)
+
+-- Observed results
+eval.observed_finding(observed_id, eval_run_id, sample_id, vuln_key, tier, score, rule_key, evidence)
+
+-- Computed metrics
+eval.metrics(eval_run_id, tier, op_point, precision, recall, f1, pr_auc, latency_p50_ms)
+```
+
+### 2. Scanner Worker Changes
+
+Workers emit evidence primitives:
+- `DependencyEvidence { purl, version, lockfile_path }`
+- `ReachabilityEvidence { entrypoint, call_path[], confidence }`
+- `TaintEvidence { source, sink, sanitizers[], dataflow_path[], confidence }`
+
+### 3. Scanner WebService Changes
+
+WebService performs tiering:
+- Merge evidence for same `vuln_key`
+- Run reachability/taint algorithms
+- Assign `evidence_tier` deterministically
+- Persist normalized findings
+
+### 4. Evaluator CLI
+
+New tool `StellaOps.Scanner.Evaluation.Cli`:
+- `import-corpus` - Load samples and expected findings
+- `run` - Trigger scans using replay manifest
+- `compute` - Calculate per-tier PR curves
+- `report` - Generate markdown artifacts
+
+### 5. CI Gates
+
+Fail builds when:
+- PR-AUC(imported) drops > 2%
+- PR-AUC(executed) or PR-AUC(tainted_sink) drops > 1%
+- FP rate in `tainted_sink` > 5% at Recall ≥ 0.7
+
+## Operating Points
+
+| Tier | Target Recall | Purpose |
+|------|--------------|---------|
+| `imported` | ≥ 0.60 | Broad coverage |
+| `executed` | ≥ 0.70 | Material risk |
+| `tainted_sink` | ≥ 0.80 | Actionable findings |
+
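One way to evaluate a tier at its operating point is to walk the findings from highest to lowest score and stop at the first threshold where recall meets the target. The sketch below assumes each observed finding has already been matched against the expected findings for its tier; the function name and input shape are hypothetical, and tied scores are not given special treatment.

```python
def precision_at_recall(scored, total_positives, target_recall):
    """scored: (score, is_true_positive) pairs for one tier's observed findings.
    total_positives: number of expected findings for that tier.
    Returns (threshold, precision, recall) at the first point meeting the target, else None."""
    if total_positives <= 0:
        return None
    tp = fp = 0
    for score, is_tp in sorted(scored, key=lambda item: item[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recall = tp / total_positives
        if recall >= target_recall:
            return score, tp / (tp + fp), recall
    return None  # the scanner never reaches the target recall for this tier


tainted_sink = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, True), (0.4, False)]
print(precision_at_recall(tainted_sink, total_positives=5, target_recall=0.80))
```
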
+## Integration with Existing Systems
+
+### Concelier
+- Stores advisory data, does not tier
+- Tag advisories with sink classes when available
+
+### Excititor (VEX)
+- Include `tier` in VEX statements
+- Allow per-tier policy thresholds
+- Preserve pruning provenance
+
+### Notify
+- Gate alerts on tiered thresholds
+- Page only on `tainted_sink` at operating point
+
+### UI
+- Show tier badge on findings
+- Default sort: tainted_sink > executed > imported
+- Display evidence summary (entrypoint, path length, sink class)
+
+## Success Criteria
+
+1. Can demonstrate a release in which overall precision stayed flat while tainted→sink PR-AUC improved
+2. On-call noise reduced via tier-gated paging
+3. TTFS p95 for tainted→sink within budget
+
+## Related Documentation
+
+- [Ground-Truth Corpus Sprint](../implplan/SPRINT_3500_0003_0001_ground_truth_corpus_ci_gates.md)
+- [Scanner Architecture](../modules/scanner/architecture.md)
+- [Reachability Analysis](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)
+
+## Overlap Analysis
+
+This advisory **extends** the ground-truth corpus work (SPRINT_3500_0003_0001) with:
+- Tiered precision tracking (new)
+- Per-tier operating points (new)
+- CI gates based on tier-specific AUC (enhancement)
+- Integration with Notify for tier-gated alerts (new)
+
+No contradictions with existing implementations found.