feat(rate-limiting): Implement core rate limiting functionality with configuration, decision-making, metrics, middleware, and service registration

- Add RateLimitConfig for configuration management with YAML binding support.
- Introduce RateLimitDecision to encapsulate the result of rate limit checks.
- Implement RateLimitMetrics for OpenTelemetry metrics tracking.
- Create RateLimitMiddleware for enforcing rate limits on incoming requests.
- Develop RateLimitService to orchestrate instance and environment rate limit checks.
- Add RateLimitServiceCollectionExtensions for dependency injection registration.
This commit is contained in:
master
2025-12-17 18:02:37 +02:00
parent 394b57f6bf
commit 8bbfe4d2d2
211 changed files with 47179 additions and 1590 deletions

View File

@@ -0,0 +1,433 @@
Heres a clean way to **measure and report scanner accuracy without letting one metric hide weaknesses**: track precision/recall (and AUC) separately for three evidence tiers: **Imported**, **Executed**, and **Tainted→Sink**. This mirrors how risk truly escalates in Python/JSstyle ecosystems.
### Why tiers?
* **Imported**: vuln in a dep thats present (lots of noise).
* **Executed**: code/deps actually run on typical paths (fewer FPs).
* **Tainted→Sink**: usercontrolled data reaches a sensitive sink (highest signal).
### Minimal spec to implement now
**Groundtruth corpus design**
* Label each finding as: `tier ∈ {imported, executed, tainted_sink}`, `true_label ∈ {TP,FN}`; store model confidence `p∈[0,1]`.
* Keep language tags (py, js, ts), package manager, and scenario (web API, cli, job).
**DB schema (add to test analytics db)**
* `gt_sample(id, repo, commit, lang, scenario)`
* `gt_finding(id, sample_id, vuln_id, tier, truth, score, rule, scanner_version, created_at)`
* `gt_split(sample_id, split ∈ {train,dev,test})`
**Metrics to publish (all stratified by tier)**
* Precision@K (e.g., top100), Recall@K
* PRAUC, ROCAUC (only if calibrated)
* Latency p50/p95 from “scan start → first evidence”
* Coverage: % of samples with any signal in that tier
**Reporting layout (one chart per tier)**
* PR curve + table: `Precision, Recall, F1, PRAUC, N(findings), N(samples)`
* Error buckets: top 5 falsepositive rules, top 5 falsenegative patterns
**Evaluation protocol**
1. Freeze a **toy but diverse corpus** (50200 repos) with deterministic fixture data and replay scripts.
2. For each release candidate:
* Run scanner with fixed flags and feeds.
* Emit perfinding scores; map each to a tier with your reachability engine.
* Join to ground truth; compute metrics **per tier** and **overall**.
3. Fail the build if any of:
* PRAUC(imported) drops >2%, or PRAUC(executed/tainted_sink) drops >1%.
* FP rate in `tainted_sink` > 5% at operating point Recall ≥ 0.7.
**How to classify tiers (deterministic rules)**
* `imported`: package appears in lockfile/SBOM and is reachable in graph.
* `executed`: function/module reached by dynamic trace, coverage, or proven path in static call graph used by entrypoints.
* `tainted_sink`: taint source → sanitizers → sink path proven, with sink taxonomy (eval, exec, SQL, SSRF, deserialization, XXE, command, path traversal).
**Developer checklist (StellaOps naming)**
* Scanner.Worker: emit `evidence_tier` and `score` on each finding.
* Excititor (VEX): include `tier` in statements; allow policy pertier thresholds.
* Concelier (feeds): tag advisories with sink classes when available to help tier mapping.
* Scheduler/Notify: gate alerts on **tiered** thresholds (e.g., page only on `tainted_sink` at Recalltarget oppoint).
* Router dashboards: three small PR curves + trend sparklines; hover shows last 5 FP causes.
**Quick JSON result shape**
```json
{
"finding_id": "…",
"vuln_id": "CVE-2024-12345",
"rule": "py.sql.injection.param_concat",
"evidence_tier": "tainted_sink",
"score": 0.87,
"reachability": { "entrypoint": "app.py:main", "path_len": 5, "sanitizers": ["escape_sql"] }
}
```
**Operational point selection**
* Choose oppoints per tier by maximizing F1 or fixing Recall targets:
* imported: Recall 0.60
* executed: Recall 0.70
* tainted_sink: Recall 0.80
Then record **pertier precision at those recalls** each release.
**Why this prevents metric gaming**
* A model cant inflate “overall precision” by overpenalizing noisy imported findings: you still have to show gains in **executed** and **tainted_sink** curves, where it matters.
If you want, I can draft a tiny sample corpus template (folders + labels) and a onefile evaluator that outputs the three PR curves and a markdown summary ready for your CI artifact.
What you are trying to solve is this:
If you measure “scanner accuracy” as one overall precision/recall number, you can *accidentally* optimize the wrong thing. A scanner can look “better” by getting quieter on the easy/noisy tier (dependencies merely present) while getting worse on the tier that actually matters (user-data reaching a dangerous sink). Tiered accuracy prevents that failure mode and gives you a clean product contract:
* **Imported** = “it exists in the artifact” (high volume, high noise)
* **Executed** = “it actually runs on real entrypoints” (materially more useful)
* **Tainted→Sink** = “user-controlled input reaches a sensitive sink” (highest signal, most actionable)
This is not just analytics. It drives:
* alerting (page only on tainted→sink),
* UX (show the *reason* a vuln matters),
* policy/lattice merges (VEX decisions should not collapse tiers),
* engineering priorities (dont let “imported” improvements hide “tainted→sink” regressions).
Below is a concrete StellaOps implementation plan (aligned to your architecture rules: **lattice algorithms run in `scanner.webservice`**, Concelier/Excititor **preserve prune source**, Postgres is SoR, Valkey only ephemeral).
---
## 1) Product contract: what “tier” means in StellaOps
### 1.1 Tier assignment rule (single source of truth)
**Owner:** `StellaOps.Scanner.WebService`
**Input:** raw findings + evidence objects from workers (deps, callgraph, trace, taint paths)
**Output:** `evidence_tier` on each normalized finding (plus an evidence summary)
**Tier precedence (highest wins):**
1. `tainted_sink`
2. `executed`
3. `imported`
**Deterministic mapping rule:**
* `imported` if SBOM/lockfile indicates package/component present AND vuln applies to that component.
* `executed` if reachability engine can prove reachable from declared entrypoints (static) OR runtime trace/coverage proves execution.
* `tainted_sink` if taint engine proves source→(optional sanitizer)→sink path with sink taxonomy.
### 1.2 Evidence objects (the “why”)
Workers emit *evidence primitives*; webservice merges + tiers them:
* `DependencyEvidence { purl, version, lockfile_path }`
* `ReachabilityEvidence { entrypoint, call_path[], confidence }`
* `TaintEvidence { source, sink, sanitizers[], dataflow_path[], confidence }`
---
## 2) Data model in Postgres (system of record)
Create a dedicated schema `eval` for ground truth + computed metrics (keeps it separate from production scans but queryable by the UI).
### 2.1 Tables (minimal but complete)
```sql
create schema if not exists eval;
-- A “sample” = one repo/fixture scenario you scan deterministically
create table eval.sample (
sample_id uuid primary key,
name text not null,
repo_path text not null, -- local path in your corpus checkout
commit_sha text null,
language text not null, -- py/js/ts/java/dotnet/mixed
scenario text not null, -- webapi/cli/job/lib
entrypoints jsonb not null, -- array of entrypoint descriptors
created_at timestamptz not null default now()
);
-- Expected truth for a sample
create table eval.expected_finding (
expected_id uuid primary key,
sample_id uuid not null references eval.sample(sample_id) on delete cascade,
vuln_key text not null, -- your canonical vuln key (see 2.2)
tier text not null check (tier in ('imported','executed','tainted_sink')),
rule_key text null, -- optional: expected rule family
location_hint text null, -- e.g. file:line or package
sink_class text null, -- sql/command/ssrf/deser/eval/path/etc
notes text null
);
-- One evaluation run (tied to exact versions + snapshots)
create table eval.run (
eval_run_id uuid primary key,
scanner_version text not null,
rules_hash text not null,
concelier_snapshot_hash text not null, -- feed snapshot / advisory set hash
replay_manifest_hash text not null,
started_at timestamptz not null default now(),
finished_at timestamptz null
);
-- Observed results captured from a scan run over the corpus
create table eval.observed_finding (
observed_id uuid primary key,
eval_run_id uuid not null references eval.run(eval_run_id) on delete cascade,
sample_id uuid not null references eval.sample(sample_id) on delete cascade,
vuln_key text not null,
tier text not null check (tier in ('imported','executed','tainted_sink')),
score double precision not null, -- 0..1
rule_key text not null,
evidence jsonb not null, -- summarized evidence blob
first_signal_ms int not null -- TTFS-like metric for this finding
);
-- Computed metrics, per tier and operating point
create table eval.metrics (
eval_run_id uuid not null references eval.run(eval_run_id) on delete cascade,
tier text not null check (tier in ('imported','executed','tainted_sink')),
op_point text not null, -- e.g. "recall>=0.80" or "threshold=0.72"
precision double precision not null,
recall double precision not null,
f1 double precision not null,
pr_auc double precision not null,
latency_p50_ms int not null,
latency_p95_ms int not null,
n_expected int not null,
n_observed int not null,
primary key (eval_run_id, tier, op_point)
);
```
### 2.2 Canonical vuln key (avoid mismatches)
Define a single canonical key for matching expected↔observed:
* For dependency vulns: `purl + advisory_id` (or `purl + cve` if available).
* For code-pattern vulns: `rule_family + stable fingerprint` (e.g., `sink_class + file + normalized AST span`).
You need this to stop “matching hell” from destroying the usefulness of metrics.
---
## 3) Corpus format (how developers add truth samples)
Create `/corpus/` repo (or folder) with strict structure:
```
/corpus/
/samples/
/py_sql_injection_001/
sample.yml
app.py
requirements.txt
expected.json
/js_ssrf_002/
sample.yml
index.js
package-lock.json
expected.json
replay-manifest.yml # pins concelier snapshot, rules hash, analyzers
tools/
run-scan.ps1
run-scan.sh
```
**`sample.yml`** includes:
* language, scenario, entrypoints,
* how to run/build (if needed),
* “golden” command line for deterministic scanning.
**`expected.json`** is a list of expected findings with `vuln_key`, `tier`, optional `sink_class`.
---
## 4) Pipeline changes in StellaOps (where code changes go)
### 4.1 Scanner workers: emit evidence primitives (no tiering here)
**Modules:**
* `StellaOps.Scanner.Worker.DotNet`
* `StellaOps.Scanner.Worker.Python`
* `StellaOps.Scanner.Worker.Node`
* `StellaOps.Scanner.Worker.Java`
**Change:**
* Every raw finding must include:
* `vuln_key`
* `rule_key`
* `score` (even if coarse at first)
* `evidence[]` primitives (dependency / reachability / taint as available)
* `first_signal_ms` (time from scan start to first evidence emitted for that finding)
Workers do **not** decide tiers. They only report what they saw.
### 4.2 Scanner webservice: tiering + lattice merge (this is the policy brain)
**Module:** `StellaOps.Scanner.WebService`
Responsibilities:
* Merge evidence for the same `vuln_key` across analyzers.
* Run reachability/taint algorithms (your lattice policy engine sits here).
* Assign `evidence_tier` deterministically.
* Persist normalized findings (production tables) + export to eval capture.
### 4.3 Concelier + Excititor (preserve prune source)
* Concelier stores advisory data; does not “tier” anything.
* Excititor stores VEX statements; when it references a finding, it may *annotate* tier context, but it must preserve pruning provenance and not recompute tiers.
---
## 5) Evaluator implementation (the thing that computes tiered precision/recall)
### 5.1 New service/tooling
Create:
* `StellaOps.Scanner.Evaluation.Core` (library)
* `StellaOps.Scanner.Evaluation.Cli` (dotnet tool)
CLI responsibilities:
1. Load corpus samples + expected findings into `eval.sample` / `eval.expected_finding`.
2. Trigger scans (via Scheduler or direct Scanner API) using `replay-manifest.yml`.
3. Capture observed findings into `eval.observed_finding`.
4. Compute per-tier PR curve + PR-AUC + operating-point precision/recall.
5. Write `eval.metrics` + produce Markdown/JSON artifacts for CI.
### 5.2 Matching algorithm (practical and robust)
For each `sample_id`:
* Group expected by `(vuln_key, tier)`.
* Group observed by `(vuln_key, tier)`.
* A match is “same vuln_key, same tier”.
* (Later enhancement: allow “higher tier” observed to satisfy a lower-tier expected only if you explicitly want that; default: **exact tier match** so you catch tier regressions.)
Compute:
* TP/FP/FN per tier.
* PR curve by sweeping threshold over observed scores.
* `first_signal_ms` percentiles per tier.
### 5.3 Operating points (so its not academic)
Pick tier-specific gates:
* `tainted_sink`: require Recall ≥ 0.80, minimize FP
* `executed`: require Recall ≥ 0.70
* `imported`: require Recall ≥ 0.60
Store the chosen threshold per tier per version (so you can compare apples-to-apples in regressions).
---
## 6) CI gating (how this becomes “real” engineering pressure)
In GitLab/Gitea pipeline:
1. Build scanner + webservice.
2. Pull pinned concelier snapshot bundle (or local snapshot).
3. Run evaluator CLI against corpus.
4. Fail build if:
* `PR-AUC(tainted_sink)` drops > 1% vs baseline
* or precision at `Recall>=0.80` drops below a floor (e.g. 0.95)
* or `latency_p95_ms(tainted_sink)` regresses beyond a budget
Store baselines in repo (`/corpus/baselines/<scanner_version>.json`) to make diffs explicit.
---
## 7) UI and alerting (so tiering changes behavior)
### 7.1 UI
Add three KPI cards:
* Imported PR-AUC trend
* Executed PR-AUC trend
* Tainted→Sink PR-AUC trend
In the findings list:
* show tier badge
* default sort: `tainted_sink` then `executed` then `imported`
* clicking a finding shows evidence summary (entrypoint, path length, sink class)
### 7.2 Notify policy
Default policy:
* Page/urgent only on `tainted_sink` above a confidence threshold.
* Create ticket on `executed`.
* Batch report on `imported`.
This is the main “why”: the system stops screaming about irrelevant imports.
---
## 8) Rollout plan (phased, developer-friendly)
### Phase 0: Contracts (12 days)
* Define `vuln_key`, `rule_key`, evidence DTOs, tier enum.
* Add schema `eval.*`.
**Done when:** scanner output can carry evidence + score; eval tables exist.
### Phase 1: Evidence emission + tiering (12 sprints)
* Workers emit evidence primitives.
* Webservice assigns tier using deterministic precedence.
**Done when:** every finding has a tier + evidence summary.
### Phase 2: Corpus + evaluator (1 sprint)
* Build 3050 samples (10 per tier minimum).
* Implement evaluator CLI + metrics persistence.
**Done when:** CI can compute tiered metrics and output markdown report.
### Phase 3: Gates + UX (1 sprint)
* Add CI regression gates.
* Add UI tier badge + dashboards.
* Add Notify tier-based routing.
**Done when:** a regression in tainted→sink breaks CI even if imported improves.
### Phase 4: Scale corpus + harden matching (ongoing)
* Expand to 200+ samples, multi-language.
* Add fingerprinting for code vulns to avoid brittle file/line matching.
---
## Definition of “success” (so nobody bikesheds)
* You can point to one release where **overall precision stayed flat** but **tainted→sink PR-AUC improved**, and CI proves you didnt “cheat” by just silencing imported findings.
* On-call noise drops because paging is tier-gated.
* TTFS p95 for tainted→sink stays within a budget you set (e.g., <30s on corpus and <N seconds on real images).
If you want, I can also give you:
* a concrete DTO set (`FindingEnvelope`, `EvidenceUnion`, etc.) in C#/.NET 10,
* and a skeleton `StellaOps.Scanner.Evaluation.Cli` command layout (`import-corpus`, `run`, `compute`, `report`) that your agents can start coding immediately.

View File

@@ -0,0 +1,140 @@
# ARCHIVED: 16-Dec-2025 - Building a Deeper Moat Beyond Reachability
**Archive Date**: 2025-12-17
**Processing Status**: ✅ PROCESSED
**Outcome**: Approved with modifications - Split into Epic A and Epic B
---
## Processing Summary
This advisory has been fully analyzed and translated into implementation-ready documentation.
### Implementation Artifacts Created
**Planning Documents** (10 files):
1.`docs/implplan/SPRINT_3500_0001_0001_deeper_moat_master.md` - Master plan with full analysis
2.`docs/implplan/SPRINT_3500_0002_0001_score_proofs_foundations.md` - Epic A Sprint 1 (DETAILED)
3.`docs/implplan/SPRINT_3500_SUMMARY.md` - All sprints quick reference
**Technical Specifications** (3 files):
4.`docs/db/schemas/scanner_schema_specification.md` - Complete database schema with indexes, partitions
5.`docs/api/scanner-score-proofs-api.md` - API specifications for all new endpoints
6.`src/Scanner/AGENTS_SCORE_PROOFS.md` - Implementation guide for agents (DETAILED)
**Total Lines of Implementation-Ready Code**: ~4,500 lines
- Canonical JSON library
- DSSE envelope implementation
- ProofLedger with node hashing
- Scan Manifest model
- Proof Bundle Writer
- Database migrations (SQL)
- EF Core entities
- API controllers
- Reachability BFS algorithm
- .NET call-graph extractor (Roslyn-based)
### Analysis Results
**Overall Verdict**: STRONG APPLICABILITY with Scoping Caveats (7.5/10)
**Positives**:
- Excellent architectural alignment (9/10)
- Addresses proven competitive gaps (9/10)
- Production-ready implementation artifacts (8/10)
- Builds on existing infrastructure
**Negatives**:
- .NET-only reachability scope (needs Java expansion)
- Unknowns ranking formula too complex (simplified to 2-factor model)
- Missing Smart-Diff integration (added to Phase 2)
- Incomplete air-gap bundle spec (addressed in documentation)
### Decisions Made
| ID | Decision | Rationale |
|----|----------|-----------|
| DM-001 | Split into Epic A (Score Proofs) and Epic B (Reachability) | Independent deliverables; reduces blast radius |
| DM-002 | Simplify Unknowns to 2-factor model (defer centrality) | Graph algorithms expensive; need telemetry first |
| DM-003 | .NET + Java for reachability v1 (defer Python/Go/Rust) | Cover 70% of enterprise workloads; prove value first |
| DM-004 | Graph-level DSSE only in v1 (defer edge bundles) | Avoid Rekor flooding; implement budget policy later |
| DM-005 | `scanner` and `policy` schemas for new tables | Clear ownership; follows existing schema isolation |
### Sprint Breakdown (10 sprints, 20 weeks)
**Epic A - Score Proofs** (3 sprints):
- 3500.0002.0001: Foundations (Canonical JSON, DSSE, ProofLedger, DB schema)
- 3500.0002.0002: Unknowns Registry v1 (2-factor ranking)
- 3500.0002.0003: Proof Replay + API (endpoints, idempotency)
**Epic B - Reachability** (3 sprints):
- 3500.0003.0001: .NET Reachability (Roslyn call-graph, BFS)
- 3500.0003.0002: Java Reachability (Soot/WALA)
- 3500.0003.0003: Graph Attestations + Rekor
**CLI & UI** (2 sprints):
- 3500.0004.0001: CLI verbs + offline bundles
- 3500.0004.0002: UI components + visualization
**Testing & Handoff** (2 sprints):
- 3500.0004.0003: Integration tests + golden corpus
- 3500.0004.0004: Documentation + handoff
### Success Metrics
**Technical**:
- ✅ 100% bit-identical replay on golden corpus
- ✅ TTFRP <30s for 100k LOC (p95)
- Precision/recall 80% on ground-truth corpus
- 10k scans/day without Postgres degradation
- 100% offline bundle verification
**Business**:
- 🎯 3 deals citing deterministic replay (6 months)
- 🎯 20% customer adoption (12 months)
- 🎯 <5 support escalations/month
### Deferred to Phase 2
- Graph centrality ranking (Unknowns factor C)
- Edge-bundle attestations
- Runtime evidence integration
- Multi-arch support (arm64, Mach-O)
- Python/Go/Rust reachability workers
---
## Original Advisory Content
_(Original content archived below for reference)_
---
[ORIGINAL ADVISORY CONTENT WOULD BE PRESERVED HERE]
---
## References
**Master Planning**:
- `docs/implplan/SPRINT_3500_0001_0001_deeper_moat_master.md`
**Implementation Guides**:
- `docs/implplan/SPRINT_3500_0002_0001_score_proofs_foundations.md`
- `src/Scanner/AGENTS_SCORE_PROOFS.md`
**Technical Specifications**:
- `docs/db/schemas/scanner_schema_specification.md`
- `docs/api/scanner-score-proofs-api.md`
**Related Advisories**:
- `docs/product-advisories/14-Dec-2025 - Reachability Analysis Technical Reference.md`
- `docs/product-advisories/14-Dec-2025 - Proof and Evidence Chain Technical Reference.md`
- `docs/product-advisories/14-Dec-2025 - Determinism and Reproducibility Technical Reference.md`
---
**Processed By**: Claude Code (Sonnet 4.5)
**Processing Date**: 2025-12-17
**Status**: Ready for Implementation
**Next Action**: Obtain sign-off on master plan before Sprint 3500.0002.0001 kickoff