feat(rate-limiting): Implement core rate limiting functionality with configuration, decision-making, metrics, middleware, and service registration

- Add RateLimitConfig for configuration management with YAML binding support.
- Introduce RateLimitDecision to encapsulate the result of rate limit checks.
- Implement RateLimitMetrics for OpenTelemetry metrics tracking.
- Create RateLimitMiddleware for enforcing rate limits on incoming requests.
- Develop RateLimitService to orchestrate instance and environment rate limit checks.
- Add RateLimitServiceCollectionExtensions for dependency injection registration.
This commit is contained in: master
2025-12-17 18:02:37 +02:00
parent 394b57f6bf
commit 8bbfe4d2d2
211 changed files with 47179 additions and 1590 deletions


@@ -2,7 +2,7 @@
> Aligned with Epic 6 Vulnerability Explorer and Epic 10 Export Center.
> **Scope.** Implementation-ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per-layer caching, three-way diffs, artifact catalog (RustFS default + Mongo, S3-compatible fallback), attestation handoff, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
> **Scope.** Implementation-ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per-layer caching, three-way diffs, artifact catalog (RustFS default + PostgreSQL, S3-compatible fallback), attestation handoff, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
---
@@ -25,7 +25,7 @@ src/
├─ StellaOps.Scanner.WebService/ # REST control plane, catalog, diff, exports
├─ StellaOps.Scanner.Worker/ # queue consumer; executes analyzers
├─ StellaOps.Scanner.Models/ # DTOs, evidence, graph nodes, CDX/SPDX adapters
├─ StellaOps.Scanner.Storage/ # Mongo repositories; RustFS object client (default) + S3 fallback; ILM/GC
├─ StellaOps.Scanner.Storage/ # PostgreSQL repositories; RustFS object client (default) + S3 fallback; ILM/GC
├─ StellaOps.Scanner.Queue/ # queue abstraction (Redis/NATS/RabbitMQ)
├─ StellaOps.Scanner.Cache/ # layer cache; file CAS; bloom/bitmap indexes
├─ StellaOps.Scanner.EntryTrace/ # ENTRYPOINT/CMD → terminal program resolver (shell AST)
@@ -132,7 +132,7 @@ The DI extension (`AddScannerQueue`) wires the selected transport, so future add
* **OCI registry** with **Referrers API** (discover attached SBOMs/signatures).
* **RustFS** (default, offline-first) for SBOM artifacts; optional S3/MinIO compatibility retained for migration; **Object Lock** semantics emulated via retention headers; **ILM** for TTL.
* **MongoDB** for catalog, job state, diffs, ILM rules.
* **PostgreSQL** for catalog, job state, diffs, ILM rules.
* **Queue** (Redis Streams/NATS/RabbitMQ).
* **Authority** (on-prem OIDC) for **OpToks** (DPoP/mTLS).
* **Signer** + **Attestor** (+ **Fulcio/KMS** + **Rekor v2**) for DSSE + transparency.
@@ -167,7 +167,7 @@ The DI extension (`AddScannerQueue`) wires the selected transport, so future add
No confidences. Either a fact is proven with listed mechanisms, or it is not claimed.
### 3.2 Catalog schema (Mongo)
### 3.2 Catalog schema (PostgreSQL)
* `artifacts`
@@ -182,8 +182,8 @@ No confidences. Either a fact is proven with listed mechanisms, or it is not cla
* `links { fromType, fromDigest, artifactId }` // image/layer -> artifact
* `jobs { _id, kind, args, state, startedAt, heartbeatAt, endedAt, error }`
* `lifecycleRules { ruleId, scope, ttlDays, retainIfReferenced, immutable }`
* `ruby.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] }` // decoded `RubyPackageInventory` documents for CLI/Policy reuse
* `bun.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] }` // decoded `BunPackageInventory` documents for CLI/Policy reuse
* `ruby.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] }` // decoded `RubyPackageInventory` rows for CLI/Policy reuse
* `bun.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] }` // decoded `BunPackageInventory` rows for CLI/Policy reuse
### 3.3 Object store layout (RustFS)
@@ -389,8 +389,8 @@ scanner:
queue:
kind: redis
url: "redis://queue:6379/0"
mongo:
uri: "mongodb://mongo/scanner"
postgres:
connectionString: "Host=postgres;Port=5432;Database=scanner;Username=stellaops;Password=stellaops"
s3:
endpoint: "http://minio:9000"
bucket: "stellaops"
@@ -493,7 +493,7 @@ scanner:
* **HA**: WebService horizontal scale; Workers autoscale by queue depth & CPU; distributed locks on layers.
* **Retention**: ILM rules per artifact class (`short`, `default`, `compliance`); **Object Lock** for compliance artifacts (reports, signed SBOMs).
* **Upgrades**: bump **cache schema** when analyzer outputs change; WebService triggers refresh of dependent artifacts.
* **Backups**: Mongo (daily dumps); RustFS snapshots (filesystem-level rsync/ZFS) or S3 versioning when legacy driver enabled; Rekor v2 DB snapshots.
* **Backups**: PostgreSQL (pg_dump daily); RustFS snapshots (filesystem-level rsync/ZFS) or S3 versioning when legacy driver enabled; Rekor v2 DB snapshots.
---


@@ -0,0 +1,357 @@
# EPSS Integration Architecture
> **Advisory Source**: `docs/product-advisories/16-Dec-2025 - Merging EPSS v4 with CVSS v4 Frameworks.md`
> **Last Updated**: 2025-12-17
> **Status**: Approved for Implementation
---
## Executive Summary
EPSS (Exploit Prediction Scoring System) is a **probabilistic model** that estimates the likelihood a given CVE will be exploited in the wild over the next ~30 days. This document defines how StellaOps integrates EPSS as a first-class risk signal.
**Key Distinction**:
- **CVSS v4**: Deterministic measurement of *severity* (0-10)
- **EPSS**: Dynamic, data-driven *probability of exploitation* (0-1)
EPSS does **not** replace CVSS or VEX—it provides complementary probabilistic threat intelligence.
---
## 1. Design Principles
### 1.1 EPSS as Probabilistic Signal
| Signal Type | Nature | Source |
|-------------|--------|--------|
| CVSS v4 | Deterministic impact | NVD, vendor |
| EPSS | Probabilistic threat | FIRST daily feeds |
| VEX | Vendor intent | Vendor statements |
| Runtime context | Actual exposure | StellaOps scanner |
**Rule**: EPSS *modulates confidence*, never asserts truth.
### 1.2 Architectural Constraints
1. **Append-only time-series**: Never overwrite historical EPSS data
2. **Deterministic replay**: Every scan stores the EPSS snapshot reference used
3. **Idempotent ingestion**: Safe to re-run for same date
4. **Postgres as source of truth**: Valkey is optional cache only
5. **Air-gap compatible**: Manual import via signed bundles
---
## 2. Data Model
### 2.1 Core Tables
#### Import Provenance
```sql
CREATE TABLE epss_import_runs (
import_run_id UUID PRIMARY KEY,
model_date DATE NOT NULL,
source_uri TEXT NOT NULL,
retrieved_at TIMESTAMPTZ NOT NULL,
file_sha256 TEXT NOT NULL,
decompressed_sha256 TEXT NULL,
row_count INT NOT NULL,
model_version_tag TEXT NULL,
published_date DATE NULL,
status TEXT NOT NULL, -- SUCCEEDED / FAILED
error TEXT NULL,
UNIQUE (model_date)
);
```
#### Time-Series Scores (Partitioned)
```sql
CREATE TABLE epss_scores (
model_date DATE NOT NULL,
cve_id TEXT NOT NULL,
epss_score DOUBLE PRECISION NOT NULL,
percentile DOUBLE PRECISION NOT NULL,
import_run_id UUID NOT NULL REFERENCES epss_import_runs(import_run_id),
PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```
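Because `epss_scores` is range-partitioned by `model_date`, the ingestion job must ensure a partition covering the target date exists before loading. A minimal sketch, assuming monthly partitions and the Npgsql driver (the class, method, and partition-naming scheme are illustrative, not an existing StellaOps convention):
```csharp
using System;
using System.Threading.Tasks;
using Npgsql;

public static class EpssPartitions
{
    // Ensure the monthly partition for a given model date exists (idempotent).
    public static async Task EnsureMonthlyPartitionAsync(NpgsqlConnection conn, DateOnly modelDate)
    {
        var from = new DateOnly(modelDate.Year, modelDate.Month, 1);
        var to = from.AddMonths(1);
        var partition = $"epss_scores_{from:yyyy_MM}";

        var sql = $@"
            CREATE TABLE IF NOT EXISTS {partition}
            PARTITION OF epss_scores
            FOR VALUES FROM ('{from:yyyy-MM-dd}') TO ('{to:yyyy-MM-dd}');";

        await using var cmd = new NpgsqlCommand(sql, conn);
        await cmd.ExecuteNonQueryAsync();
    }
}
```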
#### Current Projection (Fast Lookup)
```sql
CREATE TABLE epss_current (
cve_id TEXT PRIMARY KEY,
epss_score DOUBLE PRECISION NOT NULL,
percentile DOUBLE PRECISION NOT NULL,
model_date DATE NOT NULL,
import_run_id UUID NOT NULL
);
CREATE INDEX idx_epss_current_score_desc ON epss_current (epss_score DESC);
CREATE INDEX idx_epss_current_percentile_desc ON epss_current (percentile DESC);
```
#### Change Detection
```sql
CREATE TABLE epss_changes (
model_date DATE NOT NULL,
cve_id TEXT NOT NULL,
old_score DOUBLE PRECISION NULL,
new_score DOUBLE PRECISION NOT NULL,
delta_score DOUBLE PRECISION NULL,
old_percentile DOUBLE PRECISION NULL,
new_percentile DOUBLE PRECISION NOT NULL,
flags INT NOT NULL, -- bitmask: NEW_SCORED, CROSSED_HIGH, BIG_JUMP
PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```
### 2.2 Flags Bitmask
| Flag | Value | Meaning |
|------|-------|---------|
| NEW_SCORED | 0x01 | CVE newly scored (not in previous day) |
| CROSSED_HIGH | 0x02 | Score crossed above high threshold |
| CROSSED_LOW | 0x04 | Score crossed below high threshold |
| BIG_JUMP_UP | 0x08 | Delta > 0.10 upward |
| BIG_JUMP_DOWN | 0x10 | Delta > 0.10 downward |
| TOP_PERCENTILE | 0x20 | Entered top 5% |
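The bitmask above maps naturally onto a `[Flags]` enum. A sketch (the enum is illustrative, not an existing StellaOps type):
```csharp
// Illustrative mapping of the epss_changes.flags bitmask.
[Flags]
public enum EpssChangeFlags
{
    None          = 0,
    NewScored     = 0x01, // CVE newly scored (not present the previous day)
    CrossedHigh   = 0x02, // score crossed above the high threshold
    CrossedLow    = 0x04, // score crossed below the high threshold
    BigJumpUp     = 0x08, // delta > 0.10 upward
    BigJumpDown   = 0x10, // delta > 0.10 downward
    TopPercentile = 0x20, // entered the top 5%
}

// Example: flag combinations typically considered "material" for enrichment:
// var material = EpssChangeFlags.NewScored | EpssChangeFlags.CrossedHigh | EpssChangeFlags.BigJumpUp;
```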
---
## 3. Service Architecture
### 3.1 Component Responsibilities
```
┌─────────────────────────────────────────────────────────────────┐
│ EPSS DATA FLOW │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Scheduler │────►│ Concelier │────►│ Scanner │ │
│ │ (triggers) │ │ (ingest) │ │ (evidence) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────┐ │ │
│ │ │ Postgres │◄───────────┘ │
│ │ │ (truth) │ │
│ │ └──────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Notify │◄────│ Excititor │ │
│ │ (alerts) │ │ (VEX tasks) │ │
│ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
| Component | Responsibility |
|-----------|----------------|
| **Scheduler** | Triggers daily EPSS import job |
| **Concelier** | Downloads/imports EPSS, stores facts, computes delta, emits events |
| **Scanner** | Attaches EPSS-at-scan as immutable evidence, uses for scoring |
| **Excititor** | Creates VEX tasks when EPSS is high and VEX missing |
| **Notify** | Sends alerts on priority changes |
### 3.2 Event Flow
```
Scheduler
→ epss.ingest(date)
→ Concelier (ingest)
→ epss.updated
→ Notify (optional daily summary)
→ Concelier (enrichment)
→ vuln.priority.changed
→ Notify (targeted alerts)
→ Excititor (VEX task creation)
```
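The event contracts themselves are not pinned down in this document; a hypothetical shape for the two events, for orientation only (record and field names are assumptions, not the wire contract):
```csharp
// Hypothetical payloads; not the actual wire contracts.
public sealed record EpssUpdated(
    DateOnly ModelDate,
    Guid ImportRunId,
    int RowCount,
    int ChangedCount);

public sealed record VulnPriorityChanged(
    string CveId,
    string InstanceId,
    string OldBand,            // e.g. "ROUTINE" (band names are illustrative)
    string NewBand,            // e.g. "IMMEDIATE_ATTENTION"
    double EpssScore,
    double EpssPercentile,
    DateOnly ModelDate);
```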
---
## 4. Ingestion Pipeline
### 4.1 Data Source
FIRST publishes daily CSV snapshots at:
```
https://epss.empiricalsecurity.com/epss_scores-YYYY-MM-DD.csv.gz
```
Each file contains ~300k CVE records with:
- `cve` - CVE ID
- `epss` - Score (0.00000–1.00000)
- `percentile` - Rank vs all CVEs
### 4.2 Ingestion Steps
1. **Scheduler** triggers daily job for date D
2. **Download** `epss_scores-D.csv.gz`
3. **Decompress** stream
4. **Parse** header comment for model version/date
5. **Validate** scores in [0,1], monotonic percentile
6. **Bulk load** into TEMP staging table
7. **Transaction** (sketched after this list):
- Insert `epss_import_runs`
- Insert into `epss_scores` partition
- Compute `epss_changes` by comparing staging vs `epss_current`
- Upsert `epss_current`
- Enqueue `epss.updated` event
8. **Commit**
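A condensed sketch of the transactional step, assuming the CSV rows were already bulk-loaded into a TEMP table `epss_staging (cve_id, epss_score, percentile)` and that `@d` / `@run` are bound to the model date and import run id. `Exec` is a hypothetical helper that runs a parameterized `NpgsqlCommand` inside the transaction; flag computation is elided:
```csharp
// Exec(conn, tx, sql): hypothetical helper executing a parameterized NpgsqlCommand with @d and @run bound.
await using var tx = await conn.BeginTransactionAsync();

// 1. Insert the epss_import_runs provenance row (omitted for brevity).

// 2. Append the daily snapshot into the partitioned time-series table.
await Exec(conn, tx, @"
    INSERT INTO epss_scores (model_date, cve_id, epss_score, percentile, import_run_id)
    SELECT @d, s.cve_id, s.epss_score, s.percentile, @run
    FROM epss_staging s;");

// 3. Detect changes against the current projection (flags set to 0 here for brevity).
await Exec(conn, tx, @"
    INSERT INTO epss_changes (model_date, cve_id, old_score, new_score, delta_score,
                              old_percentile, new_percentile, flags)
    SELECT @d, s.cve_id, c.epss_score, s.epss_score, s.epss_score - c.epss_score,
           c.percentile, s.percentile, 0
    FROM epss_staging s
    LEFT JOIN epss_current c ON c.cve_id = s.cve_id
    WHERE c.cve_id IS NULL OR s.epss_score IS DISTINCT FROM c.epss_score;");

// 4. Upsert the fast-lookup projection.
await Exec(conn, tx, @"
    INSERT INTO epss_current (cve_id, epss_score, percentile, model_date, import_run_id)
    SELECT s.cve_id, s.epss_score, s.percentile, @d, @run
    FROM epss_staging s
    ON CONFLICT (cve_id) DO UPDATE SET
        epss_score    = EXCLUDED.epss_score,
        percentile    = EXCLUDED.percentile,
        model_date    = EXCLUDED.model_date,
        import_run_id = EXCLUDED.import_run_id;");

// 5. Enqueue epss.updated (outbox row in the same transaction), then commit.
await tx.CommitAsync();
```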
### 4.3 Air-Gap Import
Accept local bundle containing:
- `epss_scores-YYYY-MM-DD.csv.gz`
- `manifest.json` with sha256, source attribution, DSSE signature
Same pipeline, with `source_uri = bundle://...`.
---
## 5. Enrichment Rules
### 5.1 New Scan Findings (Immutable)
Store EPSS "as-of" scan time:
```csharp
public record ScanEpssEvidence
{
public double EpssScoreAtScan { get; init; }
public double EpssPercentileAtScan { get; init; }
public DateOnly EpssModelDateAtScan { get; init; }
public Guid EpssImportRunIdAtScan { get; init; }
}
```
This supports deterministic replay even if EPSS changes later.
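A sketch of capturing this snapshot at scan time from the fast-lookup projection, assuming Npgsql (the wrapping class and method are illustrative):
```csharp
using System;
using System.Threading.Tasks;
using Npgsql;

public static class EpssEvidenceCapture
{
    // Illustrative: capture EPSS-at-scan for a single CVE; returns null if the CVE is not yet scored.
    public static async Task<ScanEpssEvidence?> CaptureAsync(NpgsqlConnection conn, string cveId)
    {
        const string sql = @"
            SELECT epss_score, percentile, model_date, import_run_id
            FROM epss_current
            WHERE cve_id = @cve;";

        await using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("cve", cveId);

        await using var reader = await cmd.ExecuteReaderAsync();
        if (!await reader.ReadAsync())
            return null;

        return new ScanEpssEvidence
        {
            EpssScoreAtScan       = reader.GetDouble(0),
            EpssPercentileAtScan  = reader.GetDouble(1),
            EpssModelDateAtScan   = DateOnly.FromDateTime(reader.GetDateTime(2)),
            EpssImportRunIdAtScan = reader.GetGuid(3),
        };
    }
}
```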
### 5.2 Existing Findings (Live Triage)
Maintain mutable "current EPSS" on vulnerability instances:
- **scan_finding_evidence**: Immutable EPSS-at-scan
- **vuln_instance_triage**: Current EPSS + band (for live triage)
### 5.3 Efficient Delta Targeting
On `epss.updated(D)`:
1. Read `epss_changes` where flags indicate a material change (see the sketch after this list)
2. Find impacted vulnerability instances by CVE
3. Update only those instances
4. Emit `vuln.priority.changed` only if band crossed
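A sketch of steps 1–2, assuming a `vuln_instance_triage (instance_id, cve_id, ...)` projection as described in Section 5.2 (its column names and the flag mask are assumptions):
```csharp
// Only flag combinations considered material trigger enrichment work.
const int MaterialFlags = 0x01 | 0x02 | 0x08 | 0x20; // NEW_SCORED | CROSSED_HIGH | BIG_JUMP_UP | TOP_PERCENTILE

const string sql = @"
    SELECT t.instance_id, c.cve_id, c.new_score, c.new_percentile
    FROM epss_changes c
    JOIN vuln_instance_triage t ON t.cve_id = c.cve_id
    WHERE c.model_date = @d
      AND (c.flags & @material) <> 0;";

// Update only the matched instances (step 3); emit vuln.priority.changed
// only when the resulting priority band actually changes (step 4).
```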
---
## 6. Notification Policy
### 6.1 Default Thresholds
| Threshold | Default | Description |
|-----------|---------|-------------|
| HighPercentile | 0.95 | Top 5% of all CVEs |
| HighScore | 0.50 | 50% exploitation probability |
| BigJumpDelta | 0.10 | Meaningful daily change |
### 6.2 Trigger Conditions
1. **Newly scored** CVE in inventory AND `percentile >= HighPercentile`
2. Existing CVE **crosses above** HighPercentile or HighScore
3. Delta > BigJumpDelta AND CVE in runtime-exposed assets
All thresholds are org-configurable.
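A minimal sketch of evaluating these trigger conditions against the thresholds (class, type, and parameter names are illustrative):
```csharp
using System;

public sealed record EpssThresholds(
    double HighPercentile = 0.95,
    double HighScore = 0.50,
    double BigJumpDelta = 0.10);

public static class EpssNotificationPolicy
{
    public static bool ShouldNotify(
        EpssThresholds t,
        double? oldScore, double newScore,
        double? oldPercentile, double newPercentile,
        bool inInventory, bool runtimeExposed)
    {
        // 1. Newly scored CVE already in inventory, landing in the high-percentile band.
        if (oldScore is null)
            return inInventory && newPercentile >= t.HighPercentile;

        // 2. Existing CVE crosses above HighPercentile or HighScore.
        bool crossedPercentile = oldPercentile is not null
            && oldPercentile.Value < t.HighPercentile && newPercentile >= t.HighPercentile;
        bool crossedScore = oldScore.Value < t.HighScore && newScore >= t.HighScore;
        if (crossedPercentile || crossedScore)
            return true;

        // 3. Big jump on a runtime-exposed asset.
        return Math.Abs(newScore - oldScore.Value) > t.BigJumpDelta && runtimeExposed;
    }
}
```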
---
## 7. Trust Lattice Integration
### 7.1 Scoring Rule Example
```
IF cvss_base >= 8.0
AND epss_score >= 0.35
AND runtime_exposed = true
→ priority = IMMEDIATE_ATTENTION
```
### 7.2 Score Weights
| Factor | Default Weight | Range |
|--------|---------------|-------|
| CVSS | 0.25 | 0.0-1.0 |
| EPSS | 0.25 | 0.0-1.0 |
| Reachability | 0.25 | 0.0-1.0 |
| Freshness | 0.15 | 0.0-1.0 |
| Frequency | 0.10 | 0.0-1.0 |
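Assuming each factor is pre-normalized to [0, 1], a sketch of how the default weights could combine into a single priority score (the class and function are illustrative; the real Trust Lattice rules may differ):
```csharp
public static class TrustLatticeScoring
{
    // Weighted combination of normalized risk factors; weights mirror the defaults above (sum = 1.0).
    public static double PriorityScore(
        double cvssNorm,      // CVSS base / 10
        double epss,          // EPSS probability
        double reachability,  // 0..1 reachability confidence
        double freshness,     // 0..1 recency factor
        double frequency)     // 0..1 observed frequency
        => 0.25 * cvssNorm
         + 0.25 * epss
         + 0.25 * reachability
         + 0.15 * freshness
         + 0.10 * frequency;
}

// Example: CVSS 9.1 (0.91 normalized), EPSS 0.42, fully reachable, recent, moderately frequent:
// PriorityScore(0.91, 0.42, 1.0, 0.8, 0.6) ≈ 0.76
```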
---
## 8. API Surface
### 8.1 Internal API Endpoints
| Endpoint | Description |
|----------|-------------|
| `GET /epss/current?cve=...` | Bulk lookup current EPSS |
| `GET /epss/history?cve=...&days=180` | Historical time-series |
| `GET /epss/top?order=epss&limit=100` | Top CVEs by score |
| `GET /epss/changes?date=...` | Daily change report |
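For orientation, a minimal client-side sketch of the bulk lookup; the host and response shape are assumptions, only the path comes from the table above:
```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;

var http = new HttpClient { BaseAddress = new Uri("https://concelier.internal/") }; // illustrative host
var entries = await http.GetFromJsonAsync<List<EpssCurrentEntry>>(
    "epss/current?cve=CVE-2024-1234&cve=CVE-2024-5678");

// Hypothetical response entry for GET /epss/current; field names are assumptions.
public sealed record EpssCurrentEntry(string CveId, double EpssScore, double Percentile, DateOnly ModelDate);
```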
### 8.2 UI Requirements
For each vulnerability instance:
- EPSS score + percentile
- Model date
- Trend delta vs previous scan date
- Filter chips: "High EPSS", "Rising EPSS", "High CVSS + High EPSS"
- Evidence panel showing EPSS-at-scan vs current EPSS
---
## 9. Implementation Checklist
### Phase 1: Data Foundation
- [ ] DB migrations: tables + partitions + indexes
- [ ] Concelier ingestion job: online download + bundle import
### Phase 2: Integration
- [ ] epss_current + epss_changes projection
- [ ] Scanner.WebService: attach EPSS-at-scan evidence
- [ ] Bulk lookup API
### Phase 3: Enrichment
- [ ] Concelier enrichment job: update triage projections
- [ ] Notify subscription to vuln.priority.changed
### Phase 4: UI/UX
- [ ] EPSS fields in vulnerability detail
- [ ] Filters and sort by exploit likelihood
- [ ] Trend visualization
### Phase 5: Operations
- [ ] Backfill tool (last 180 days)
- [ ] Ops runbook: schedules, manual re-run, air-gap import
---
## 10. Anti-Patterns to Avoid
| Anti-Pattern | Why It's Wrong |
|--------------|----------------|
| Storing only latest EPSS | Breaks auditability and replay |
| Mixing EPSS into CVE table | EPSS is signal, not vulnerability data |
| Treating EPSS as severity | EPSS is probability, not impact |
| Alerting on every daily fluctuation | Creates alert fatigue |
| Recomputing EPSS internally | Use FIRST's authoritative data |
---
## Related Documents
- [Unknowns API Documentation](../api/unknowns-api.md)
- [Score Replay API](../api/score-replay-api.md)
- [Trust Lattice Architecture](../modules/scanner/architecture.md)