feat(rate-limiting): Implement core rate limiting functionality with configuration, decision-making, metrics, middleware, and service registration

- Add RateLimitConfig for configuration management with YAML binding support.
- Introduce RateLimitDecision to encapsulate the result of rate limit checks.
- Implement RateLimitMetrics for OpenTelemetry metrics tracking.
- Create RateLimitMiddleware for enforcing rate limits on incoming requests.
- Develop RateLimitService to orchestrate instance and environment rate limit checks.
- Add RateLimitServiceCollectionExtensions for dependency injection registration.
@@ -2,7 +2,7 @@

 > Aligned with Epic 6 – Vulnerability Explorer and Epic 10 – Export Center.

-> **Scope.** Implementation‑ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per‑layer caching, three‑way diffs, artifact catalog (RustFS default + Mongo, S3-compatible fallback), attestation hand‑off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
+> **Scope.** Implementation‑ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per‑layer caching, three‑way diffs, artifact catalog (RustFS default + PostgreSQL, S3-compatible fallback), attestation hand‑off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).

 ---

@@ -25,7 +25,7 @@ src/
 ├─ StellaOps.Scanner.WebService/ # REST control plane, catalog, diff, exports
 ├─ StellaOps.Scanner.Worker/ # queue consumer; executes analyzers
 ├─ StellaOps.Scanner.Models/ # DTOs, evidence, graph nodes, CDX/SPDX adapters
-├─ StellaOps.Scanner.Storage/ # Mongo repositories; RustFS object client (default) + S3 fallback; ILM/GC
+├─ StellaOps.Scanner.Storage/ # PostgreSQL repositories; RustFS object client (default) + S3 fallback; ILM/GC
 ├─ StellaOps.Scanner.Queue/ # queue abstraction (Redis/NATS/RabbitMQ)
 ├─ StellaOps.Scanner.Cache/ # layer cache; file CAS; bloom/bitmap indexes
 ├─ StellaOps.Scanner.EntryTrace/ # ENTRYPOINT/CMD → terminal program resolver (shell AST)
@@ -132,7 +132,7 @@ The DI extension (`AddScannerQueue`) wires the selected transport, so future add

 * **OCI registry** with **Referrers API** (discover attached SBOMs/signatures).
 * **RustFS** (default, offline-first) for SBOM artifacts; optional S3/MinIO compatibility retained for migration; **Object Lock** semantics emulated via retention headers; **ILM** for TTL.
-* **MongoDB** for catalog, job state, diffs, ILM rules.
+* **PostgreSQL** for catalog, job state, diffs, ILM rules.
 * **Queue** (Redis Streams/NATS/RabbitMQ).
 * **Authority** (on‑prem OIDC) for **OpToks** (DPoP/mTLS).
 * **Signer** + **Attestor** (+ **Fulcio/KMS** + **Rekor v2**) for DSSE + transparency.
@@ -167,7 +167,7 @@ The DI extension (`AddScannerQueue`) wires the selected transport, so future add

 No confidences. Either a fact is proven with listed mechanisms, or it is not claimed.

-### 3.2 Catalog schema (Mongo)
+### 3.2 Catalog schema (PostgreSQL)

 * `artifacts`

@@ -182,8 +182,8 @@ No confidences. Either a fact is proven with listed mechanisms, or it is not cla
 * `links { fromType, fromDigest, artifactId }` // image/layer -> artifact
 * `jobs { _id, kind, args, state, startedAt, heartbeatAt, endedAt, error }`
 * `lifecycleRules { ruleId, scope, ttlDays, retainIfReferenced, immutable }`
-* `ruby.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] }` // decoded `RubyPackageInventory` documents for CLI/Policy reuse
-* `bun.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] }` // decoded `BunPackageInventory` documents for CLI/Policy reuse
+* `ruby.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] }` // decoded `RubyPackageInventory` rows for CLI/Policy reuse
+* `bun.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] }` // decoded `BunPackageInventory` rows for CLI/Policy reuse

 ### 3.3 Object store layout (RustFS)

@@ -389,8 +389,8 @@ scanner:
   queue:
     kind: redis
     url: "redis://queue:6379/0"
-  mongo:
-    uri: "mongodb://mongo/scanner"
+  postgres:
+    connectionString: "Host=postgres;Port=5432;Database=scanner;Username=stellaops;Password=stellaops"
   s3:
     endpoint: "http://minio:9000"
     bucket: "stellaops"
@@ -493,7 +493,7 @@ scanner:
 * **HA**: WebService horizontal scale; Workers autoscale by queue depth & CPU; distributed locks on layers.
 * **Retention**: ILM rules per artifact class (`short`, `default`, `compliance`); **Object Lock** for compliance artifacts (reports, signed SBOMs).
 * **Upgrades**: bump **cache schema** when analyzer outputs change; WebService triggers refresh of dependent artifacts.
-* **Backups**: Mongo (daily dumps); RustFS snapshots (filesystem-level rsync/ZFS) or S3 versioning when legacy driver enabled; Rekor v2 DB snapshots.
+* **Backups**: PostgreSQL (pg_dump daily); RustFS snapshots (filesystem-level rsync/ZFS) or S3 versioning when legacy driver enabled; Rekor v2 DB snapshots.

 ---

357
docs/modules/scanner/epss-integration.md
Normal file
@@ -0,0 +1,357 @@

# EPSS Integration Architecture

> **Advisory Source**: `docs/product-advisories/16-Dec-2025 - Merging EPSS v4 with CVSS v4 Frameworks.md`
> **Last Updated**: 2025-12-17
> **Status**: Approved for Implementation

---

## Executive Summary

EPSS (Exploit Prediction Scoring System) is a **probabilistic model** that estimates the likelihood a given CVE will be exploited in the wild over the next ~30 days. This document defines how StellaOps integrates EPSS as a first-class risk signal.

**Key Distinction**:
- **CVSS v4**: Deterministic measurement of *severity* (0-10)
- **EPSS**: Dynamic, data-driven *probability of exploitation* (0-1)

EPSS does **not** replace CVSS or VEX; it provides complementary probabilistic threat intelligence.

---

## 1. Design Principles

### 1.1 EPSS as Probabilistic Signal

| Signal Type | Nature | Source |
|-------------|--------|--------|
| CVSS v4 | Deterministic impact | NVD, vendor |
| EPSS | Probabilistic threat | FIRST daily feeds |
| VEX | Vendor intent | Vendor statements |
| Runtime context | Actual exposure | StellaOps scanner |

**Rule**: EPSS *modulates confidence*, never asserts truth.

### 1.2 Architectural Constraints

1. **Append-only time-series**: Never overwrite historical EPSS data
2. **Deterministic replay**: Every scan stores the EPSS snapshot reference used
3. **Idempotent ingestion**: Safe to re-run for same date
4. **Postgres as source of truth**: Valkey is optional cache only
5. **Air-gap compatible**: Manual import via signed bundles

---


## 2. Data Model

### 2.1 Core Tables

#### Import Provenance

```sql
CREATE TABLE epss_import_runs (
    import_run_id UUID PRIMARY KEY,
    model_date DATE NOT NULL,
    source_uri TEXT NOT NULL,
    retrieved_at TIMESTAMPTZ NOT NULL,
    file_sha256 TEXT NOT NULL,
    decompressed_sha256 TEXT NULL,
    row_count INT NOT NULL,
    model_version_tag TEXT NULL,
    published_date DATE NULL,
    status TEXT NOT NULL, -- SUCCEEDED / FAILED
    error TEXT NULL,
    UNIQUE (model_date)
);
```

#### Time-Series Scores (Partitioned)

```sql
CREATE TABLE epss_scores (
    model_date DATE NOT NULL,
    cve_id TEXT NOT NULL,
    epss_score DOUBLE PRECISION NOT NULL,
    percentile DOUBLE PRECISION NOT NULL,
    import_run_id UUID NOT NULL REFERENCES epss_import_runs(import_run_id),
    PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```

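
The table above is range-partitioned on `model_date`, but the partition granularity is not stated. The following is a minimal sketch of how partition DDL could be generated, assuming monthly partitions and a hypothetical `epss_scores_YYYY_MM` naming scheme; neither is confirmed by this document.

```csharp
using System;
using System.Globalization;

public static class EpssPartitions
{
    // Builds the DDL for the monthly partition that contains modelDate.
    // Monthly granularity and the epss_scores_YYYY_MM name are assumptions.
    public static string MonthlyPartitionDdl(DateOnly modelDate)
    {
        var from = new DateOnly(modelDate.Year, modelDate.Month, 1);
        var to = from.AddMonths(1); // PostgreSQL range partitions exclude the upper bound

        // Invariant culture keeps the Gregorian calendar regardless of host locale.
        string name = "epss_scores_" + from.ToString("yyyy_MM", CultureInfo.InvariantCulture);
        string lower = from.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        string upper = to.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        return $"CREATE TABLE IF NOT EXISTS {name} PARTITION OF epss_scores " +
               $"FOR VALUES FROM ('{lower}') TO ('{upper}');";
    }
}
```

Creating the partition before the daily insert keeps ingestion idempotent, since `IF NOT EXISTS` makes a re-run for the same month a no-op.
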

#### Current Projection (Fast Lookup)

```sql
CREATE TABLE epss_current (
    cve_id TEXT PRIMARY KEY,
    epss_score DOUBLE PRECISION NOT NULL,
    percentile DOUBLE PRECISION NOT NULL,
    model_date DATE NOT NULL,
    import_run_id UUID NOT NULL
);

CREATE INDEX idx_epss_current_score_desc ON epss_current (epss_score DESC);
CREATE INDEX idx_epss_current_percentile_desc ON epss_current (percentile DESC);
```

#### Change Detection

```sql
CREATE TABLE epss_changes (
    model_date DATE NOT NULL,
    cve_id TEXT NOT NULL,
    old_score DOUBLE PRECISION NULL,
    new_score DOUBLE PRECISION NOT NULL,
    delta_score DOUBLE PRECISION NULL,
    old_percentile DOUBLE PRECISION NULL,
    new_percentile DOUBLE PRECISION NOT NULL,
    flags INT NOT NULL, -- bitmask, see 2.2: NEW_SCORED, CROSSED_HIGH, CROSSED_LOW, BIG_JUMP_UP, BIG_JUMP_DOWN, TOP_PERCENTILE
    PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```

### 2.2 Flags Bitmask

| Flag | Value | Meaning |
|------|-------|---------|
| NEW_SCORED | 0x01 | CVE newly scored (not in previous day) |
| CROSSED_HIGH | 0x02 | Score crossed above high threshold |
| CROSSED_LOW | 0x04 | Score crossed below high threshold |
| BIG_JUMP_UP | 0x08 | Delta > 0.10 upward |
| BIG_JUMP_DOWN | 0x10 | Delta > 0.10 downward |
| TOP_PERCENTILE | 0x20 | Entered top 5% |

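
As an illustration of how the `flags` column could be derived from one `epss_changes` row, here is a minimal sketch. The type and method names are hypothetical, and it assumes the "high threshold" is the `HighScore` default (0.50) and the top band is the 0.95 percentile; a real implementation would read these from configuration.

```csharp
using System;

[Flags]
public enum EpssChangeFlags
{
    None = 0,
    NewScored = 0x01,
    CrossedHigh = 0x02,
    CrossedLow = 0x04,
    BigJumpUp = 0x08,
    BigJumpDown = 0x10,
    TopPercentile = 0x20,
}

public static class EpssChangeDetector
{
    // Null old values mean the CVE had no score on the previous day.
    // Default thresholds are assumptions taken from the documented defaults.
    public static EpssChangeFlags Compute(
        double? oldScore, double newScore,
        double? oldPercentile, double newPercentile,
        double highScore = 0.50, double bigJump = 0.10, double topPercentile = 0.95)
    {
        var flags = EpssChangeFlags.None;

        if (oldScore is null)
        {
            flags |= EpssChangeFlags.NewScored;
        }
        else
        {
            double delta = newScore - oldScore.Value;
            if (oldScore.Value < highScore && newScore >= highScore) flags |= EpssChangeFlags.CrossedHigh;
            if (oldScore.Value >= highScore && newScore < highScore) flags |= EpssChangeFlags.CrossedLow;
            if (delta > bigJump) flags |= EpssChangeFlags.BigJumpUp;
            if (delta < -bigJump) flags |= EpssChangeFlags.BigJumpDown;
        }

        // "Entered" the top band: below it (or unscored) before, at or above it now.
        if ((oldPercentile ?? 0.0) < topPercentile && newPercentile >= topPercentile)
            flags |= EpssChangeFlags.TopPercentile;

        return flags;
    }
}
```

Under these assumptions, `Compute(0.40, 0.55, 0.90, 0.96)` returns `CrossedHigh | BigJumpUp | TopPercentile`, which is stored as the integer `0x2A`.
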

---

## 3. Service Architecture

### 3.1 Component Responsibilities

```
┌─────────────────────────────────────────────────────────────────┐
│                         EPSS DATA FLOW                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     │
│  │  Scheduler   │────►│  Concelier   │────►│   Scanner    │     │
│  │  (triggers)  │     │  (ingest)    │     │  (evidence)  │     │
│  └──────────────┘     └──────────────┘     └──────────────┘     │
│         │                    │                    │             │
│         │                    ▼                    │             │
│         │             ┌──────────────┐            │             │
│         │             │  Postgres    │◄───────────┘             │
│         │             │  (truth)     │                          │
│         │             └──────────────┘                          │
│         │                    │                                  │
│         ▼                    ▼                                  │
│  ┌──────────────┐     ┌──────────────┐                          │
│  │   Notify     │◄────│  Excititor   │                          │
│  │  (alerts)    │     │  (VEX tasks) │                          │
│  └──────────────┘     └──────────────┘                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

| Component | Responsibility |
|-----------|----------------|
| **Scheduler** | Triggers daily EPSS import job |
| **Concelier** | Downloads/imports EPSS, stores facts, computes delta, emits events |
| **Scanner** | Attaches EPSS-at-scan as immutable evidence, uses for scoring |
| **Excititor** | Creates VEX tasks when EPSS is high and VEX missing |
| **Notify** | Sends alerts on priority changes |

### 3.2 Event Flow

```
Scheduler
  → epss.ingest(date)
    → Concelier (ingest)
      → epss.updated
        → Notify (optional daily summary)
        → Concelier (enrichment)
          → vuln.priority.changed
            → Notify (targeted alerts)
            → Excititor (VEX task creation)
```

---

## 4. Ingestion Pipeline

### 4.1 Data Source

FIRST publishes daily CSV snapshots at:
```
https://epss.empiricalsecurity.com/epss_scores-YYYY-MM-DD.csv.gz
```

Each file contains ~300k CVE records with:
- `cve` - CVE ID
- `epss` - Score (0.00000–1.00000)
- `percentile` - Rank vs all CVEs

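
A minimal sketch of building the snapshot URL for a given date and parsing one data row of the decompressed CSV. `EpssRow` and `EpssCsv` are hypothetical names; the sketch assumes a leading `#` comment line followed by a `cve,epss,percentile` column header, which matches the fields listed above but should be checked against an actual file.

```csharp
using System;
using System.Globalization;

public readonly record struct EpssRow(string CveId, double Score, double Percentile);

public static class EpssCsv
{
    // URL pattern taken from the section above.
    public static string SnapshotUrl(DateOnly date) =>
        "https://epss.empiricalsecurity.com/epss_scores-" +
        date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + ".csv.gz";

    // Returns null for blank lines, the '#' comment line, and the column header.
    public static EpssRow? ParseLine(string line)
    {
        if (string.IsNullOrWhiteSpace(line) ||
            line.StartsWith('#') ||
            line.StartsWith("cve,", StringComparison.Ordinal))
        {
            return null;
        }

        string[] parts = line.Split(',');
        if (parts.Length != 3)
            throw new FormatException($"Expected 3 columns, got {parts.Length}.");

        // Invariant culture: the feed uses '.' as the decimal separator.
        return new EpssRow(
            parts[0].Trim(),
            double.Parse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture),
            double.Parse(parts[2], NumberStyles.Float, CultureInfo.InvariantCulture));
    }
}
```

Parsing with the invariant culture matters on hosts whose locale uses a decimal comma, where `double.Parse("0.5")` would otherwise fail or misparse.
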

### 4.2 Ingestion Steps

1. **Scheduler** triggers daily job for date D
2. **Download** `epss_scores-D.csv.gz`
3. **Decompress** stream
4. **Parse** header comment for model version/date
5. **Validate** scores in [0,1], monotonic percentile
6. **Bulk load** into TEMP staging table
7. **Transaction**:
   - Insert `epss_import_runs`
   - Insert into `epss_scores` partition
   - Compute `epss_changes` by comparing staging vs `epss_current`
   - Upsert `epss_current`
   - Enqueue `epss.updated` event
8. **Commit**

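
The validation in step 5 could be sketched as follows. This assumes "monotonic percentile" means that a higher score never maps to a lower percentile; that reading, and the `EpssValidation` name, are assumptions rather than something this document specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EpssValidation
{
    // Returns human-readable problems; an empty list means the batch passed.
    public static IReadOnlyList<string> Validate(
        IEnumerable<(string CveId, double Score, double Percentile)> rows)
    {
        var problems = new List<string>();
        double highestPercentileSoFar = double.NegativeInfinity;

        // Walk rows from lowest to highest score; ties are ordered by percentile
        // so equal scores never trigger a false monotonicity failure.
        foreach (var row in rows.OrderBy(r => r.Score).ThenBy(r => r.Percentile))
        {
            if (double.IsNaN(row.Score) || row.Score < 0.0 || row.Score > 1.0)
                problems.Add($"{row.CveId}: score outside [0,1]");

            if (double.IsNaN(row.Percentile) || row.Percentile < 0.0 || row.Percentile > 1.0)
                problems.Add($"{row.CveId}: percentile outside [0,1]");

            if (row.Percentile < highestPercentileSoFar)
                problems.Add($"{row.CveId}: percentile decreases as score increases");

            if (row.Percentile > highestPercentileSoFar)
                highestPercentileSoFar = row.Percentile;
        }

        return problems;
    }
}
```

Running this before the bulk load means a malformed file is rejected before any row reaches the staging table, so the transaction in step 7 only ever sees validated data.
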

### 4.3 Air-Gap Import

Accept local bundle containing:
- `epss_scores-YYYY-MM-DD.csv.gz`
- `manifest.json` with sha256, source attribution, DSSE signature

Same pipeline, with `source_uri = bundle://...`.

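
Before a bundle enters the pipeline, the file digest can be compared with the value recorded in `manifest.json`. A minimal sketch follows; it covers only the SHA-256 comparison, takes the expected digest as a hex string because the manifest field layout is not specified here, and leaves DSSE signature verification out of scope. `EpssBundle` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class EpssBundle
{
    // True when the SHA-256 of the stream equals the expected hex digest.
    // Convert.FromHexString accepts upper or lower case and throws on malformed input.
    public static bool MatchesManifestDigest(Stream file, string expectedSha256Hex)
    {
        using var sha = SHA256.Create();
        byte[] actual = sha.ComputeHash(file);
        byte[] expected = Convert.FromHexString(expectedSha256Hex);

        // Constant-time comparison; returns false when lengths differ.
        return CryptographicOperations.FixedTimeEquals(actual, expected);
    }
}
```

The same digest can then be written to `file_sha256` in `epss_import_runs`, so online and air-gapped imports leave identical provenance.
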

---

## 5. Enrichment Rules

### 5.1 New Scan Findings (Immutable)

Store EPSS "as-of" scan time:
```csharp
public record ScanEpssEvidence
{
    public double EpssScoreAtScan { get; init; }
    public double EpssPercentileAtScan { get; init; }
    public DateOnly EpssModelDateAtScan { get; init; }
    public Guid EpssImportRunIdAtScan { get; init; }
}
```

This supports deterministic replay even if EPSS changes later.

### 5.2 Existing Findings (Live Triage)

Maintain mutable "current EPSS" on vulnerability instances:
- **scan_finding_evidence**: Immutable EPSS-at-scan
- **vuln_instance_triage**: Current EPSS + band (for live triage)

### 5.3 Efficient Delta Targeting

On `epss.updated(D)`:
1. Read `epss_changes` where flags indicate material change
2. Find impacted vulnerability instances by CVE
3. Update only those instances
4. Emit `vuln.priority.changed` only if band crossed

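
Step 4 depends on a notion of "band" that this document does not define. The sketch below uses three hypothetical bands and a hypothetical `elevatedPercentile` cutoff of 0.80 purely for illustration; only the 0.50 score and 0.95 percentile values correspond to documented defaults.

```csharp
public enum EpssBand { Low, Elevated, High }

public static class EpssBands
{
    // Band boundaries are illustrative assumptions, not documented values.
    public static EpssBand Classify(
        double score, double percentile,
        double highScore = 0.50, double highPercentile = 0.95, double elevatedPercentile = 0.80)
    {
        if (score >= highScore || percentile >= highPercentile) return EpssBand.High;
        if (percentile >= elevatedPercentile) return EpssBand.Elevated;
        return EpssBand.Low;
    }

    // Emit vuln.priority.changed only when the instance moves to a different band.
    public static bool ShouldEmitPriorityChanged(EpssBand current, EpssBand updated) =>
        current != updated;
}
```

Comparing bands rather than raw scores is what keeps small daily fluctuations from producing events.
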

---

## 6. Notification Policy

### 6.1 Default Thresholds

| Threshold | Default | Description |
|-----------|---------|-------------|
| HighPercentile | 0.95 | Top 5% of all CVEs |
| HighScore | 0.50 | 50% exploitation probability |
| BigJumpDelta | 0.10 | Meaningful daily change |

### 6.2 Trigger Conditions

1. **Newly scored** CVE in inventory AND `percentile >= HighPercentile`
2. Existing CVE **crosses above** HighPercentile or HighScore
3. Delta > BigJumpDelta AND CVE in runtime-exposed assets

All thresholds are org-configurable.

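
The three trigger conditions can be expressed as a single predicate. This is a sketch with hypothetical type and parameter names; it assumes condition 3 refers to an upward jump and that a missing previous percentile counts as below the threshold.

```csharp
public sealed record EpssNotifyThresholds(
    double HighPercentile = 0.95,
    double HighScore = 0.50,
    double BigJumpDelta = 0.10);

public static class EpssNotifyPolicy
{
    // oldScore == null means the CVE was not scored on the previous day.
    public static bool ShouldNotify(
        double? oldScore, double newScore,
        double? oldPercentile, double newPercentile,
        bool inInventory, bool runtimeExposed,
        EpssNotifyThresholds t)
    {
        // 1. Newly scored CVE in inventory that is already in the high percentile band.
        if (oldScore is null)
            return inInventory && newPercentile >= t.HighPercentile;

        // 2. Existing CVE crosses above HighPercentile or HighScore.
        bool crossedPercentile =
            (oldPercentile ?? 0.0) < t.HighPercentile && newPercentile >= t.HighPercentile;
        bool crossedScore = oldScore.Value < t.HighScore && newScore >= t.HighScore;
        if (crossedPercentile || crossedScore)
            return true;

        // 3. Large upward jump (assumed direction) on a runtime-exposed asset.
        return runtimeExposed && (newScore - oldScore.Value) > t.BigJumpDelta;
    }
}
```

Passing the thresholds as a record keeps the org-level overrides in one place; `new EpssNotifyThresholds()` yields the documented defaults.
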

---

## 7. Trust Lattice Integration

### 7.1 Scoring Rule Example

```
IF cvss_base >= 8.0
   AND epss_score >= 0.35
   AND runtime_exposed = true
→ priority = IMMEDIATE_ATTENTION
```

### 7.2 Score Weights

| Factor | Default Weight | Range |
|--------|---------------|-------|
| CVSS | 0.25 | 0.0-1.0 |
| EPSS | 0.25 | 0.0-1.0 |
| Reachability | 0.25 | 0.0-1.0 |
| Freshness | 0.15 | 0.0-1.0 |
| Frequency | 0.10 | 0.0-1.0 |

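
The default weights sum to 1.00, which suggests a weighted blend of normalized factors. The sketch below shows one way such a blend could work; the linear formula, the division of CVSS by 10, and the assumption that the other factors arrive already normalized to [0,1] are all illustrative and not taken from this document.

```csharp
using System;

public sealed record ScoreWeights(
    double Cvss = 0.25,
    double Epss = 0.25,
    double Reachability = 0.25,
    double Freshness = 0.15,
    double Frequency = 0.10);

public static class CompositeScore
{
    // cvssBase is on the 0-10 scale; every other input is assumed to be in [0,1].
    public static double Compute(
        double cvssBase, double epss, double reachability,
        double freshness, double frequency, ScoreWeights w)
    {
        double total = w.Cvss + w.Epss + w.Reachability + w.Freshness + w.Frequency;
        if (total <= 0.0)
            throw new ArgumentException("Weights must sum to a positive value.", nameof(w));

        double raw = w.Cvss * (cvssBase / 10.0)
                   + w.Epss * epss
                   + w.Reachability * reachability
                   + w.Freshness * freshness
                   + w.Frequency * frequency;

        // Renormalize so custom weights that do not sum to 1 still yield a value in [0,1].
        return raw / total;
    }
}
```

Dividing by the weight total means an organization can raise one weight without having to rebalance the others by hand.
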

---

## 8. API Surface

### 8.1 Internal API Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /epss/current?cve=...` | Bulk lookup current EPSS |
| `GET /epss/history?cve=...&days=180` | Historical time-series |
| `GET /epss/top?order=epss&limit=100` | Top CVEs by score |
| `GET /epss/changes?date=...` | Daily change report |

### 8.2 UI Requirements

For each vulnerability instance:
- EPSS score + percentile
- Model date
- Trend delta vs previous scan date
- Filter chips: "High EPSS", "Rising EPSS", "High CVSS + High EPSS"
- Evidence panel showing EPSS-at-scan vs current EPSS

---

## 9. Implementation Checklist

### Phase 1: Data Foundation
- [ ] DB migrations: tables + partitions + indexes
- [ ] Concelier ingestion job: online download + bundle import

### Phase 2: Integration
- [ ] epss_current + epss_changes projection
- [ ] Scanner.WebService: attach EPSS-at-scan evidence
- [ ] Bulk lookup API

### Phase 3: Enrichment
- [ ] Concelier enrichment job: update triage projections
- [ ] Notify subscription to vuln.priority.changed

### Phase 4: UI/UX
- [ ] EPSS fields in vulnerability detail
- [ ] Filters and sort by exploit likelihood
- [ ] Trend visualization

### Phase 5: Operations
- [ ] Backfill tool (last 180 days)
- [ ] Ops runbook: schedules, manual re-run, air-gap import

---

## 10. Anti-Patterns to Avoid

| Anti-Pattern | Why It's Wrong |
|--------------|----------------|
| Storing only latest EPSS | Breaks auditability and replay |
| Mixing EPSS into CVE table | EPSS is signal, not vulnerability data |
| Treating EPSS as severity | EPSS is probability, not impact |
| Alerting on every daily fluctuation | Creates alert fatigue |
| Recomputing EPSS internally | Use FIRST's authoritative data |

---

## Related Documents

- [Unknowns API Documentation](../api/unknowns-api.md)
- [Score Replay API](../api/score-replay-api.md)
- [Trust Lattice Architecture](../modules/scanner/architecture.md)