# EPSS v4 Integration Guide
## Overview
EPSS (Exploit Prediction Scoring System) v4 is a machine learning-based vulnerability scoring system developed by FIRST.org that predicts the probability a CVE will be exploited in the wild within the next 30 days. StellaOps integrates EPSS as a **probabilistic threat signal** alongside CVSS v4's **deterministic severity assessment**, enabling more accurate vulnerability prioritization.
**Key Concepts**:
- **EPSS Score**: Probability (0.0-1.0) that a CVE will be exploited in the next 30 days
- **EPSS Percentile**: Ranking (0.0-1.0) of this CVE relative to all scored CVEs
- **Model Date**: Date for which EPSS scores were computed
- **Immutable at-scan**: EPSS evidence captured at scan time never changes (deterministic replay)
- **Current EPSS**: Live projection for triage (updated daily)
---
## How EPSS Works
EPSS uses machine learning to predict exploitation probability based on:
1. **Vulnerability Characteristics**: CVSS metrics, CWE, affected products
2. **Social Signals**: Twitter/GitHub mentions, security blog posts
3. **Exploit Database Entries**: Exploit-DB, Metasploit, etc.
4. **Historical Exploitation**: Past exploitation patterns
EPSS is updated **daily** by FIRST.org based on fresh threat intelligence.
### EPSS vs CVSS
| Dimension | CVSS v4 | EPSS v4 |
|-----------|---------|---------|
| **Nature** | Deterministic severity | Probabilistic threat |
| **Scale** | 0.0-10.0 (severity) | 0.0-1.0 (probability) |
| **Update Frequency** | Static (per CVE version) | Daily (live threat data) |
| **Purpose** | Impact assessment | Likelihood assessment |
| **Source** | Vendor/NVD | FIRST.org ML model |
**Example**:
- **CVE-2024-1234**: CVSS 9.8 (Critical) + EPSS 0.01 (1st percentile)
  - Interpretation: Severe impact if exploited, but very unlikely to be exploited
  - Priority: **Medium** (deprioritize despite high CVSS)
- **CVE-2024-5678**: CVSS 6.5 (Medium) + EPSS 0.95 (98th percentile)
  - Interpretation: Moderate impact, but very likely to be exploited within 30 days
  - Priority: **High** (escalate despite moderate CVSS)
---
## Architecture Overview
### Data Flow
```
┌────────────────────────────────────────────────────────────────┐
│ EPSS Data Lifecycle in StellaOps │
└────────────────────────────────────────────────────────────────┘
1. INGESTION (Daily 00:05 UTC)
┌───────────────────┐
│ FIRST.org │ Daily CSV: epss_scores-YYYY-MM-DD.csv.gz
│ (300k CVEs) │ ~15MB compressed
└────────┬──────────┘
┌───────────────────────────────────────────────────────────┐
│ Concelier: EpssIngestJob │
│ - Download/Import CSV │
│ - Parse (handle # comment, validate bounds) │
│ - Bulk insert: epss_scores (partitioned by month) │
│ - Compute delta: epss_changes (flags for enrichment) │
│ - Upsert: epss_current (latest projection) │
│ - Emit event: "epss.updated" │
└────────┬──────────────────────────────────────────────────┘
[PostgreSQL: concelier.epss_*]
├─────────────────────────────┐
│ │
▼ ▼
2. AT-SCAN CAPTURE (Immutable Evidence)
┌────────────────────────────────────────────────────────────┐
│ Scanner: On new scan │
│ - Bulk query: epss_current for CVE list │
│ - Store immutable evidence: │
│ * epss_score_at_scan │
│ * epss_percentile_at_scan │
│ * epss_model_date_at_scan │
│ * epss_import_run_id_at_scan │
│ - Use in lattice decision (SR→CR if EPSS≥90th) │
└─────────────────────────────────────────────────────────────┘
3. LIVE ENRICHMENT (Existing Findings)
┌─────────────────────────────────────────────────────────────┐
│ Concelier: EpssEnrichmentJob (on "epss.updated") │
│ - Read: epss_changes WHERE flags IN (CROSSED_HIGH, BIG_JUMP)│
│ - Find impacted: vuln_instance_triage BY cve_id │
│ - Update: current_epss_score, current_epss_percentile │
│ - If priority band changed → emit "vuln.priority.changed" │
└────────┬────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Notify: On "vuln.priority.changed" │
│ - Check tenant notification rules │
│ - Send: Slack / Email / Teams / In-app │
│ - Payload: EPSS delta, threshold crossed │
└─────────────────────────────────────────────────────────────┘
4. POLICY SCORING
┌─────────────────────────────────────────────────────────────┐
│ Policy Engine: Risk Score Formula │
│ risk_score = (cvss/10) + epss_bonus + kev_bonus + reach_mult│
│ │
│ EPSS Bonus (Simple Profile): │
│ - Percentile ≥99th: +10% │
│ - Percentile ≥90th: +5% │
│ - Percentile ≥50th: +2% │
│ - Percentile <50th: 0% │
│ │
│ VEX Lattice Rules: │
│ - SR + EPSS≥90th → Escalate to CR (Confirmed Reachable) │
│ - DV + EPSS≥95th → Flag for review (vendor denial) │
│ - U + EPSS≥95th → Prioritize for reachability analysis │
└─────────────────────────────────────────────────────────────┘
```
### Database Schema
**Location**: `concelier` database
#### epss_import_runs (Provenance)
Tracks each EPSS import with full provenance for audit trail.
```sql
CREATE TABLE concelier.epss_import_runs (
import_run_id UUID PRIMARY KEY,
model_date DATE NOT NULL UNIQUE,
source_uri TEXT NOT NULL,
file_sha256 TEXT NOT NULL,
row_count INT NOT NULL,
model_version_tag TEXT NULL,
published_date DATE NULL,
status TEXT NOT NULL, -- IN_PROGRESS, SUCCEEDED, FAILED
created_at TIMESTAMPTZ NOT NULL
);
```
#### epss_scores (Time-Series, Partitioned)
Immutable append-only history of daily EPSS scores.
```sql
CREATE TABLE concelier.epss_scores (
model_date DATE NOT NULL,
cve_id TEXT NOT NULL,
epss_score DOUBLE PRECISION NOT NULL,
percentile DOUBLE PRECISION NOT NULL,
import_run_id UUID NOT NULL,
PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```
**Partitions**: Monthly (e.g., `epss_scores_2025_12`)
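As a minimal sketch of the monthly naming scheme, the helper below derives the partition name and its date bounds from a model date. The function names `partition_for` and `partition_ddl` are hypothetical, and the generated DDL is an assumption about how partitions might be created, not the actual StellaOps migration.

```python
from datetime import date


def partition_for(model_date: date) -> tuple[str, date, date]:
    """Return (partition_name, lower_bound_inclusive, upper_bound_exclusive)."""
    start = model_date.replace(day=1)
    # First day of the following month; December rolls over to January.
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    name = f"epss_scores_{start.year:04d}_{start.month:02d}"
    return name, start, end


def partition_ddl(model_date: date) -> str:
    """Build PostgreSQL DDL for the monthly partition (assumed layout)."""
    name, start, end = partition_for(model_date)
    return (
        f"CREATE TABLE IF NOT EXISTS concelier.{name} "
        f"PARTITION OF concelier.epss_scores "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


print(partition_ddl(date(2025, 12, 16)))
```

Range partitions in PostgreSQL include the lower bound and exclude the upper bound, which is why the upper bound is the first day of the next month.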
#### epss_current (Latest Projection)
Table holding a materialized projection of the latest EPSS score per CVE, for fast lookups.
```sql
CREATE TABLE concelier.epss_current (
cve_id TEXT PRIMARY KEY,
epss_score DOUBLE PRECISION NOT NULL,
percentile DOUBLE PRECISION NOT NULL,
model_date DATE NOT NULL,
import_run_id UUID NOT NULL,
updated_at TIMESTAMPTZ NOT NULL
);
```
**Usage**: Scanner bulk queries this table for new scans.
#### epss_changes (Delta Tracking, Partitioned)
Tracks material EPSS changes for targeted enrichment.
```sql
CREATE TABLE concelier.epss_changes (
model_date DATE NOT NULL,
cve_id TEXT NOT NULL,
old_score DOUBLE PRECISION NULL,
new_score DOUBLE PRECISION NOT NULL,
delta_score DOUBLE PRECISION NULL,
old_percentile DOUBLE PRECISION NULL,
new_percentile DOUBLE PRECISION NOT NULL,
delta_percentile DOUBLE PRECISION NULL,
flags INT NOT NULL, -- Bitmask
PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```
**Flags** (bitmask):
- `1` = NEW_SCORED (CVE newly appeared)
- `2` = CROSSED_HIGH (percentile ≥95th)
- `4` = BIG_JUMP (|Δscore| ≥0.10)
- `8` = DROPPED_LOW (percentile <50th)
- `16` = SCORE_INCREASED
- `32` = SCORE_DECREASED
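The bitmask can be illustrated with a short Python sketch. The flag values come from the list above; the function `compute_flags` and its exact semantics are assumptions (for example, that "crossed" means moving from below a threshold to at or above it, and that a newly scored CVE already above the high threshold also gets `CROSSED_HIGH`). Default thresholds mirror the values shown in `etc/concelier.yaml`.

```python
from enum import IntFlag
from typing import Optional


class EpssChangeFlags(IntFlag):
    NEW_SCORED = 1
    CROSSED_HIGH = 2
    BIG_JUMP = 4
    DROPPED_LOW = 8
    SCORE_INCREASED = 16
    SCORE_DECREASED = 32


def compute_flags(
    old_score: Optional[float],
    new_score: float,
    old_percentile: Optional[float],
    new_percentile: float,
    *,
    high_percentile: float = 0.95,
    low_percentile: float = 0.50,
    big_jump_delta: float = 0.10,
) -> EpssChangeFlags:
    """Derive change flags for one CVE (assumed semantics, see lead-in)."""
    flags = EpssChangeFlags(0)
    if old_score is None or old_percentile is None:
        # No previous row: the CVE is newly scored.
        flags |= EpssChangeFlags.NEW_SCORED
        if new_percentile >= high_percentile:
            flags |= EpssChangeFlags.CROSSED_HIGH
        return flags

    delta = new_score - old_score
    if old_percentile < high_percentile <= new_percentile:
        flags |= EpssChangeFlags.CROSSED_HIGH
    if new_percentile < low_percentile <= old_percentile:
        flags |= EpssChangeFlags.DROPPED_LOW
    if abs(delta) >= big_jump_delta:
        flags |= EpssChangeFlags.BIG_JUMP
    if delta > 0:
        flags |= EpssChangeFlags.SCORE_INCREASED
    elif delta < 0:
        flags |= EpssChangeFlags.SCORE_DECREASED
    return flags


# Score 0.42 -> 0.78, percentile 0.88 -> 0.96
flags = compute_flags(0.42, 0.78, 0.88, 0.96)
print(int(flags), flags)
```

Storing the result as an `INT` keeps the column compact, and a filter such as `flags & 2 <> 0` selects every row that crossed the high threshold.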
---
## Configuration
### Scheduler Configuration
**File**: `etc/scheduler.yaml`
```yaml
scheduler:
  jobs:
    - name: epss.ingest
      schedule: "0 5 0 * * *" # Daily at 00:05 UTC
      worker: concelier
      args:
        source: online
        date: null # Auto: yesterday
      timeout: 600s
      retry:
        max_attempts: 3
        backoff: exponential
```
### Concelier Configuration
**File**: `etc/concelier.yaml`
```yaml
concelier:
  epss:
    enabled: true
    online_source:
      base_url: "https://epss.empiricalsecurity.com/"
      url_pattern: "epss_scores-{date:yyyy-MM-dd}.csv.gz"
      timeout: 180s
    bundle_source:
      path: "/opt/stellaops/bundles/epss/"
    thresholds:
      high_percentile: 0.95 # Top 5%
      high_score: 0.50 # 50% probability
      big_jump_delta: 0.10 # 10 percentage points
      low_percentile: 0.50 # Median
    enrichment:
      enabled: true
      batch_size: 1000
      flags_to_process:
        - NEW_SCORED
        - CROSSED_HIGH
        - BIG_JUMP
```
### Scanner Configuration
**File**: `etc/scanner.yaml`
```yaml
scanner:
  epss:
    enabled: true
    provider: postgres
    cache_ttl: 3600
    fallback_on_missing: unknown # Options: unknown, zero, skip
```
### Policy Configuration
**File**: `etc/policy.yaml`
```yaml
policy:
  scoring:
    epss:
      enabled: true
      profile: simple # Options: simple, advanced, custom
      simple_bonuses:
        percentile_99: 0.10 # +10%
        percentile_90: 0.05 # +5%
        percentile_50: 0.02 # +2%
  lattice:
    epss_escalation:
      enabled: true
      sr_to_cr_threshold: 0.90 # SR→CR if EPSS≥90th percentile
```
---
## Daily Operation
### Automated Ingestion
EPSS data is ingested automatically daily at **00:05 UTC** via Scheduler.
**Workflow**:
1. Scheduler triggers `epss.ingest` job at 00:05 UTC
2. Concelier downloads `epss_scores-YYYY-MM-DD.csv.gz` from FIRST.org
3. CSV parsed (comment line → metadata, rows → scores)
4. Bulk insert into `epss_scores` partition (NpgsqlBinaryImporter)
5. Compute delta: `epss_changes` (compare vs `epss_current`)
6. Upsert `epss_current` (latest projection)
7. Emit `epss.updated` event
8. Enrichment job updates impacted vulnerability instances
9. Notifications sent if priority bands changed
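The parsing step (step 3) can be sketched in Python as follows. This is an illustration only, not the Concelier implementation: it assumes the published file layout of a leading `#` comment line with `key:value` pairs, then a `cve,epss,percentile` header, and the function names are hypothetical. Rows are yielded lazily so the full file is never held in memory.

```python
import csv
import io
import itertools
from typing import Iterator, TextIO


def read_metadata(comment_line: str) -> dict[str, str]:
    """Parse '#model_version:...,score_date:...' into a dict."""
    meta: dict[str, str] = {}
    for part in comment_line.lstrip("#").strip().split(","):
        # Split on the first colon only; timestamps contain colons too.
        key, sep, value = part.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def parse_epss(stream: TextIO) -> tuple[dict[str, str], Iterator[tuple[str, float, float]]]:
    """Return (metadata, lazy iterator of (cve_id, score, percentile))."""
    first = stream.readline()
    if first.startswith("#"):
        metadata, lines = read_metadata(first), stream
    else:
        # No comment line: the first line is the CSV header, so put it back.
        metadata, lines = {}, itertools.chain([first], stream)
    reader = csv.DictReader(lines)

    def rows() -> Iterator[tuple[str, float, float]]:
        for rec in reader:
            score, pct = float(rec["epss"]), float(rec["percentile"])
            if not (0.0 <= score <= 1.0 and 0.0 <= pct <= 1.0):
                raise ValueError(f"out-of-bounds EPSS row for {rec['cve']}")
            yield rec["cve"], score, pct

    return metadata, rows()


sample = (
    "#model_version:v2025.03.14,score_date:2025-12-16T00:00:00+0000\n"
    "cve,epss,percentile\n"
    "CVE-2024-12345,0.42357,0.88234\n"
)
meta, scores = parse_epss(io.StringIO(sample))
print(meta, list(scores))
```

For a real download, the same function could be fed `gzip.open(path, "rt", encoding="utf-8")` instead of the in-memory sample.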
**Monitoring**:
```bash
# Check latest model date
stellaops epss status
# Output:
# EPSS Status:
# Latest Model Date: 2025-12-16
# Import Time: 2025-12-17 00:07:32 UTC
# CVE Count: 231,417
# Staleness: FRESH (1 day)
```
### Manual Triggering
```bash
# Trigger manual ingest (force re-import)
stellaops concelier job trigger epss.ingest --date 2025-12-16 --force
# Backfill historical data (last 30 days)
stellaops epss backfill --from 2025-11-17 --to 2025-12-16
```
---
## Air-Gapped Operation
### Bundle Structure
EPSS data for offline deployments is packaged in risk bundles:
```
risk-bundle-2025-12-16/
├── manifest.json
├── epss/
│   ├── epss_scores-2025-12-16.csv.zst # ZSTD compressed
│   └── epss_metadata.json
├── kev/
│   └── kev-catalog.json
└── signatures/
    └── bundle.dsse.json
```
### EPSS Metadata
**File**: `epss/epss_metadata.json`
```json
{
"model_date": "2025-12-16",
"model_version": "v2025.12.16",
"published_date": "2025-12-16",
"row_count": 231417,
"sha256": "abc123...",
"source_uri": "https://epss.empiricalsecurity.com/epss_scores-2025-12-16.csv.gz",
"created_at": "2025-12-16T00:00:00Z"
}
```
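A pre-import integrity check against this metadata might look like the sketch below. It assumes the `sha256` field is the hex digest of the payload file being checked; the guide does not state whether the digest covers the original `.csv.gz` or the repackaged `.csv.zst`, so treat that as an open question. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large payloads are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_epss_payload(metadata_path: Path, payload_path: Path) -> bool:
    """Compare the payload digest with the 'sha256' field in the metadata."""
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    return sha256_of(payload_path) == metadata["sha256"].lower()
```

This only detects accidental corruption or a mismatched file. Authenticity of the bundle as a whole would still rest on the DSSE signature under `signatures/`.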
### Import Procedure
```bash
# 1. Transfer bundle to air-gapped system
scp risk-bundle-2025-12-16.tar.zst airgap-host:/opt/stellaops/bundles/
# 2. Import bundle
stellaops offline import --bundle /opt/stellaops/bundles/risk-bundle-2025-12-16.tar.zst
# 3. Verify import
stellaops epss status
# Output:
# EPSS Status:
# Latest Model Date: 2025-12-16
# Source: bundle://risk-bundle-2025-12-16
# CVE Count: 231,417
# Staleness: ACCEPTABLE (within 7 days)
```
### Update Cadence
**Recommended**:
- **Online**: Daily (automatic)
- **Air-gapped**: Weekly (manual bundle import)
**Staleness Thresholds**:
- **FRESH**: ≤1 day
- **ACCEPTABLE**: ≤7 days
- **STALE**: ≤14 days
- **VERY_STALE**: >14 days (alert, fallback to CVSS-only)
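The thresholds translate into a small classification function. This sketch assumes each threshold is an inclusive upper bound on the age of the model date in whole days; `classify_staleness` is a hypothetical name.

```python
from datetime import date


def classify_staleness(model_date: date, today: date) -> str:
    """Map the age of the latest EPSS model date to a staleness band."""
    days_stale = (today - model_date).days
    if days_stale <= 1:
        return "FRESH"
    if days_stale <= 7:
        return "ACCEPTABLE"
    if days_stale <= 14:
        return "STALE"
    return "VERY_STALE"  # callers would alert and fall back to CVSS-only


print(classify_staleness(date(2025, 12, 16), date(2025, 12, 17)))
```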
---
## Scanner Integration
### EPSS Evidence in Scan Findings
Every scan finding includes **immutable EPSS-at-scan** evidence:
```json
{
"finding_id": "CVE-2024-12345-pkg:npm/lodash@4.17.21",
"cve_id": "CVE-2024-12345",
"product": "pkg:npm/lodash@4.17.21",
"scan_id": "scan-abc123",
"scan_timestamp": "2025-12-17T10:30:00Z",
"evidence": {
"cvss_v4": {
"vector_string": "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H",
"base_score": 9.3,
"severity": "CRITICAL"
},
"epss_at_scan": {
"epss_score": 0.42357,
"percentile": 0.88234,
"model_date": "2025-12-16",
"import_run_id": "550e8400-e29b-41d4-a716-446655440000"
},
"epss_current": {
"epss_score": 0.45123,
"percentile": 0.89456,
"model_date": "2025-12-17",
"delta_score": 0.02766,
"delta_percentile": 0.01222,
"trend": "RISING"
}
}
}
```
**Key Points**:
- **epss_at_scan**: Immutable, captured at scan time (deterministic replay)
- **epss_current**: Mutable, updated daily for live triage
- **Replay**: Historical scans always use `epss_at_scan` for consistent policy evaluation
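The relationship between the two blocks can be sketched as a pure function that builds the `epss_current` projection from the at-scan evidence and the latest score, leaving the at-scan record untouched. The `stable_band` parameter is a hypothetical cutoff for the `STABLE` trend; the guide does not define how trends are classified.

```python
def project_current(at_scan: dict, latest: dict, stable_band: float = 0.005) -> dict:
    """Build a mutable 'epss_current' view without modifying at_scan."""
    delta_score = round(latest["epss_score"] - at_scan["epss_score"], 5)
    delta_percentile = round(latest["percentile"] - at_scan["percentile"], 5)
    if delta_score > stable_band:
        trend = "RISING"
    elif delta_score < -stable_band:
        trend = "FALLING"
    else:
        trend = "STABLE"
    return {
        **latest,
        "delta_score": delta_score,
        "delta_percentile": delta_percentile,
        "trend": trend,
    }


at_scan = {"epss_score": 0.42357, "percentile": 0.88234, "model_date": "2025-12-16"}
latest = {"epss_score": 0.45123, "percentile": 0.89456, "model_date": "2025-12-17"}
print(project_current(at_scan, latest))
```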
### Bulk Query Optimization
Scanner queries EPSS for all CVEs in a single database call:
```sql
SELECT cve_id, epss_score, percentile, model_date, import_run_id
FROM concelier.epss_current
WHERE cve_id = ANY(@cve_ids);
```
**Performance**: <500ms for 10k CVEs (P95)
---
## Policy Engine Integration
### Risk Score Formula
**Simple Profile**:
```
risk_score = (cvss_base / 10) + epss_bonus + kev_bonus
```
**EPSS Bonus Table**:
| EPSS Percentile | Bonus | Rationale |
|-----------------|-------|-----------|
| ≥99th | +10% | Top 1%: most likely to be exploited |
| ≥90th | +5% | Top 10%: high exploitation probability |
| ≥50th | +2% | Above median: moderate risk |
| <50th | 0% | Below median: no bonus |
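A minimal sketch of the simple profile, using the tiers from the table and the formula above. The function names are hypothetical, `kev_bonus` is passed in because its value is not specified here, and no clamping is applied since the guide does not mention one.

```python
def epss_bonus(percentile: float) -> float:
    """Tiered EPSS bonus for the simple profile (percentile in 0.0-1.0)."""
    if percentile >= 0.99:
        return 0.10
    if percentile >= 0.90:
        return 0.05
    if percentile >= 0.50:
        return 0.02
    return 0.0


def simple_risk_score(cvss_base: float, epss_percentile: float, kev_bonus: float = 0.0) -> float:
    """risk_score = (cvss_base / 10) + epss_bonus + kev_bonus"""
    return (cvss_base / 10.0) + epss_bonus(epss_percentile) + kev_bonus


# CVSS 9.3 at the 88th percentile: 0.93 + 0.02
print(round(simple_risk_score(9.3, 0.88234), 2))
```

Because the tiers are additive, a Critical CVSS finding in the top percentile can exceed 1.0; whether the engine caps the result is not stated in this guide.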
**Advanced Profile**:
Adds:
- **KEV synergy**: If in KEV catalog → multiply EPSS bonus by 1.5
- **Uncertainty penalty**: Missing EPSS → -5%
- **Temporal decay**: EPSS >30 days stale → reduce bonus by 50%
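One possible reading of the advanced adjustments is sketched below. The order of operations and the choice to compound the KEV multiplier with the staleness decay are assumptions, as is the function name; the tier values repeat the simple-profile table.

```python
from typing import Optional

# (minimum percentile, bonus) tiers from the simple profile, highest first.
TIERS = [(0.99, 0.10), (0.90, 0.05), (0.50, 0.02)]


def advanced_epss_adjustment(percentile: Optional[float], in_kev: bool, days_stale: int) -> float:
    """EPSS contribution under the advanced profile (assumed semantics)."""
    if percentile is None:
        return -0.05  # uncertainty penalty for missing EPSS
    bonus = next((b for floor, b in TIERS if percentile >= floor), 0.0)
    if in_kev:
        bonus *= 1.5  # KEV synergy
    if days_stale > 30:
        bonus *= 0.5  # temporal decay for stale EPSS data
    return bonus


print(advanced_epss_adjustment(0.995, in_kev=True, days_stale=0))
```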
### VEX Lattice Rules
**Escalation**:
- **SR (Static Reachable) + EPSS≥90th** → Auto-escalate to **CR (Confirmed Reachable)**
  - Rationale: High exploit probability warrants confirmation
**Review Flags**:
- **DV (Denied by Vendor VEX) + EPSS≥95th** → Flag for manual review
  - Rationale: Vendor denial contradicted by active exploitation signals
**Prioritization**:
- **U (Unknown) + EPSS≥95th** → Prioritize for reachability analysis
  - Rationale: High exploit probability justifies effort
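The three rules above can be expressed as a small decision function. This is a sketch, not the Policy Engine's implementation: the state codes come from the rules, while the action labels and the function name are hypothetical.

```python
def apply_epss_lattice_rules(state: str, epss_percentile: float) -> tuple[str, list[str]]:
    """Return (new_state, actions) for a finding's lattice state."""
    if state == "SR" and epss_percentile >= 0.90:
        # Static Reachable with high EPSS escalates to Confirmed Reachable.
        return "CR", ["escalated_sr_to_cr"]
    if state == "DV" and epss_percentile >= 0.95:
        # Vendor denial stands, but the finding is queued for manual review.
        return state, ["flag_manual_review"]
    if state == "U" and epss_percentile >= 0.95:
        return state, ["prioritize_reachability_analysis"]
    return state, []


print(apply_epss_lattice_rules("SR", 0.92))
```

Only the SR rule changes the state; the DV and U rules attach follow-up actions and leave the lattice state as it was.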
### SPL (Stella Policy Language) Syntax
```yaml
# Custom policy using EPSS
rules:
  - name: high_epss_escalation
    condition: |
      epss.percentile >= 0.95 AND
      lattice.state == "SR" AND
      runtime.exposed == true
    action: escalate_to_cr
    reason: "High EPSS (top 5%) + Static Reachable + Runtime Exposed"
  - name: epss_trend_alert
    condition: |
      epss.delta_score >= 0.10 AND
      cvss.base_score >= 7.0
    action: notify
    channels: [slack, email]
    reason: "EPSS jumped by 10+ points (was {epss.old_score}, now {epss.new_score})"
```
**Available Fields**:
- `epss.score` - Current EPSS score (0.0-1.0)
- `epss.percentile` - Current percentile (0.0-1.0)
- `epss.model_date` - Model date
- `epss.delta_score` - Change vs previous scan
- `epss.trend` - RISING, FALLING, STABLE
- `epss.at_scan.score` - Immutable score at scan time
- `epss.at_scan.percentile` - Immutable percentile at scan time
---
## Notification Integration
### Event: vuln.priority.changed
Emitted when an EPSS change causes a priority band shift.
**Payload**:
```json
{
"event_type": "vuln.priority.changed",
"vulnerability_id": "CVE-2024-12345",
"product_key": "pkg:npm/lodash@4.17.21",
"old_priority_band": "medium",
"new_priority_band": "high",
"reason": "EPSS percentile crossed 95th (was 88th, now 96th)",
"epss_change": {
"old_score": 0.42,
"new_score": 0.78,
"delta_score": 0.36,
"old_percentile": 0.88,
"new_percentile": 0.96,
"model_date": "2025-12-16"
}
}
```
### Notification Rules
**File**: `etc/notify.yaml`
```yaml
notify:
  rules:
    - name: epss_crossed_high
      event_type: vuln.priority.changed
      condition: "payload.epss_change.new_percentile >= 0.95"
      channels: [slack, email]
      template: epss_high_alert
      digest: false # Immediate
    - name: epss_big_jump
      event_type: vuln.priority.changed
      condition: "payload.epss_change.delta_score >= 0.10"
      channels: [slack]
      template: epss_rising_threat
      digest: true
      digest_time: "09:00" # Daily digest at 9 AM
```
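To show how these two rules would apply to the sample payload, the sketch below hand-translates each `condition` into a Python predicate and splits matches into immediate and digest delivery. It does not parse the rule expression language; the `RULES` structure and the `route` function are hypothetical.

```python
RULES = [
    {
        "name": "epss_crossed_high",
        "event_type": "vuln.priority.changed",
        "matches": lambda p: p["epss_change"]["new_percentile"] >= 0.95,
        "channels": ["slack", "email"],
        "digest": False,
    },
    {
        "name": "epss_big_jump",
        "event_type": "vuln.priority.changed",
        "matches": lambda p: p["epss_change"]["delta_score"] >= 0.10,
        "channels": ["slack"],
        "digest": True,
    },
]


def route(event: dict) -> dict[str, list[str]]:
    """Return names of matching rules, split by delivery mode."""
    routed: dict[str, list[str]] = {"immediate": [], "digest": []}
    for rule in RULES:
        if rule["event_type"] != event["event_type"]:
            continue
        if rule["matches"](event):
            routed["digest" if rule["digest"] else "immediate"].append(rule["name"])
    return routed


event = {
    "event_type": "vuln.priority.changed",
    "vulnerability_id": "CVE-2024-12345",
    "epss_change": {"old_score": 0.42, "new_score": 0.78, "delta_score": 0.36,
                    "old_percentile": 0.88, "new_percentile": 0.96},
}
print(route(event))
```

With the sample payload both rules match, so the same event would produce an immediate alert and an entry in the daily digest.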
### Slack Template Example
```
🚨 **High EPSS Alert**
**CVE**: CVE-2024-12345
**Product**: pkg:npm/lodash@4.17.21
**EPSS**: 0.78 (96th percentile) ⬆️ from 0.42 (88th percentile)
**Delta**: +0.36 (36 percentage points)
**Priority**: Medium → **High**
**Action Required**: Review and prioritize remediation.
[View in StellaOps →](https://stellaops.example.com/vulns/CVE-2024-12345)
```
---
## Troubleshooting
### EPSS Data Not Available
**Symptom**: Scans show "EPSS: N/A"
**Diagnosis**:
```bash
# Check EPSS status
stellaops epss status
# Check import runs
stellaops concelier jobs list --type epss.ingest --limit 10
```
**Resolution**:
1. **No imports**: Trigger manual ingest
```bash
stellaops concelier job trigger epss.ingest
```
2. **Import failed**: Check logs
```bash
stellaops concelier logs --job-id <id> --level ERROR
```
3. **FIRST.org down**: Use air-gapped bundle
```bash
stellaops offline import --bundle /path/to/risk-bundle.tar.zst
```
### Stale EPSS Data
**Symptom**: UI shows "EPSS stale (14 days)"
**Diagnosis**:
```sql
SELECT * FROM concelier.epss_model_staleness;
-- Output: days_stale: 14, staleness_status: STALE
```
**Resolution**:
1. **Online**: Check scheduler job status
```bash
stellaops scheduler jobs status epss.ingest
```
2. **Air-gapped**: Import fresh bundle
```bash
stellaops offline import --bundle /path/to/latest-bundle.tar.zst
```
3. **Fallback**: Disable EPSS temporarily (uses CVSS-only)
```yaml
# etc/scanner.yaml
scanner:
  epss:
    enabled: false
```
### High Memory Usage During Ingest
**Symptom**: Concelier worker OOM during EPSS ingest
**Diagnosis**:
```bash
# Check memory metrics
stellaops metrics query 'process_resident_memory_bytes{service="concelier"}'
```
**Resolution**:
1. **Increase worker memory limit**:
```yaml
# Kubernetes deployment
resources:
  limits:
    memory: 1Gi # Was 512Mi
```
2. **Verify streaming parser** (should not load full CSV into memory):
```bash
# Check logs for "EPSS CSV parsed: rows_yielded="
stellaops concelier logs --job-type epss.ingest | grep "CSV parsed"
```
---
## Best Practices
### 1. Combine Signals (Never Use EPSS Alone)
❌ **Don't**: `if epss > 0.95 then CRITICAL`
✅ **Do**: `if cvss >= 8.0 AND epss >= 0.95 AND runtime_exposed then CRITICAL`
### 2. Review High EPSS Manually
Manually review vulnerabilities with EPSS ≥95th percentile, especially if:
- CVSS is low (<7.0) but EPSS is high
- Vendor VEX denies exploitability but EPSS is high
### 3. Track Trends
Monitor EPSS changes over time:
- Rising EPSS → increasing threat
- Falling EPSS → threat subsiding
### 4. Update Regularly
- **Online**: Daily (automatic)
- **Air-gapped**: Weekly minimum, daily preferred
### 5. Verify During Audits
For compliance audits, use EPSS-at-scan (immutable), not current EPSS:
```sql
SELECT epss_score_at_scan, epss_model_date_at_scan
FROM scan_findings
WHERE scan_id = 'audit-scan-20251217';
```
---
## API Reference
### Query Current EPSS
```bash
# Single CVE
stellaops epss get CVE-2024-12345
# Output:
# CVE-2024-12345
# Score: 0.42357 (42.4% probability)
# Percentile: 88.2th
# Model Date: 2025-12-16
# Status: FRESH
```
### Batch Query
```bash
# From file
stellaops epss batch --file cves.txt --output epss-scores.json
# cves.txt:
# CVE-2024-1
# CVE-2024-2
# CVE-2024-3
```
### Query History
```bash
# Last 180 days
stellaops epss history CVE-2024-12345 --days 180 --format csv
# Output: epss-history-CVE-2024-12345.csv
# model_date,epss_score,percentile
# 2025-12-17,0.45123,0.89456
# 2025-12-16,0.42357,0.88234
# ...
```
### Top CVEs by EPSS
```bash
# Top 100
stellaops epss top --limit 100 --format table
# Output:
# Rank | CVE | Score | Percentile | CVSS
# -----|---------------|--------|------------|------
# 1 | CVE-2024-9999 | 0.9872 | 99.9th | 9.8
# 2 | CVE-2024-8888 | 0.9654 | 99.8th | 8.1
# ...
```
---
## References
- **FIRST EPSS Homepage**: https://www.first.org/epss/
- **EPSS Data & Stats**: https://www.first.org/epss/data_stats
- **EPSS API Docs**: https://www.first.org/epss/api
- **CVSS v4.0 Spec**: https://www.first.org/cvss/v4.0/specification-document
- **StellaOps Policy Guide**: `docs/policy/overview.md`
- **StellaOps Reachability Guide**: `docs/modules/scanner/reachability.md`
---
**Last Updated**: 2025-12-17
**Version**: 1.0
**Maintainer**: StellaOps Security Team