# EPSS Integration Architecture

> **Advisory Source**: `docs/product-advisories/16-Dec-2025 - Merging EPSS v4 with CVSS v4 Frameworks.md`
> **Last Updated**: 2025-12-17
> **Status**: Approved for Implementation

---

## Executive Summary

EPSS (Exploit Prediction Scoring System) is a **probabilistic model** that estimates the likelihood that a given CVE will be exploited in the wild over the next ~30 days. This document defines how StellaOps integrates EPSS as a first-class risk signal.

**Key Distinction**:
- **CVSS v4**: Deterministic measurement of *severity* (0-10)
- **EPSS**: Dynamic, data-driven *probability of exploitation* (0-1)

EPSS does **not** replace CVSS or VEX; it provides complementary probabilistic threat intelligence.

---

## 1. Design Principles

### 1.1 EPSS as Probabilistic Signal

| Signal Type | Nature | Source |
|-------------|--------|--------|
| CVSS v4 | Deterministic impact | NVD, vendor |
| EPSS | Probabilistic threat | FIRST daily feeds |
| VEX | Vendor intent | Vendor statements |
| Runtime context | Actual exposure | StellaOps scanner |

**Rule**: EPSS *modulates confidence*, never asserts truth.

### 1.2 Architectural Constraints

1. **Append-only time-series**: Never overwrite historical EPSS data
2. **Deterministic replay**: Every scan stores the EPSS snapshot reference used
3. **Idempotent ingestion**: Safe to re-run for the same date
4. **Postgres as source of truth**: Valkey is an optional cache only
5. **Air-gap compatible**: Manual import via signed bundles

---

## 2. Data Model

### 2.1 Core Tables

#### Import Provenance

```sql
CREATE TABLE epss_import_runs (
    import_run_id        UUID PRIMARY KEY,
    model_date           DATE NOT NULL,
    source_uri           TEXT NOT NULL,
    retrieved_at         TIMESTAMPTZ NOT NULL,
    file_sha256          TEXT NOT NULL,
    decompressed_sha256  TEXT NULL,
    row_count            INT NOT NULL,
    model_version_tag    TEXT NULL,
    published_date       DATE NULL,
    status               TEXT NOT NULL,  -- SUCCEEDED / FAILED
    error                TEXT NULL,
    UNIQUE (model_date)
);
```

#### Time-Series Scores (Partitioned)

```sql
CREATE TABLE epss_scores (
    model_date     DATE NOT NULL,
    cve_id         TEXT NOT NULL,
    epss_score     DOUBLE PRECISION NOT NULL,
    percentile     DOUBLE PRECISION NOT NULL,
    import_run_id  UUID NOT NULL REFERENCES epss_import_runs(import_run_id),
    PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```

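
Because `epss_scores` is declared with `PARTITION BY RANGE (model_date)`, a child partition covering a given date has to exist before rows for that date can be inserted. This document does not state the partition granularity, so the sketch below assumes monthly partitions and a hypothetical `<parent>_YYYY_MM` naming scheme. It only builds the DDL text and does not talk to a database.

```python
from datetime import date


def month_partition_ddl(parent: str, day: date) -> str:
    """Return DDL for the monthly partition of `parent` that covers `day`.

    Monthly granularity and the `<parent>_YYYY_MM` name are assumptions,
    not something the schema above prescribes.
    """
    start = day.replace(day=1)
    # First day of the following month; range upper bounds are exclusive.
    end = date(start.year + (start.month == 12), start.month % 12 + 1, 1)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


print(month_partition_ddl("epss_scores", date(2025, 12, 17)))
```
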
#### Current Projection (Fast Lookup)

```sql
CREATE TABLE epss_current (
    cve_id         TEXT PRIMARY KEY,
    epss_score     DOUBLE PRECISION NOT NULL,
    percentile     DOUBLE PRECISION NOT NULL,
    model_date     DATE NOT NULL,
    import_run_id  UUID NOT NULL
);

CREATE INDEX idx_epss_current_score_desc ON epss_current (epss_score DESC);
CREATE INDEX idx_epss_current_percentile_desc ON epss_current (percentile DESC);
```

#### Change Detection

```sql
CREATE TABLE epss_changes (
    model_date      DATE NOT NULL,
    cve_id          TEXT NOT NULL,
    old_score       DOUBLE PRECISION NULL,
    new_score       DOUBLE PRECISION NOT NULL,
    delta_score     DOUBLE PRECISION NULL,
    old_percentile  DOUBLE PRECISION NULL,
    new_percentile  DOUBLE PRECISION NOT NULL,
    flags           INT NOT NULL,  -- bitmask, see section 2.2 (NEW_SCORED, CROSSED_HIGH, BIG_JUMP_UP, ...)
    PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```

### 2.2 Flags Bitmask

| Flag | Value | Meaning |
|------|-------|---------|
| NEW_SCORED | 0x01 | CVE newly scored (not in previous day) |
| CROSSED_HIGH | 0x02 | Score crossed above high threshold |
| CROSSED_LOW | 0x04 | Score crossed below high threshold |
| BIG_JUMP_UP | 0x08 | Delta > 0.10 upward |
| BIG_JUMP_DOWN | 0x10 | Delta > 0.10 downward |
| TOP_PERCENTILE | 0x20 | Entered top 5% |

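
As an illustration of how the bitmask could be derived from one day's old and new values, here is a minimal sketch. The flag names and values come from the table above; the function name, its parameters, and two decisions are assumptions: the "high threshold" is taken to be a score threshold of 0.50, and a newly scored CVE only receives `NEW_SCORED` (plus `TOP_PERCENTILE` when applicable) because it has no previous score to compare against.

```python
from enum import IntFlag
from typing import Optional


class EpssChangeFlags(IntFlag):
    NONE = 0
    NEW_SCORED = 0x01
    CROSSED_HIGH = 0x02
    CROSSED_LOW = 0x04
    BIG_JUMP_UP = 0x08
    BIG_JUMP_DOWN = 0x10
    TOP_PERCENTILE = 0x20


def compute_flags(
    old_score: Optional[float],
    new_score: float,
    old_pct: Optional[float],
    new_pct: float,
    *,
    high_score: float = 0.50,  # assumed meaning of "high threshold"
    big_jump: float = 0.10,
    top_pct: float = 0.95,     # "top 5%"
) -> EpssChangeFlags:
    flags = EpssChangeFlags.NONE
    # "Entered" means: in the top band now, and not there (or unknown) before.
    if new_pct >= top_pct and (old_pct is None or old_pct < top_pct):
        flags |= EpssChangeFlags.TOP_PERCENTILE
    if old_score is None:
        # No previous score: crossing and jump flags are not defined.
        return flags | EpssChangeFlags.NEW_SCORED
    if old_score < high_score <= new_score:
        flags |= EpssChangeFlags.CROSSED_HIGH
    elif new_score < high_score <= old_score:
        flags |= EpssChangeFlags.CROSSED_LOW
    delta = new_score - old_score
    if delta > big_jump:
        flags |= EpssChangeFlags.BIG_JUMP_UP
    elif -delta > big_jump:
        flags |= EpssChangeFlags.BIG_JUMP_DOWN
    return flags
```
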
---

## 3. Service Architecture

### 3.1 Component Responsibilities

```
┌─────────────────────────────────────────────────────────────────┐
│                         EPSS DATA FLOW                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     │
│  │  Scheduler   │────►│  Concelier   │────►│   Scanner    │     │
│  │  (triggers)  │     │  (ingest)    │     │  (evidence)  │     │
│  └──────────────┘     └──────────────┘     └──────────────┘     │
│          │                    │                    │            │
│          │                    ▼                    │            │
│          │            ┌──────────────┐             │            │
│          │            │   Postgres   │◄────────────┘            │
│          │            │   (truth)    │                          │
│          │            └──────────────┘                          │
│          │                    │                                 │
│          ▼                    ▼                                 │
│  ┌──────────────┐     ┌──────────────┐                          │
│  │   Notify     │◄────│  Excititor   │                          │
│  │   (alerts)   │     │  (VEX tasks) │                          │
│  └──────────────┘     └──────────────┘                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

| Component | Responsibility |
|-----------|----------------|
| **Scheduler** | Triggers daily EPSS import job |
| **Concelier** | Downloads/imports EPSS, stores facts, computes delta, emits events |
| **Scanner** | Attaches EPSS-at-scan as immutable evidence, uses for scoring |
| **Excititor** | Creates VEX tasks when EPSS is high and VEX missing |
| **Notify** | Sends alerts on priority changes |

### 3.2 Event Flow

```
Scheduler
  → epss.ingest(date)
    → Concelier (ingest)
      → epss.updated
        → Notify (optional daily summary)
        → Concelier (enrichment)
          → vuln.priority.changed
            → Notify (targeted alerts)
            → Excititor (VEX task creation)
```

---

## 4. Ingestion Pipeline

### 4.1 Data Source

FIRST publishes daily CSV snapshots at:
```
https://epss.empiricalsecurity.com/epss_scores-YYYY-MM-DD.csv.gz
```

Each file contains ~300k CVE records with:
- `cve` - CVE ID
- `epss` - Score (0.00000–1.00000)
- `percentile` - Rank relative to all scored CVEs (0-1)

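
To make the file layout concrete, the following sketch parses one gzip-compressed snapshot held in memory. It assumes the leading comment line uses comma-separated `key:value` pairs (for example `#model_version:...,score_date:...`) followed by a `cve,epss,percentile` header row; the function name and the sample values in the comment are illustrative only.

```python
import csv
import gzip
import io
from itertools import chain


def parse_epss_csv_gz(data: bytes):
    """Return (metadata, rows) from a gzip-compressed EPSS CSV payload.

    rows is a list of (cve_id, epss_score, percentile) tuples.
    """
    metadata = {}
    with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8", newline="") as fh:
        lines = iter(fh)
        first = next(lines, "")
        if first.startswith("#"):
            # Assumed shape: "#model_version:vX,score_date:2025-12-17T00:00:00+0000"
            for part in first[1:].strip().split(","):
                key, _, value = part.partition(":")
                metadata[key.strip()] = value.strip()
        else:
            # No comment line: put the header row back in front of the data.
            lines = chain([first], lines)
        rows = [
            (rec["cve"], float(rec["epss"]), float(rec["percentile"]))
            for rec in csv.DictReader(lines)
        ]
    return metadata, rows
```

A production importer would stream rows into the staging table instead of building a list; the list keeps the sketch short.
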
### 4.2 Ingestion Steps

1. **Scheduler** triggers daily job for date D
2. **Download** `epss_scores-D.csv.gz`
3. **Decompress** stream
4. **Parse** header comment for model version/date
5. **Validate** scores in [0,1] and percentiles monotonic in score
6. **Bulk load** into TEMP staging table
7. **Transaction**:
   - Insert `epss_import_runs`
   - Insert into `epss_scores` partition
   - Compute `epss_changes` by comparing staging vs `epss_current`
   - Upsert `epss_current`
   - Enqueue `epss.updated` event
8. **Commit**

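
Step 5 can be sketched as a single pass over the parsed rows. This is one plausible reading of "monotonic percentile", namely that a CVE with a higher score never has a lower percentile than a CVE with a lower score; the function name and the tuple layout are assumptions made for the example.

```python
def validate_rows(rows):
    """Check (cve_id, score, percentile) tuples and return the row count.

    Raises ValueError on an out-of-range value or when a higher score
    carries a lower percentile than a lower score.
    """
    count = 0
    prev = None
    # Sort by score (ties broken by percentile) so monotonicity is one pass.
    for cve_id, score, pct in sorted(rows, key=lambda r: (r[1], r[2])):
        if not (0.0 <= score <= 1.0 and 0.0 <= pct <= 1.0):
            raise ValueError(f"{cve_id}: value outside [0, 1]")
        if prev is not None and pct < prev[2]:
            raise ValueError(f"{cve_id}: percentile not monotonic vs {prev[0]}")
        prev = (cve_id, score, pct)
        count += 1
    return count
```
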
### 4.3 Air-Gap Import

Accept local bundle containing:
- `epss_scores-YYYY-MM-DD.csv.gz`
- `manifest.json` with sha256, source attribution, DSSE signature

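
A bundle importer would at minimum confirm that the CSV matches the digest recorded in the manifest before loading it. The sketch below shows only that digest check; the manifest field names `file` and `sha256` are hypothetical, since the manifest schema is not defined here, and DSSE signature verification is deliberately left out.

```python
import hashlib
import json
from pathlib import Path


def verify_bundle_digest(bundle_dir: Path) -> Path:
    """Return the path of the bundled CSV if its SHA-256 matches the manifest.

    Assumes manifest.json has hypothetical "file" and "sha256" keys.
    """
    manifest = json.loads((bundle_dir / "manifest.json").read_text(encoding="utf-8"))
    csv_path = bundle_dir / manifest["file"]
    digest = hashlib.sha256()
    with csv_path.open("rb") as fh:
        # Hash in 1 MiB chunks so large files are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != manifest["sha256"].lower():
        raise ValueError(f"SHA-256 mismatch for {csv_path.name}")
    return csv_path
```
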
Bundle imports run through the same pipeline, with `source_uri = bundle://...`.

---

## 5. Enrichment Rules

### 5.1 New Scan Findings (Immutable)

Store EPSS "as-of" scan time:
```csharp
public record ScanEpssEvidence
{
    public double EpssScoreAtScan { get; init; }
    public double EpssPercentileAtScan { get; init; }
    public DateOnly EpssModelDateAtScan { get; init; }
    public Guid EpssImportRunIdAtScan { get; init; }
}
```

This supports deterministic replay even if EPSS changes later.

### 5.2 Existing Findings (Live Triage)

Maintain mutable "current EPSS" on vulnerability instances:
- **scan_finding_evidence**: Immutable EPSS-at-scan
- **vuln_instance_triage**: Current EPSS + band (for live triage)

### 5.3 Efficient Delta Targeting

On `epss.updated(D)`:
1. Read `epss_changes` where flags indicate material change
2. Find impacted vulnerability instances by CVE
3. Update only those instances
4. Emit `vuln.priority.changed` only if band crossed

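
The band-crossing check in steps 2 to 4 can be illustrated with an in-memory sketch. Band cut-offs are taken from the `CriticalPercentile`, `HighPercentile`, and `MediumPercentile` defaults in section 10.1; the `LOW` band name, the function names, and the dictionary standing in for `vuln_instance_triage` are assumptions for the example.

```python
def epss_band(percentile, critical=0.995, high=0.99, medium=0.90):
    """Map a percentile to a band; "LOW" is an assumed name for the rest."""
    if percentile >= critical:
        return "CRITICAL"
    if percentile >= high:
        return "HIGH"
    if percentile >= medium:
        return "MEDIUM"
    return "LOW"


def priority_changes(changes, current_bands):
    """Apply material changes and return the band crossings.

    changes: iterable of (cve_id, new_percentile) read from epss_changes.
    current_bands: dict of cve_id -> band for CVEs present in tracked
    instances (a stand-in for vuln_instance_triage); updated in place.
    """
    events = []
    for cve_id, new_pct in changes:
        old_band = current_bands.get(cve_id)
        if old_band is None:
            continue  # CVE does not affect any tracked instance
        new_band = epss_band(new_pct)
        if new_band != old_band:
            current_bands[cve_id] = new_band
            # One entry here corresponds to one vuln.priority.changed event.
            events.append((cve_id, old_band, new_band))
    return events
```
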
---

## 6. Notification Policy

### 6.1 Default Thresholds

| Threshold | Default | Description |
|-----------|---------|-------------|
| HighPercentile | 0.95 | Top 5% of all CVEs |
| HighScore | 0.50 | 50% exploitation probability |
| BigJumpDelta | 0.10 | Meaningful daily change |

### 6.2 Trigger Conditions

1. **Newly scored** CVE in inventory AND `percentile >= HighPercentile`
2. Existing CVE **crosses above** HighPercentile or HighScore
3. Delta > BigJumpDelta AND CVE in runtime-exposed assets

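
The three conditions can be read as a single predicate. The sketch below uses the defaults from section 6.1; the function and parameter names are hypothetical, and it assumes that condition 3 refers to an upward change and that CVEs outside the inventory never trigger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    high_percentile: float = 0.95
    high_score: float = 0.50
    big_jump_delta: float = 0.10


def should_notify(
    old_score: Optional[float],
    new_score: float,
    old_pct: Optional[float],
    new_pct: float,
    in_inventory: bool,
    runtime_exposed: bool,
    t: Thresholds = Thresholds(),
) -> bool:
    if not in_inventory:
        return False  # assumption: only CVEs in inventory can trigger
    if old_score is None or old_pct is None:
        # Condition 1: newly scored and already in the high percentile band.
        return new_pct >= t.high_percentile
    # Condition 2: crossed above either threshold since the previous day.
    crossed = (old_pct < t.high_percentile <= new_pct) or (
        old_score < t.high_score <= new_score
    )
    # Condition 3: large upward move on a runtime-exposed asset.
    big_jump = runtime_exposed and (new_score - old_score) > t.big_jump_delta
    return crossed or big_jump
```
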
All thresholds are org-configurable.

---

## 7. Trust Lattice Integration

### 7.1 Scoring Rule Example

```
IF cvss_base >= 8.0
   AND epss_score >= 0.35
   AND runtime_exposed = true
→ priority = IMMEDIATE_ATTENTION
```

### 7.2 Score Weights

| Factor | Default Weight | Range |
|--------|---------------|-------|
| CVSS | 0.25 | 0.0-1.0 |
| EPSS | 0.25 | 0.0-1.0 |
| Reachability | 0.25 | 0.0-1.0 |
| Freshness | 0.15 | 0.0-1.0 |
| Frequency | 0.10 | 0.0-1.0 |

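
The default weights sum to 1.0, which suggests a normalized weighted sum, although this document does not state how the factors are combined. The sketch below assumes a plain linear combination over factors already normalized to the 0-1 range (for example CVSS base score divided by 10); the function name and factor keys are illustrative.

```python
DEFAULT_WEIGHTS = {
    "cvss": 0.25,
    "epss": 0.25,
    "reachability": 0.25,
    "freshness": 0.15,
    "frequency": 0.10,
}


def weighted_priority(factors, weights=DEFAULT_WEIGHTS):
    """Linear weighted sum of normalized factors (an assumed combination rule)."""
    missing = set(weights) - set(factors)
    if missing:
        raise KeyError(f"missing factors: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= factors[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(weights[name] * factors[name] for name in weights)


# CVSS 9.0 -> 0.9 after normalization; the other inputs are sample values.
score = weighted_priority(
    {"cvss": 0.9, "epss": 0.4, "reachability": 1.0, "freshness": 0.5, "frequency": 0.0}
)
```
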
---

## 8. API Surface

### 8.1 Internal API Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /epss/current?cve=...` | Bulk lookup current EPSS |
| `GET /epss/history?cve=...&days=180` | Historical time-series |
| `GET /epss/top?order=epss&limit=100` | Top CVEs by score |
| `GET /epss/changes?date=...` | Daily change report |

### 8.2 UI Requirements

For each vulnerability instance:
- EPSS score + percentile
- Model date
- Trend delta vs previous scan date
- Filter chips: "High EPSS", "Rising EPSS", "High CVSS + High EPSS"
- Evidence panel showing EPSS-at-scan vs current EPSS

---

## 9. Implementation Checklist

### Phase 1: Data Foundation
- [ ] DB migrations: tables + partitions + indexes
- [ ] Concelier ingestion job: online download + bundle import

### Phase 2: Integration
- [x] epss_current + epss_changes projection
- [x] Scanner.WebService: attach EPSS-at-scan evidence
- [x] Bulk lookup API (`/api/v1/epss/*`)

### Phase 3: Enrichment
- [x] Scanner Worker `EpssEnrichmentJob`: update `vuln_instance_triage` for CVEs with material changes
- [x] Scanner Worker `EpssSignalJob`: generate tenant-scoped EPSS signals (stored in `epss_signal`; published via `IEpssSignalPublisher` when configured)

### Phase 4: UI/UX
- [ ] EPSS fields in vulnerability detail
- [ ] Filters and sort by exploit likelihood
- [ ] Trend visualization

### Phase 5: Operations
- [x] Backfill tool (last 180 days)
- [x] Ops runbook: schedules, manual re-run, air-gap import

---

## 10. Operations Runbook

### 10.1 Configuration

EPSS jobs are configured via the `Epss:*` sections in Scanner Worker configuration:

```yaml
Epss:
  Ingest:
    Enabled: true                 # Enable/disable the job
    Schedule: "0 5 0 * * *"       # Cron expression with seconds field (default: 00:05 UTC daily)
    SourceType: "online"          # "online" or "bundle"
    BundlePath: null              # Path for air-gapped bundle import
    InitialDelay: "00:00:30"      # Wait before first run (30s)
    RetryDelay: "00:05:00"        # Delay between retries (5m)
    MaxRetries: 3                 # Maximum retry attempts
  Enrichment:
    Enabled: true                 # Enable/disable live triage enrichment
    PostIngestDelay: "00:01:00"   # Wait after ingest before enriching
    BatchSize: 1000               # CVEs per batch
    HighPercentile: 0.99          # ≥ threshold => HIGH (and CrossedHigh flag)
    HighScore: 0.50               # ≥ threshold => score counts as high
    BigJumpDelta: 0.10            # ≥ threshold => BIG_JUMP flag
    CriticalPercentile: 0.995     # ≥ threshold => CRITICAL
    MediumPercentile: 0.90        # ≥ threshold => MEDIUM
    FlagsToProcess: "NewScored,CrossedHigh,BigJumpUp,BigJumpDown"  # Empty => process all
  Signal:
    Enabled: true                 # Enable/disable tenant-scoped signal generation
    PostEnrichmentDelay: "00:00:30"  # Wait after enrichment before emitting signals
    BatchSize: 500                # Signals per batch
    RetentionDays: 90             # Retention for epss_signal layer
    SuppressSignalsOnModelChange: true  # Suppress per-CVE signals on model version changes
```

### 10.2 Online Mode (Connected)

The job automatically fetches the FIRST EPSS feed at the scheduled time:

1. Downloads `https://epss.empiricalsecurity.com/epss_scores-YYYY-MM-DD.csv.gz`
2. Validates SHA256 hash
3. Parses CSV and bulk inserts to `epss_scores`
4. Computes delta against `epss_current`
5. Updates `epss_current` projection
6. Publishes `epss.updated` event

### 10.3 Air-Gap Mode (Bundle)

For offline deployments:

1. Download the EPSS CSV (`epss_scores-YYYY-MM-DD.csv.gz`) on an internet-connected system
2. Copy it to the configured `BundlePath` location
3. Set `SourceType: "bundle"` in configuration
4. The job will read from the local file instead of fetching online

### 10.4 Manual Ingestion

There is currently no HTTP endpoint for one-shot ingestion. To force a run:

1. Temporarily set `Epss:Ingest:Schedule` to `0 * * * * *` and `Epss:Ingest:InitialDelay` to `00:00:00`
2. Restart Scanner Worker and wait for one ingest cycle
3. Restore the normal schedule

Note: a successful ingest triggers `EpssEnrichmentJob`, which then triggers `EpssSignalJob`.

### 10.5 Troubleshooting

| Symptom | Likely Cause | Resolution |
|---------|--------------|------------|
| Job not running | `Enabled: false` | Set `Enabled: true` |
| Download fails | Network/firewall | Check HTTPS egress to `epss.empiricalsecurity.com` |
| Parse errors | Corrupted file | Re-download, check SHA256 |
| Enrichment/signals not running | Storage disabled or job disabled | Ensure `ScannerStorage:Postgres:ConnectionString` is set and `Epss:Enrichment:Enabled` / `Epss:Signal:Enabled` are `true` |
| Slow ingestion | Large dataset / constrained IO | Expect <120s for ~310k rows; confirm via the perf harness and compare against CI baseline |
| Duplicate runs | Idempotent | Safe - existing data preserved |

### 10.6 Monitoring

Key metrics and traces:

- **Activities**
  - `StellaOps.Scanner.EpssIngest` (`epss.ingest`): `epss.model_date`, `epss.row_count`, `epss.cve_count`, `epss.duration_ms`
  - `StellaOps.Scanner.EpssEnrichment` (`epss.enrich`): `epss.model_date`, `epss.changed_cve_count`, `epss.updated_count`, `epss.band_change_count`, `epss.duration_ms`
  - `StellaOps.Scanner.EpssSignal` (`epss.signal.generate`): `epss.model_date`, `epss.change_count`, `epss.signal_count`, `epss.filtered_count`, `epss.tenant_count`, `epss.duration_ms`

- **Metrics**
  - `epss_enrichment_runs_total{result}` / `epss_enrichment_duration_ms` / `epss_enrichment_updated_total` / `epss_enrichment_band_changes_total`
  - `epss_signal_runs_total{result}` / `epss_signal_duration_ms` / `epss_signals_emitted_total{event_type, tenant_id}`

- **Logs** (structured)
  - `EPSS ingest/enrichment/signal job started`
  - `EPSS ingestion completed: modelDate={ModelDate}, rows={RowCount}, ...`
  - `EPSS enrichment completed: updated={Updated}, bandChanges={BandChanges}, ...`
  - `EPSS model version changed: {OldVersion} -> {NewVersion}`
  - `EPSS signal generation completed: signals={SignalCount}, changes={ChangeCount}, ...`

### 10.7 Performance

- Local harness: `src/Scanner/__Benchmarks/StellaOps.Scanner.Storage.Epss.Perf/README.md`
- CI workflow: `.gitea/workflows/epss-ingest-perf.yml` (nightly + manual, artifacts retained 90 days)

---

## 11. Anti-Patterns to Avoid

| Anti-Pattern | Why It's Wrong |
|--------------|----------------|
| Storing only latest EPSS | Breaks auditability and replay |
| Mixing EPSS into CVE table | EPSS is signal, not vulnerability data |
| Treating EPSS as severity | EPSS is probability, not impact |
| Alerting on every daily fluctuation | Creates alert fatigue |
| Recomputing EPSS internally | Use FIRST's authoritative data |

---

## Related Documents

- [Unknowns API Documentation](../api/unknowns-api.md)
- [Score Replay API](../api/score-replay-api.md)
- [Trust Lattice Architecture](../modules/scanner/architecture.md)