# EPSS Integration Architecture

> **Advisory Source**: `docs/product-advisories/16-Dec-2025 - Merging EPSS v4 with CVSS v4 Frameworks.md`
> **Last Updated**: 2025-12-17
> **Status**: Approved for Implementation

---

## Executive Summary

EPSS (Exploit Prediction Scoring System) is a **probabilistic model** that estimates the likelihood a given CVE will be exploited in the wild over the next ~30 days. This document defines how StellaOps integrates EPSS as a first-class risk signal.

**Key Distinction**:
- **CVSS v4**: Deterministic measurement of *severity* (0-10)
- **EPSS**: Dynamic, data-driven *probability of exploitation* (0-1)

EPSS does **not** replace CVSS or VEX; it provides complementary probabilistic threat intelligence.

---

## 1. Design Principles

### 1.1 EPSS as Probabilistic Signal

| Signal Type | Nature | Source |
|-------------|--------|--------|
| CVSS v4 | Deterministic impact | NVD, vendor |
| EPSS | Probabilistic threat | FIRST daily feeds |
| VEX | Vendor intent | Vendor statements |
| Runtime context | Actual exposure | StellaOps scanner |

**Rule**: EPSS *modulates confidence*, never asserts truth.

### 1.2 Architectural Constraints

1. **Append-only time-series**: Never overwrite historical EPSS data
2. **Deterministic replay**: Every scan stores a reference to the EPSS snapshot it used
3. **Idempotent ingestion**: Safe to re-run for the same date
4. **Postgres as source of truth**: Valkey is an optional cache only
5. **Air-gap compatible**: Manual import via signed bundles

---

## 2. Data Model

### 2.1 Core Tables

#### Import Provenance

```sql
-- One row per imported model date; UNIQUE (model_date) keeps re-runs idempotent.
CREATE TABLE epss_import_runs (
  import_run_id       UUID PRIMARY KEY,
  model_date          DATE NOT NULL,
  source_uri          TEXT NOT NULL,
  retrieved_at        TIMESTAMPTZ NOT NULL,
  file_sha256         TEXT NOT NULL,
  decompressed_sha256 TEXT NULL,
  row_count           INT NOT NULL,
  model_version_tag   TEXT NULL,
  published_date      DATE NULL,
  status              TEXT NOT NULL,  -- SUCCEEDED / FAILED
  error               TEXT NULL,
  UNIQUE (model_date)
);
```

#### Time-Series Scores (Partitioned)

```sql
-- Append-only history: one row per CVE per model date.
CREATE TABLE epss_scores (
  model_date    DATE NOT NULL,
  cve_id        TEXT NOT NULL,
  epss_score    DOUBLE PRECISION NOT NULL,
  percentile    DOUBLE PRECISION NOT NULL,
  import_run_id UUID NOT NULL REFERENCES epss_import_runs(import_run_id),
  PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```

#### Current Projection (Fast Lookup)

```sql
-- Latest score per CVE, upserted on each successful import.
CREATE TABLE epss_current (
  cve_id        TEXT PRIMARY KEY,
  epss_score    DOUBLE PRECISION NOT NULL,
  percentile    DOUBLE PRECISION NOT NULL,
  model_date    DATE NOT NULL,
  import_run_id UUID NOT NULL
);

CREATE INDEX idx_epss_current_score_desc ON epss_current (epss_score DESC);
CREATE INDEX idx_epss_current_percentile_desc ON epss_current (percentile DESC);
```

#### Change Detection

```sql
-- Daily delta between the new import and the previous epss_current values.
CREATE TABLE epss_changes (
  model_date     DATE NOT NULL,
  cve_id         TEXT NOT NULL,
  old_score      DOUBLE PRECISION NULL,
  new_score      DOUBLE PRECISION NOT NULL,
  delta_score    DOUBLE PRECISION NULL,
  old_percentile DOUBLE PRECISION NULL,
  new_percentile DOUBLE PRECISION NOT NULL,
  flags          INT NOT NULL,  -- bitmask, see section 2.2 (Flags Bitmask)
  PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);
```

### 2.2 Flags Bitmask

| Flag | Value | Meaning |
|------|-------|---------|
| NEW_SCORED | 0x01 | CVE newly scored (not in previous day) |
| CROSSED_HIGH | 0x02 | Score crossed above high threshold |
| CROSSED_LOW | 0x04 | Score crossed below high threshold |
| BIG_JUMP_UP | 0x08 | Delta > 0.10 upward |
| BIG_JUMP_DOWN | 0x10 | Delta > 0.10 downward |
| TOP_PERCENTILE | 0x20 | Entered top 5% |

---
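As an illustration of how the `flags` column of `epss_changes` might be derived, the following is a minimal Python sketch and not the StellaOps implementation. The bit values come from the flags table in section 2.2 and the default thresholds from section 6.1; the names `EpssChangeFlags` and `compute_flags` are hypothetical. It assumes that crossing and jump flags require a previous score, and that a CVE with no previous percentile counts as having been outside the top band.

```python
from enum import IntFlag
from typing import Optional


class EpssChangeFlags(IntFlag):
    """Bit values copied from the flags table in section 2.2."""
    NONE = 0x00
    NEW_SCORED = 0x01
    CROSSED_HIGH = 0x02
    CROSSED_LOW = 0x04
    BIG_JUMP_UP = 0x08
    BIG_JUMP_DOWN = 0x10
    TOP_PERCENTILE = 0x20


def compute_flags(
    old_score: Optional[float],
    new_score: float,
    old_percentile: Optional[float],
    new_percentile: float,
    high_score: float = 0.50,       # HighScore default from section 6.1
    high_percentile: float = 0.95,  # HighPercentile default from section 6.1
    big_jump_delta: float = 0.10,   # BigJumpDelta default from section 6.1
) -> EpssChangeFlags:
    """Hypothetical helper: derive the change bitmask for one CVE."""
    flags = EpssChangeFlags.NONE

    if old_score is None:
        # No row in the previous projection: the CVE is newly scored.
        flags |= EpssChangeFlags.NEW_SCORED
    else:
        # Assumption: crossings and jumps are only defined against a prior score.
        if old_score < high_score <= new_score:
            flags |= EpssChangeFlags.CROSSED_HIGH
        if new_score < high_score <= old_score:
            flags |= EpssChangeFlags.CROSSED_LOW
        delta = new_score - old_score
        if delta > big_jump_delta:
            flags |= EpssChangeFlags.BIG_JUMP_UP
        elif delta < -big_jump_delta:
            flags |= EpssChangeFlags.BIG_JUMP_DOWN

    # Assumption: a missing previous percentile counts as "not in the top band".
    was_top = old_percentile is not None and old_percentile >= high_percentile
    if new_percentile >= high_percentile and not was_top:
        flags |= EpssChangeFlags.TOP_PERCENTILE

    return flags
```

Under these assumptions, `int(compute_flags(0.40, 0.62, 0.90, 0.96))` evaluates to `0x2A` (CROSSED_HIGH, BIG_JUMP_UP, and TOP_PERCENTILE combined), which is the integer that would be stored in `flags`. Because the thresholds are org-configurable, they are passed as parameters rather than hard-coded.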
## 3. Service Architecture

### 3.1 Component Responsibilities

```
┌─────────────────────────────────────────────────────────────────┐
│                         EPSS DATA FLOW                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     │
│  │  Scheduler   │────►│  Concelier   │────►│   Scanner    │     │
│  │  (triggers)  │     │   (ingest)   │     │  (evidence)  │     │
│  └──────────────┘     └──────────────┘     └──────────────┘     │
│         │                    │                    │             │
│         │                    ▼                    │             │
│         │             ┌──────────────┐            │             │
│         │             │   Postgres   │◄───────────┘             │
│         │             │   (truth)    │                          │
│         │             └──────────────┘                          │
│         │                    │                                  │
│         ▼                    ▼                                  │
│  ┌──────────────┐     ┌──────────────┐                          │
│  │    Notify    │◄────│  Excititor   │                          │
│  │   (alerts)   │     │  (VEX tasks) │                          │
│  └──────────────┘     └──────────────┘                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

| Component | Responsibility |
|-----------|----------------|
| **Scheduler** | Triggers daily EPSS import job |
| **Concelier** | Downloads/imports EPSS, stores facts, computes delta, emits events |
| **Scanner** | Attaches EPSS-at-scan as immutable evidence, uses for scoring |
| **Excititor** | Creates VEX tasks when EPSS is high and VEX missing |
| **Notify** | Sends alerts on priority changes |

### 3.2 Event Flow

```
Scheduler
  → epss.ingest(date)
    → Concelier (ingest)
      → epss.updated
        → Notify (optional daily summary)
        → Concelier (enrichment)
          → vuln.priority.changed
            → Notify (targeted alerts)
            → Excititor (VEX task creation)
```

---

## 4. Ingestion Pipeline

### 4.1 Data Source

FIRST publishes daily CSV snapshots at:

```
https://epss.empiricalsecurity.com/epss_scores-YYYY-MM-DD.csv.gz
```

Each file contains ~300k CVE records with:
- `cve` - CVE ID
- `epss` - Score (0.00000–1.00000)
- `percentile` - Percentile rank relative to all scored CVEs

### 4.2 Ingestion Steps

1. **Scheduler** triggers daily job for date D
2. **Download** `epss_scores-D.csv.gz`
3. **Decompress** stream
4. **Parse** header comment for model version/date
5. **Validate** that scores lie in [0,1] and that percentiles are monotonic in score
6. **Bulk load** into TEMP staging table
7. **Transaction**:
   - Insert `epss_import_runs`
   - Insert into `epss_scores` partition
   - Compute `epss_changes` by comparing staging vs `epss_current`
   - Upsert `epss_current`
   - Enqueue `epss.updated` event
8. **Commit**

### 4.3 Air-Gap Import

Accept a local bundle containing:
- `epss_scores-YYYY-MM-DD.csv.gz`
- `manifest.json` with sha256, source attribution, DSSE signature

The bundle goes through the same pipeline, with `source_uri = bundle://...`.

---

## 5. Enrichment Rules

### 5.1 New Scan Findings (Immutable)

Store EPSS "as-of" scan time:

```csharp
using System;

// Immutable snapshot of the EPSS values that were current when the scan ran.
public record ScanEpssEvidence
{
    public double EpssScoreAtScan { get; init; }         // Probability in [0, 1]
    public double EpssPercentileAtScan { get; init; }    // Percentile in [0, 1]
    public DateOnly EpssModelDateAtScan { get; init; }   // epss_scores.model_date
    public Guid EpssImportRunIdAtScan { get; init; }     // epss_import_runs.import_run_id
}
```

This supports deterministic replay even if EPSS changes later.

### 5.2 Existing Findings (Live Triage)

Maintain mutable "current EPSS" on vulnerability instances:
- **scan_finding_evidence**: Immutable EPSS-at-scan
- **vuln_instance_triage**: Current EPSS + band (for live triage)

### 5.3 Efficient Delta Targeting

On `epss.updated(D)`:
1. Read `epss_changes` where flags indicate material change
2. Find impacted vulnerability instances by CVE
3. Update only those instances
4. Emit `vuln.priority.changed` only if band crossed

---

## 6. Notification Policy

### 6.1 Default Thresholds

| Threshold | Default | Description |
|-----------|---------|-------------|
| HighPercentile | 0.95 | Top 5% of all CVEs |
| HighScore | 0.50 | 50% exploitation probability |
| BigJumpDelta | 0.10 | Meaningful daily change |

### 6.2 Trigger Conditions

1. **Newly scored** CVE in inventory AND `percentile >= HighPercentile`
2. Existing CVE **crosses above** HighPercentile or HighScore
3. Delta > BigJumpDelta AND CVE in runtime-exposed assets

All thresholds are org-configurable.

---

## 7. Trust Lattice Integration

### 7.1 Scoring Rule Example

```
IF cvss_base >= 8.0
   AND epss_score >= 0.35
   AND runtime_exposed = true
→ priority = IMMEDIATE_ATTENTION
```

### 7.2 Score Weights

| Factor | Default Weight | Range |
|--------|---------------|-------|
| CVSS | 0.25 | 0.0-1.0 |
| EPSS | 0.25 | 0.0-1.0 |
| Reachability | 0.25 | 0.0-1.0 |
| Freshness | 0.15 | 0.0-1.0 |
| Frequency | 0.10 | 0.0-1.0 |

---

## 8. API Surface

### 8.1 Internal API Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /epss/current?cve=...` | Bulk lookup current EPSS |
| `GET /epss/history?cve=...&days=180` | Historical time-series |
| `GET /epss/top?order=epss&limit=100` | Top CVEs by score |
| `GET /epss/changes?date=...` | Daily change report |

### 8.2 UI Requirements

For each vulnerability instance:
- EPSS score + percentile
- Model date
- Trend delta vs previous scan date
- Filter chips: "High EPSS", "Rising EPSS", "High CVSS + High EPSS"
- Evidence panel showing EPSS-at-scan vs current EPSS

---

## 9. Implementation Checklist

### Phase 1: Data Foundation

- [ ] DB migrations: tables + partitions + indexes
- [ ] Concelier ingestion job: online download + bundle import

### Phase 2: Integration

- [ ] epss_current + epss_changes projection
- [ ] Scanner.WebService: attach EPSS-at-scan evidence
- [ ] Bulk lookup API

### Phase 3: Enrichment

- [ ] Concelier enrichment job: update triage projections
- [ ] Notify subscription to vuln.priority.changed

### Phase 4: UI/UX

- [ ] EPSS fields in vulnerability detail
- [ ] Filters and sort by exploit likelihood
- [ ] Trend visualization

### Phase 5: Operations

- [x] Backfill tool (last 180 days)
- [x] Ops runbook: schedules, manual re-run, air-gap import

---

## 10. Operations Runbook

### 10.1 Configuration

EPSS ingestion is configured via the `Epss:Ingest` section in Scanner Worker configuration:

```yaml
Epss:
  Ingest:
    Enabled: true              # Enable/disable the job
    Schedule: "0 5 0 * * *"    # Six-field cron expression, seconds first (default: 00:05 UTC daily)
    SourceType: "online"       # "online" or "bundle"
    BundlePath: null           # Path for air-gapped bundle import
    InitialDelay: "00:00:30"   # Wait before first run (30s)
    RetryDelay: "00:05:00"     # Delay between retries (5m)
    MaxRetries: 3              # Maximum retry attempts
```

### 10.2 Online Mode (Connected)

The job automatically fetches EPSS data from FIRST.org at the scheduled time:

1. Downloads `https://epss.empiricalsecurity.com/epss_scores-YYYY-MM-DD.csv.gz`
2. Validates SHA256 hash
3. Parses CSV and bulk inserts to `epss_scores`
4. Computes delta against `epss_current`
5. Updates `epss_current` projection
6. Publishes `epss.updated` event

### 10.3 Air-Gap Mode (Bundle)

For offline deployments:

1. Download EPSS CSV from FIRST.org on an internet-connected system
2. Copy to the configured `BundlePath` location
3. Set `SourceType: "bundle"` in configuration
4. The job will read from the local file instead of fetching online

### 10.4 Manual Ingestion

Trigger manual ingestion via the Scanner Worker API:

```bash
# POST to trigger immediate ingestion for a specific date
curl -X POST "https://scanner-worker/epss/ingest?date=2025-12-18"
```

### 10.5 Troubleshooting

| Symptom | Likely Cause | Resolution |
|---------|--------------|------------|
| Job not running | `Enabled: false` | Set `Enabled: true` |
| Download fails | Network/firewall | Check HTTPS egress to `epss.empiricalsecurity.com` |
| Parse errors | Corrupted file | Re-download, check SHA256 |
| Slow ingestion | Large dataset | Normal for ~300k rows; expect 60-90s |
| Duplicate runs | Idempotent | Safe - existing data preserved |

### 10.6 Monitoring

Key metrics and traces:

- **Activity**: `StellaOps.Scanner.EpssIngest` with tags:
  - `epss.model_date`: Date of EPSS model
  - `epss.row_count`: Number of rows ingested
  - `epss.cve_count`: Distinct CVEs processed
  - `epss.duration_ms`: Total ingestion time
- **Logs**: Structured logs at Info/Warning/Error levels
  - `EPSS ingest job started`
  - `Starting EPSS ingestion for {ModelDate}`
  - `EPSS ingestion completed: modelDate={ModelDate}, rows={RowCount}...`

---

## 11. Anti-Patterns to Avoid

| Anti-Pattern | Why It's Wrong |
|--------------|----------------|
| Storing only latest EPSS | Breaks auditability and replay |
| Mixing EPSS into CVE table | EPSS is signal, not vulnerability data |
| Treating EPSS as severity | EPSS is probability, not impact |
| Alerting on every daily fluctuation | Creates alert fatigue |
| Recomputing EPSS internally | Use FIRST's authoritative data |

---

## Related Documents

- [Unknowns API Documentation](../api/unknowns-api.md)
- [Score Replay API](../api/score-replay-api.md)
- [Trust Lattice Architecture](../modules/scanner/architecture.md)