EPSS Integration Architecture

Advisory Source: docs/product-advisories/16-Dec-2025 - Merging EPSS v4 with CVSS v4 Frameworks.md
Last Updated: 2025-12-17
Status: Approved for Implementation


Executive Summary

EPSS (Exploit Prediction Scoring System) is a probabilistic model that estimates the likelihood a given CVE will be exploited in the wild over the next ~30 days. This document defines how StellaOps integrates EPSS as a first-class risk signal.

Key Distinction:

  • CVSS v4: Deterministic measurement of severity (0-10)
  • EPSS: Dynamic, data-driven probability of exploitation (0-1)

EPSS does not replace CVSS or VEX—it provides complementary probabilistic threat intelligence.


1. Design Principles

1.1 EPSS as Probabilistic Signal

| Signal Type | Nature | Source |
|-------------|--------|--------|
| CVSS v4 | Deterministic impact | NVD, vendor |
| EPSS | Probabilistic threat | FIRST daily feeds |
| VEX | Vendor intent | Vendor statements |
| Runtime context | Actual exposure | StellaOps scanner |

Rule: EPSS modulates confidence, never asserts truth.

1.2 Architectural Constraints

  1. Append-only time-series: Never overwrite historical EPSS data
  2. Deterministic replay: Every scan stores the EPSS snapshot reference used
  3. Idempotent ingestion: Safe to re-run for same date
  4. Postgres as source of truth: Valkey is optional cache only
  5. Air-gap compatible: Manual import via signed bundles

2. Data Model

2.1 Core Tables

Import Provenance

CREATE TABLE epss_import_runs (
  import_run_id      UUID PRIMARY KEY,
  model_date         DATE NOT NULL,
  source_uri         TEXT NOT NULL,
  retrieved_at       TIMESTAMPTZ NOT NULL,
  file_sha256        TEXT NOT NULL,
  decompressed_sha256 TEXT NULL,
  row_count          INT NOT NULL,
  model_version_tag  TEXT NULL,
  published_date     DATE NULL,
  status             TEXT NOT NULL,  -- SUCCEEDED / FAILED
  error              TEXT NULL,
  UNIQUE (model_date)
);

Time-Series Scores (Partitioned)

CREATE TABLE epss_scores (
  model_date    DATE NOT NULL,
  cve_id        TEXT NOT NULL,
  epss_score    DOUBLE PRECISION NOT NULL,
  percentile    DOUBLE PRECISION NOT NULL,
  import_run_id UUID NOT NULL REFERENCES epss_import_runs(import_run_id),
  PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);

Current Projection (Fast Lookup)

CREATE TABLE epss_current (
  cve_id        TEXT PRIMARY KEY,
  epss_score    DOUBLE PRECISION NOT NULL,
  percentile    DOUBLE PRECISION NOT NULL,
  model_date    DATE NOT NULL,
  import_run_id UUID NOT NULL
);

CREATE INDEX idx_epss_current_score_desc ON epss_current (epss_score DESC);
CREATE INDEX idx_epss_current_percentile_desc ON epss_current (percentile DESC);

Change Detection

CREATE TABLE epss_changes (
  model_date     DATE NOT NULL,
  cve_id         TEXT NOT NULL,
  old_score      DOUBLE PRECISION NULL,
  new_score      DOUBLE PRECISION NOT NULL,
  delta_score    DOUBLE PRECISION NULL,
  old_percentile DOUBLE PRECISION NULL,
  new_percentile DOUBLE PRECISION NOT NULL,
  flags          INT NOT NULL,  -- bitmask; see 2.2 Flags Bitmask
  PRIMARY KEY (model_date, cve_id)
) PARTITION BY RANGE (model_date);

2.2 Flags Bitmask

| Flag | Value | Meaning |
|------|-------|---------|
| NEW_SCORED | 0x01 | CVE newly scored (not in previous day) |
| CROSSED_HIGH | 0x02 | Score crossed above high threshold |
| CROSSED_LOW | 0x04 | Score crossed below high threshold |
| BIG_JUMP_UP | 0x08 | Delta > 0.10 upward |
| BIG_JUMP_DOWN | 0x10 | Delta > 0.10 downward |
| TOP_PERCENTILE | 0x20 | Entered top 5% |
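
As an illustration of how the bitmask could be derived for one CVE, here is a minimal Python sketch. The flag values come from the table above. The function name `compute_flags`, its parameters, and the choice to treat "high threshold" as the HighScore default (0.50) are assumptions made for this example, not part of the design.

```python
from enum import IntFlag
from typing import Optional


class EpssChangeFlags(IntFlag):
    """Bit values as listed in the flags table."""
    NEW_SCORED = 0x01
    CROSSED_HIGH = 0x02
    CROSSED_LOW = 0x04
    BIG_JUMP_UP = 0x08
    BIG_JUMP_DOWN = 0x10
    TOP_PERCENTILE = 0x20


def compute_flags(
    old_score: Optional[float],
    new_score: float,
    old_percentile: Optional[float],
    new_percentile: float,
    high_score: float = 0.50,      # assumption: "high threshold" means HighScore
    big_jump: float = 0.10,
    top_percentile: float = 0.95,  # top 5%
) -> EpssChangeFlags:
    """Hypothetical helper: compare yesterday's and today's values for one CVE."""
    flags = EpssChangeFlags(0)
    if old_score is None:
        # No score on the previous day.
        flags |= EpssChangeFlags.NEW_SCORED
    else:
        if old_score < high_score <= new_score:
            flags |= EpssChangeFlags.CROSSED_HIGH
        if new_score < high_score <= old_score:
            flags |= EpssChangeFlags.CROSSED_LOW
        delta = new_score - old_score
        if delta > big_jump:
            flags |= EpssChangeFlags.BIG_JUMP_UP
        if -delta > big_jump:
            flags |= EpssChangeFlags.BIG_JUMP_DOWN
    # "Entered" the top band: in it today, and not in it (or unscored) yesterday.
    was_top = old_percentile is not None and old_percentile >= top_percentile
    if new_percentile >= top_percentile and not was_top:
        flags |= EpssChangeFlags.TOP_PERCENTILE
    return flags
```

The resulting integer can be stored directly in `epss_changes.flags`, and a row with a value of 0 carries no material change.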

3. Service Architecture

3.1 Component Responsibilities

┌─────────────────────────────────────────────────────────────────┐
│                    EPSS DATA FLOW                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐    │
│  │  Scheduler   │────►│  Concelier   │────►│   Scanner    │    │
│  │  (triggers)  │     │  (ingest)    │     │  (evidence)  │    │
│  └──────────────┘     └──────────────┘     └──────────────┘    │
│         │                    │                    │            │
│         │                    ▼                    │            │
│         │             ┌──────────────┐            │            │
│         │             │   Postgres   │◄───────────┘            │
│         │             │  (truth)     │                         │
│         │             └──────────────┘                         │
│         │                    │                                 │
│         ▼                    ▼                                 │
│  ┌──────────────┐     ┌──────────────┐                         │
│  │   Notify     │◄────│  Excititor   │                         │
│  │  (alerts)    │     │  (VEX tasks) │                         │
│  └──────────────┘     └──────────────┘                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

| Component | Responsibility |
|-----------|----------------|
| Scheduler | Triggers daily EPSS import job |
| Concelier | Downloads/imports EPSS, stores facts, computes delta, emits events |
| Scanner | Attaches EPSS-at-scan as immutable evidence, uses for scoring |
| Excititor | Creates VEX tasks when EPSS is high and VEX missing |
| Notify | Sends alerts on priority changes |

3.2 Event Flow

Scheduler
  → epss.ingest(date)
    → Concelier (ingest)
      → epss.updated
        → Notify (optional daily summary)
      → Concelier (enrichment)
        → vuln.priority.changed
          → Notify (targeted alerts)
          → Excititor (VEX task creation)

4. Ingestion Pipeline

4.1 Data Source

FIRST publishes daily CSV snapshots at:

https://epss.empiricalsecurity.com/epss_scores-YYYY-MM-DD.csv.gz

Each file contains ~300k CVE records with:

  • cve - CVE ID
  • epss - Score (0.00000-1.00000)
  • percentile - Rank vs all CVEs
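
To make the file layout concrete, the following Python sketch builds the snapshot URL for a date and parses a gzipped snapshot into typed rows. The URL pattern and column names come from this section. The names `snapshot_url`, `parse_snapshot`, and `EpssRow` are hypothetical, and the handling of a leading `#` comment line assumes the file starts with a single metadata comment before the CSV header.

```python
import csv
import gzip
import itertools
from datetime import date
from typing import BinaryIO, List, NamedTuple, Optional, Tuple

BASE_URL = "https://epss.empiricalsecurity.com"


class EpssRow(NamedTuple):
    cve_id: str
    epss_score: float
    percentile: float


def snapshot_url(model_date: date) -> str:
    """URL of the daily snapshot for the given model date."""
    return f"{BASE_URL}/epss_scores-{model_date.isoformat()}.csv.gz"


def parse_snapshot(stream: BinaryIO) -> Tuple[Optional[str], List[EpssRow]]:
    """Return (leading comment text or None, parsed rows).

    Assumption: an optional first line starting with '#' holds model metadata;
    the next line is the CSV header 'cve,epss,percentile'.
    """
    with gzip.open(stream, mode="rt", encoding="utf-8", newline="") as fh:
        first = fh.readline()
        if first.startswith("#"):
            comment: Optional[str] = first[1:].strip()
            lines = fh
        else:
            comment = None
            # No comment line: put the header back in front of the reader.
            lines = itertools.chain([first], fh)
        rows = [
            EpssRow(r["cve"], float(r["epss"]), float(r["percentile"]))
            for r in csv.DictReader(lines)
        ]
    return comment, rows
```

A production importer would likely stream rows into the staging table instead of collecting a list; the list keeps this example short.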

4.2 Ingestion Steps

  1. Scheduler triggers daily job for date D
  2. Download epss_scores-D.csv.gz
  3. Decompress stream
  4. Parse header comment for model version/date
  5. Validate that scores and percentiles fall in [0,1] and that percentile never decreases as score increases
  6. Bulk load into TEMP staging table
  7. Transaction:
    • Insert epss_import_runs
    • Insert into epss_scores partition
    • Compute epss_changes by comparing staging vs epss_current
    • Upsert epss_current
    • Enqueue epss.updated event
  8. Commit
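
The validation and transactional steps can be sketched as follows. This is a simplified illustration, not the Concelier implementation: it uses Python's built-in `sqlite3` as a stand-in for Postgres, trims the tables to a few columns, and omits the `epss_changes` computation and event enqueueing. The function names are hypothetical, and reading "monotonic percentile" as "percentile never decreases as score increases" is an assumption.

```python
import sqlite3
import uuid
from datetime import date
from typing import List, Tuple

Row = Tuple[str, float, float]  # (cve_id, epss_score, percentile)

# Reduced stand-in schema; the real tables have more columns and partitions.
SCHEMA = """
CREATE TABLE epss_import_runs (import_run_id TEXT PRIMARY KEY,
  model_date TEXT NOT NULL UNIQUE, row_count INT NOT NULL, status TEXT NOT NULL);
CREATE TABLE epss_scores (model_date TEXT NOT NULL, cve_id TEXT NOT NULL,
  epss_score REAL NOT NULL, percentile REAL NOT NULL, import_run_id TEXT NOT NULL,
  PRIMARY KEY (model_date, cve_id));
CREATE TABLE epss_current (cve_id TEXT PRIMARY KEY, epss_score REAL NOT NULL,
  percentile REAL NOT NULL, model_date TEXT NOT NULL, import_run_id TEXT NOT NULL);
"""


def validate(rows: List[Row]) -> None:
    """Raise ValueError on out-of-range values or non-monotonic percentiles."""
    for cve, score, pct in rows:
        if not (0.0 <= score <= 1.0 and 0.0 <= pct <= 1.0):
            raise ValueError(f"{cve}: value outside [0, 1]")
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[2] < prev[2]:
            raise ValueError(f"{cur[0]}: percentile decreases as score increases")


def ingest(conn: sqlite3.Connection, model_date: date, rows: List[Row]) -> bool:
    """Import one day's snapshot; return False if that date was already imported."""
    validate(rows)
    day = model_date.isoformat()
    run_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if an exception escapes
        seen = conn.execute(
            "SELECT 1 FROM epss_import_runs WHERE model_date = ?", (day,)
        ).fetchone()
        if seen:
            return False  # idempotent: re-running the same date is a no-op
        conn.execute(
            "INSERT INTO epss_import_runs VALUES (?, ?, ?, 'SUCCEEDED')",
            (run_id, day, len(rows)),
        )
        # Append-only history: one row per (model_date, cve_id).
        conn.executemany(
            "INSERT INTO epss_scores VALUES (?, ?, ?, ?, ?)",
            [(day, c, s, p, run_id) for c, s, p in rows],
        )
        # Mutable projection: latest value per CVE.
        conn.executemany(
            "INSERT INTO epss_current VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT(cve_id) DO UPDATE SET epss_score = excluded.epss_score, "
            "percentile = excluded.percentile, model_date = excluded.model_date, "
            "import_run_id = excluded.import_run_id",
            [(c, s, p, day, run_id) for c, s, p in rows],
        )
    return True
```

Because the history insert and the projection upsert share one transaction, a failure partway through leaves neither table changed, which is what makes a manual re-run safe.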

4.3 Air-Gap Import

Accept local bundle containing:

  • epss_scores-YYYY-MM-DD.csv.gz
  • manifest.json with sha256, source attribution, DSSE signature

Same pipeline, with source_uri = bundle://....
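
A minimal sketch of the integrity check an air-gap importer might run before handing the file to the pipeline. The manifest layout used here (a `file` name and a hex `sha256` at the top level) is an assumption, since this document does not define the manifest schema, and DSSE signature verification is deliberately left out of the example.

```python
import hashlib
import hmac
import json
from pathlib import Path


def verify_bundle(bundle_dir: Path) -> Path:
    """Check the payload digest against manifest.json and return the payload path.

    Hypothetical manifest shape: {"file": "<name>.csv.gz", "sha256": "<hex>"}.
    The DSSE signature over the manifest is NOT verified in this sketch.
    """
    manifest = json.loads((bundle_dir / "manifest.json").read_text(encoding="utf-8"))
    # Use only the final path component so a manifest cannot point outside the bundle.
    payload = bundle_dir / Path(manifest["file"]).name
    digest = hashlib.sha256()
    with payload.open("rb") as fh:
        # Hash in 1 MiB chunks to avoid loading the whole file into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    if not hmac.compare_digest(digest.hexdigest(), manifest["sha256"].lower()):
        raise ValueError(f"sha256 mismatch for {payload.name}")
    return payload
```

The computed digest is also the natural value for `epss_import_runs.file_sha256`, so online and bundle imports record the same provenance.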


5. Enrichment Rules

5.1 New Scan Findings (Immutable)

Store EPSS "as-of" scan time:

public record ScanEpssEvidence
{
    public double EpssScoreAtScan { get; init; }
    public double EpssPercentileAtScan { get; init; }
    public DateOnly EpssModelDateAtScan { get; init; }
    public Guid EpssImportRunIdAtScan { get; init; }
}

This supports deterministic replay even if EPSS changes later.

5.2 Existing Findings (Live Triage)

Maintain mutable "current EPSS" on vulnerability instances:

  • scan_finding_evidence: Immutable EPSS-at-scan
  • vuln_instance_triage: Current EPSS + band (for live triage)

5.3 Efficient Delta Targeting

On epss.updated(D):

  1. Read epss_changes where flags indicate material change
  2. Find impacted vulnerability instances by CVE
  3. Update only those instances
  4. Emit vuln.priority.changed only if band crossed
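
The four steps above can be sketched in Python as follows. The document does not define the bands, so the two-band scheme (`HIGH` / `NORMAL`) derived from the default thresholds is purely illustrative, as are the names `EpssChange`, `band_for`, and `apply_changes` and the dictionary shape used for vulnerability instances.

```python
from typing import Dict, Iterable, List, NamedTuple


class EpssChange(NamedTuple):
    cve_id: str
    new_score: float
    new_percentile: float
    flags: int  # bitmask from epss_changes; 0 means no material change


def band_for(score: float, percentile: float,
             high_score: float = 0.50, high_percentile: float = 0.95) -> str:
    """Hypothetical two-band scheme based on the default thresholds."""
    return "HIGH" if score >= high_score or percentile >= high_percentile else "NORMAL"


def apply_changes(changes: Iterable[EpssChange],
                  instances_by_cve: Dict[str, List[dict]]) -> List[dict]:
    """Update affected instances in place; return events for band crossings only."""
    events: List[dict] = []
    for change in changes:
        if change.flags == 0:
            continue  # step 1: skip rows without a material change
        # Step 2: only instances of this CVE are touched.
        for instance in instances_by_cve.get(change.cve_id, []):
            old_band = instance["band"]
            new_band = band_for(change.new_score, change.new_percentile)
            # Step 3: refresh the mutable "current EPSS" projection.
            instance.update(epss_score=change.new_score,
                            percentile=change.new_percentile,
                            band=new_band)
            # Step 4: emit only when the band actually changed.
            if new_band != old_band:
                events.append({
                    "type": "vuln.priority.changed",
                    "instance_id": instance["id"],
                    "cve_id": change.cve_id,
                    "old_band": old_band,
                    "new_band": new_band,
                })
    return events
```

CVEs that changed but are absent from the inventory cost one dictionary lookup and produce no writes or events.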

6. Notification Policy

6.1 Default Thresholds

| Threshold | Default | Description |
|-----------|---------|-------------|
| HighPercentile | 0.95 | Top 5% of all CVEs |
| HighScore | 0.50 | 50% exploitation probability |
| BigJumpDelta | 0.10 | Meaningful daily change |

6.2 Trigger Conditions

  1. Newly scored CVE in inventory AND percentile >= HighPercentile
  2. Existing CVE crosses above HighPercentile or HighScore
  3. Delta > BigJumpDelta AND CVE in runtime-exposed assets

All thresholds are org-configurable.
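
One way to express the three trigger conditions as a single predicate is sketched below. The `Thresholds` dataclass and `should_notify` function are hypothetical names. The sketch assumes the caller has already restricted input to CVEs present in the inventory, and it reads "Delta" in condition 3 as an upward change; both are interpretations, not statements of the implemented policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Defaults from the thresholds table; org-configurable in practice."""
    high_percentile: float = 0.95
    high_score: float = 0.50
    big_jump_delta: float = 0.10


def should_notify(
    old_score: Optional[float],
    new_score: float,
    old_percentile: Optional[float],
    new_percentile: float,
    runtime_exposed: bool,
    t: Thresholds = Thresholds(),
) -> bool:
    """Return True if any trigger condition holds for a CVE in the inventory."""
    # Condition 1: newly scored and already in the top band.
    if old_score is None or old_percentile is None:
        return new_percentile >= t.high_percentile
    # Condition 2: crossed above either high threshold.
    if old_percentile < t.high_percentile <= new_percentile:
        return True
    if old_score < t.high_score <= new_score:
        return True
    # Condition 3: large upward jump on a runtime-exposed asset.
    return (new_score - old_score) > t.big_jump_delta and runtime_exposed
```

Passing a different `Thresholds` instance per organisation keeps the predicate itself free of configuration lookups.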


7. Trust Lattice Integration

7.1 Scoring Rule Example

IF cvss_base >= 8.0
AND epss_score >= 0.35
AND runtime_exposed = true
→ priority = IMMEDIATE_ATTENTION

7.2 Score Weights

| Factor | Default Weight | Range |
|--------|----------------|-------|
| CVSS | 0.25 | 0.0-1.0 |
| EPSS | 0.25 | 0.0-1.0 |
| Reachability | 0.25 | 0.0-1.0 |
| Freshness | 0.15 | 0.0-1.0 |
| Frequency | 0.10 | 0.0-1.0 |
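
This document lists the weights but not how they are combined. The sketch below assumes a plain weighted average over factor values that have already been normalised to [0, 1] (for example, CVSS base score divided by 10). Both the combination rule and the normalisation are assumptions for illustration, and `composite_score` is a hypothetical name.

```python
from typing import Mapping

# Default weights from the table; they sum to 1.0.
DEFAULT_WEIGHTS = {
    "cvss": 0.25,
    "epss": 0.25,
    "reachability": 0.25,
    "freshness": 0.15,
    "frequency": 0.10,
}


def composite_score(factors: Mapping[str, float],
                    weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of normalised factor values (each expected in [0, 1])."""
    for name in weights:
        value = factors[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalised to [0, 1], got {value}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total keeps the result in [0, 1] for custom weight sets.
    return sum(weights[n] * factors[n] for n in weights) / total_weight


# Example: CVSS 9.8 (0.98 after normalisation), EPSS 0.40, reachable,
# moderately fresh, rarely seen. The result is approximately 0.67.
example = composite_score(
    {"cvss": 0.98, "epss": 0.40, "reachability": 1.0, "freshness": 0.5, "frequency": 0.0}
)
```

Dividing by the weight total means an organisation can change one weight without having to rebalance the others to sum to 1.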

8. API Surface

8.1 Internal API Endpoints

| Endpoint | Description |
|----------|-------------|
| GET /epss/current?cve=... | Bulk lookup current EPSS |
| GET /epss/history?cve=...&days=180 | Historical time-series |
| GET /epss/top?order=epss&limit=100 | Top CVEs by score |
| GET /epss/changes?date=... | Daily change report |
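
For callers, request URLs for the first two endpoints could be built as in the sketch below. The paths and parameter names come from the table. The base URL is a placeholder, and encoding a bulk lookup as a repeated `cve` parameter is an assumption, since the wire format for multiple CVEs is not specified here.

```python
from typing import Iterable
from urllib.parse import urlencode


def current_url(base: str, cves: Iterable[str]) -> str:
    """Bulk lookup URL; assumes one repeated 'cve' parameter per CVE ID."""
    query = urlencode([("cve", cve) for cve in cves])
    return f"{base}/epss/current?{query}"


def history_url(base: str, cve: str, days: int = 180) -> str:
    """Historical time-series URL for a single CVE."""
    query = urlencode({"cve": cve, "days": days})
    return f"{base}/epss/history?{query}"


# 'http://scanner.internal' is a placeholder host, not a real service address.
url = current_url("http://scanner.internal", ["CVE-2024-0001", "CVE-2024-0002"])
```

Using `urlencode` instead of string concatenation keeps the query correctly escaped if an identifier ever contains reserved characters.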

8.2 UI Requirements

For each vulnerability instance:

  • EPSS score + percentile
  • Model date
  • Trend delta vs previous scan date
  • Filter chips: "High EPSS", "Rising EPSS", "High CVSS + High EPSS"
  • Evidence panel showing EPSS-at-scan vs current EPSS

9. Implementation Checklist

Phase 1: Data Foundation

  • DB migrations: tables + partitions + indexes
  • Concelier ingestion job: online download + bundle import

Phase 2: Integration

  • epss_current + epss_changes projection
  • Scanner.WebService: attach EPSS-at-scan evidence
  • Bulk lookup API

Phase 3: Enrichment

  • Concelier enrichment job: update triage projections
  • Notify subscription to vuln.priority.changed

Phase 4: UI/UX

  • EPSS fields in vulnerability detail
  • Filters and sort by exploit likelihood
  • Trend visualization

Phase 5: Operations

  • Backfill tool (last 180 days)
  • Ops runbook: schedules, manual re-run, air-gap import

10. Anti-Patterns to Avoid

| Anti-Pattern | Why It's Wrong |
|--------------|----------------|
| Storing only latest EPSS | Breaks auditability and replay |
| Mixing EPSS into CVE table | EPSS is signal, not vulnerability data |
| Treating EPSS as severity | EPSS is probability, not impact |
| Alerting on every daily fluctuation | Creates alert fatigue |
| Recomputing EPSS internally | Use FIRST's authoritative data |