
# FN-Drift Metrics Reference

> **Sprint:** SPRINT_3404_0001_0001
> **Module:** Scanner Storage / Telemetry

## Overview

False-Negative Drift (FN-Drift) measures how often vulnerability classifications change from `unaffected` or `unknown` to `affected` during rescans. This metric is critical for:

- **Accuracy Assessment**: Tracking scanner reliability over time
- **SLO Compliance**: Meeting false-negative rate targets
- **Root Cause Analysis**: Stratified analysis by drift cause
- **Feed Quality**: Identifying problematic vulnerability feeds

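## Metrics

Metric names are listed in dotted form. When exported to Prometheus, dots become underscores (for example, `scanner.fn_drift.percent` is queried as `scanner_fn_drift_percent`), which is the form used in the alerting rules and dashboard queries below.
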
### Gauges (30-day rolling window)

| Metric | Type | Description |
|--------|------|-------------|
| `scanner.fn_drift.percent` | Gauge | 30-day rolling FN-Drift percentage |
| `scanner.fn_drift.transitions_30d` | Gauge | Total FN transitions in last 30 days |
| `scanner.fn_drift.evaluated_30d` | Gauge | Total findings evaluated in last 30 days |
| `scanner.fn_drift.cause.feed_delta` | Gauge | FN transitions caused by feed updates |
| `scanner.fn_drift.cause.rule_delta` | Gauge | FN transitions caused by rule changes |
| `scanner.fn_drift.cause.lattice_delta` | Gauge | FN transitions caused by VEX lattice changes |
| `scanner.fn_drift.cause.reachability_delta` | Gauge | FN transitions caused by reachability changes |
| `scanner.fn_drift.cause.engine` | Gauge | FN transitions caused by engine changes (expected to be 0) |

### Counters (all-time)

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `scanner.classification_changes_total` | Counter | `cause` | Total classification status changes |
| `scanner.fn_transitions_total` | Counter | `cause` | Total false-negative transitions |
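Both counters share the `cause` label, and every FN transition is also a classification change. The Python sketch below illustrates that relationship with in-memory counters. `DriftCounters` and `record` are hypothetical names, not the scanner's instrumentation API, and skipping records whose status did not change is an assumption.

```python
from collections import Counter

# Statuses whose move to `affected` counts as a false-negative transition.
FN_SOURCE_STATUSES = frozenset({"unaffected", "unknown"})


class DriftCounters:
    """Hypothetical in-memory stand-in for the two cause-labelled counters."""

    def __init__(self) -> None:
        # Keyed by the `cause` label value, e.g. "feed_delta".
        self.classification_changes_total: Counter = Counter()
        self.fn_transitions_total: Counter = Counter()

    def record(self, previous_status: str, new_status: str, cause: str) -> None:
        if previous_status == new_status:
            return  # assumption: unchanged statuses are not recorded
        self.classification_changes_total[cause] += 1
        # Every FN transition is also counted as a classification change.
        if previous_status in FN_SOURCE_STATUSES and new_status == "affected":
            self.fn_transitions_total[cause] += 1


counters = DriftCounters()
counters.record("unknown", "affected", "feed_delta")
counters.record("affected", "fixed", "feed_delta")
print(dict(counters.classification_changes_total))  # {'feed_delta': 2}
print(dict(counters.fn_transitions_total))           # {'feed_delta': 1}
```

Because the FN counter is a subset of the change counter for each `cause`, the per-cause FN share can always be derived by dividing one by the other.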

## Classification Statuses

| Status | Description |
|--------|-------------|
| `new` | First scan, no previous status |
| `unaffected` | Confirmed not affected |
| `unknown` | Status could not be determined |
| `affected` | Confirmed affected |
| `fixed` | Previously affected, now fixed |

## Drift Causes

| Cause | Description | Expected Impact |
|-------|-------------|-----------------|
| `feed_delta` | Vulnerability feed updated (NVD, GHSA, OVAL) | High - most common cause |
| `rule_delta` | Policy rules changed | Medium - controlled by policy team |
| `lattice_delta` | VEX lattice state changed | Medium - VEX updates |
| `reachability_delta` | Reachability analysis changed | Low - improved analysis |
| `engine` | Scanner engine change | Expected 0; any value > 0 is a determinism violation |
| `other` | Unknown/unclassified cause | Low - investigate if high |

## FN-Drift Definition

A **False-Negative Transition** occurs when:
- Previous status was `unaffected` or `unknown`
- New status is `affected`

This indicates that an earlier scan did not report the finding as affected (it was `unaffected` or `unknown`) but a later scan does, meaning the earlier scan produced a false negative.
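The definition reduces to a predicate over the previous and new status. The minimal Python sketch below is illustrative only: the function is named after the `is_fn_transition` column of the `classification_history` table, but it is not that column's actual (elided) generated expression.

```python
# Statuses that may precede a false-negative transition.
FN_SOURCE_STATUSES = frozenset({"unaffected", "unknown"})


def is_fn_transition(previous_status: str, new_status: str) -> bool:
    """True when a finding moves from `unaffected`/`unknown` to `affected`."""
    return previous_status in FN_SOURCE_STATUSES and new_status == "affected"


print(is_fn_transition("unknown", "affected"))  # True
print(is_fn_transition("new", "affected"))      # False: first scan, no prior verdict
print(is_fn_transition("affected", "fixed"))    # False
```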

### FN-Drift Rate Calculation

```
FN-Drift % = (FN Transitions / Total Reclassified) × 100
```

Where:
- **FN Transitions**: Count of `(unaffected|unknown) → affected` changes
- **Total Reclassified**: Count of all status changes, excluding first-scan records whose previous status is `new`
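As a worked sketch of the formula, the hypothetical `fn_drift_percent` function below takes `(previous_status, new_status)` pairs, drops first-scan records from the denominator, and returns a percentage. Returning 0.0 when nothing was reclassified is an assumption; this reference does not define the empty case.

```python
from typing import Iterable, Tuple

FN_SOURCE_STATUSES = frozenset({"unaffected", "unknown"})


def fn_drift_percent(changes: Iterable[Tuple[str, str]]) -> float:
    """FN-Drift % over (previous_status, new_status) pairs."""
    fn_transitions = 0
    total_reclassified = 0
    for previous_status, new_status in changes:
        if previous_status == "new":
            continue  # first scan: excluded from the denominator
        total_reclassified += 1
        if previous_status in FN_SOURCE_STATUSES and new_status == "affected":
            fn_transitions += 1
    if total_reclassified == 0:
        return 0.0  # assumption: no reclassifications means no drift
    return 100.0 * fn_transitions / total_reclassified


sample = [
    ("new", "affected"),       # excluded: first scan
    ("unknown", "affected"),   # FN transition
    ("affected", "fixed"),
    ("unaffected", "unknown"),
]
print(round(fn_drift_percent(sample), 2))  # 33.33
```

Three of the four sample records count as reclassifications and one of them is an FN transition, so the rate is one third.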

## SLO Thresholds

| SLO Level | FN-Drift Threshold | Alert Severity |
|-----------|-------------------|----------------|
| Target | < 1.0% | None |
| Warning | 1.0% - 2.5% | Warning |
| Critical | > 2.5% | Critical |
| Engine Drift | Any engine-caused FN transition (> 0) | Page |
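A sketch of how these thresholds could be applied to a single reading. It assumes the strict `>` comparisons used in the alerting rules (so a rate of exactly 1.0% counts as Target here, not Warning) and checks engine drift first because it pages regardless of the overall rate. `SloLevel` and `slo_level` are hypothetical names.

```python
from enum import Enum


class SloLevel(Enum):
    """Hypothetical names for the SLO levels in the table above."""
    TARGET = "target"
    WARNING = "warning"
    CRITICAL = "critical"
    ENGINE_DRIFT = "engine_drift"


WARNING_THRESHOLD = 1.0   # percent; mirrors `scanner_fn_drift_percent > 1.0`
CRITICAL_THRESHOLD = 2.5  # percent; mirrors `scanner_fn_drift_percent > 2.5`


def slo_level(fn_drift_percent: float, engine_transitions: int) -> SloLevel:
    """Classify the current FN-Drift state, most severe condition first."""
    if engine_transitions > 0:
        return SloLevel.ENGINE_DRIFT  # any engine-caused drift pages
    if fn_drift_percent > CRITICAL_THRESHOLD:
        return SloLevel.CRITICAL
    if fn_drift_percent > WARNING_THRESHOLD:
        return SloLevel.WARNING
    return SloLevel.TARGET


print(slo_level(1.7, 0).value)  # warning
print(slo_level(0.2, 1).value)  # engine_drift
```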

### Alerting Rules

```yaml
# Example Prometheus alerting rules
groups:
  - name: fn-drift
    rules:
      - alert: FnDriftWarning
        expr: scanner_fn_drift_percent > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "FN-Drift rate above warning threshold"

      - alert: FnDriftCritical
        expr: scanner_fn_drift_percent > 2.5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "FN-Drift rate above critical threshold"

      - alert: EngineDriftDetected
        expr: scanner_fn_drift_cause_engine > 0
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Engine-caused FN drift detected - determinism violation"
```

## Dashboard Queries

### FN-Drift Trend (Grafana)

```promql
# 30-day rolling FN-Drift percentage
scanner_fn_drift_percent

# FN transitions per second by cause (1h rate window)
sum by (cause) (rate(scanner_fn_transitions_total[1h]))

# Classification changes per second by cause (1h rate window)
sum by (cause) (rate(scanner_classification_changes_total[1h]))
```

### Drift Cause Breakdown

```promql
# Top 5 FN-drift causes over the last 24h (pie chart)
topk(5,
  sum by (cause) (
    increase(scanner_fn_transitions_total[24h])
  )
)
```
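Outside Prometheus, the same breakdown can be approximated with an exact count over raw transition events. This Python sketch is hypothetical (`top_fn_causes` is not part of the scanner) and, unlike PromQL `increase()`, it performs no extrapolation at the window boundaries.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def top_fn_causes(
    transitions: Iterable[Tuple[datetime, str]],
    now: datetime,
    window: timedelta = timedelta(hours=24),
    k: int = 5,
) -> List[Tuple[str, int]]:
    """Count FN transitions per cause inside [now - window, now]; keep the top k."""
    start = now - window
    counts = Counter(cause for ts, cause in transitions if start <= ts <= now)
    return counts.most_common(k)


now = datetime(2025, 1, 2, tzinfo=timezone.utc)
events = [
    (now - timedelta(hours=1), "feed_delta"),
    (now - timedelta(hours=2), "feed_delta"),
    (now - timedelta(hours=3), "rule_delta"),
    (now - timedelta(days=2), "engine"),  # outside the 24h window
]
print(top_fn_causes(events, now))  # [('feed_delta', 2), ('rule_delta', 1)]
```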

## Database Schema

### classification_history Table

```sql
CREATE TABLE scanner.classification_history (
    id BIGSERIAL PRIMARY KEY,
    artifact_digest TEXT NOT NULL,
    vuln_id TEXT NOT NULL,
    package_purl TEXT NOT NULL,
    tenant_id UUID NOT NULL,
    manifest_id UUID NOT NULL,
    execution_id UUID NOT NULL,
    previous_status TEXT NOT NULL,
    new_status TEXT NOT NULL,
    is_fn_transition BOOLEAN GENERATED ALWAYS AS (...) STORED,
    cause TEXT NOT NULL,
    cause_detail JSONB,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

### fn_drift_stats Materialized View

Aggregated daily statistics for efficient dashboard queries:
- Day bucket
- Tenant ID
- Cause breakdown
- FN count and percentage
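A Python sketch of the kind of daily aggregation such a view could perform. The row shape (`DailyStats`), treating per-cause FN counts as the "cause breakdown", string tenant IDs, and bucketing by the calendar date of `changed_at` in whatever timezone the timestamp carries are all assumptions; the actual view definition is not shown in this reference.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Dict, Iterable, List, Tuple

FN_SOURCE_STATUSES = frozenset({"unaffected", "unknown"})


@dataclass(frozen=True)
class ClassificationChange:
    """Subset of classification_history columns used by this sketch."""
    tenant_id: str
    previous_status: str
    new_status: str
    cause: str
    changed_at: datetime


@dataclass
class DailyStats:
    """Hypothetical fn_drift_stats row for one (day, tenant) bucket."""
    day: date
    tenant_id: str
    total_reclassified: int = 0
    fn_count: int = 0
    fn_by_cause: Dict[str, int] = field(default_factory=dict)

    @property
    def fn_percent(self) -> float:
        if self.total_reclassified == 0:
            return 0.0
        return 100.0 * self.fn_count / self.total_reclassified


def daily_fn_drift_stats(changes: Iterable[ClassificationChange]) -> List[DailyStats]:
    buckets: Dict[Tuple[date, str], DailyStats] = {}
    for change in changes:
        if change.previous_status == "new":
            continue  # first scans are not reclassifications
        key = (change.changed_at.date(), change.tenant_id)
        stats = buckets.get(key)
        if stats is None:
            stats = buckets[key] = DailyStats(day=key[0], tenant_id=key[1])
        stats.total_reclassified += 1
        if change.previous_status in FN_SOURCE_STATUSES and change.new_status == "affected":
            stats.fn_count += 1
            stats.fn_by_cause[change.cause] = stats.fn_by_cause.get(change.cause, 0) + 1
    # Deterministic ordering by day, then tenant.
    return sorted(buckets.values(), key=lambda s: (s.day, s.tenant_id))
```

Precomputing rows like these once per day keeps dashboard queries from rescanning the full history table.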

## Related Documentation

- [Determinism Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md) - Section 13.2
- [Scanner Architecture](../modules/scanner/architecture.md)
- [Telemetry Stack](../modules/telemetry/architecture.md)