# Unknowns Queue Management Runbook

> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for managing the Unknowns queue, including triage, escalation, resolution, and queue health maintenance.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Queue Operations](#2-queue-operations)
3. [Triage Procedures](#3-triage-procedures)
4. [Escalation Workflows](#4-escalation-workflows)
5. [Resolution Procedures](#5-resolution-procedures)
6. [Troubleshooting](#6-troubleshooting)
7. [Monitoring & Alerting](#7-monitoring--alerting)

---

## 1. Overview

### What are Unknowns?

Unknowns are items that could not be fully classified during scanning due to:

- Missing VEX statements
- Ambiguous indirect calls in call graphs
- Incomplete SBOM data
- Missing advisory information
- Conflicting evidence from multiple sources

### Unknown Ranking

Unknowns are ranked using a weighted scoring model with three positive factors and a containment deduction:

```
score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction
```

| Factor | Weight | Description |
|--------|--------|-------------|
| Blast Radius | 0.60 | Impact scope (dependents, network exposure) |
| Evidence Scarcity | 0.30 | How much data is missing |
| Exploit Pressure | 0.30 | EPSS score, KEV status |
| Containment | up to -0.20 (deduction) | Mitigation factors (seccomp, read-only FS) |
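The formula can be sketched in Python as follows. This is a minimal illustration, not the scanner's implementation. It assumes blast, scarcity, and pressure are each pre-normalized to [0, 1], and that containment arrives as a non-positive deduction between -0.20 and 0. The function name `unknown_score` is hypothetical. As written, the positive weights sum to 1.20, so the sketch does not clamp the result. This runbook does not say whether the real scorer clamps.

```python
# Hypothetical sketch of the Unknowns ranking formula; not the scanner's code.
WEIGHTS = {"blast": 0.60, "scarcity": 0.30, "pressure": 0.30}
MAX_CONTAINMENT_DEDUCTION = -0.20  # e.g. seccomp (-0.10) + read-only FS (-0.10)


def unknown_score(blast: float, scarcity: float, pressure: float,
                  containment_deduction: float = 0.0) -> float:
    """Weighted sum of normalized factors plus a non-positive containment deduction."""
    factors = {"blast": blast, "scarcity": scarcity, "pressure": pressure}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if not MAX_CONTAINMENT_DEDUCTION <= containment_deduction <= 0.0:
        raise ValueError("containment_deduction must be in [-0.20, 0]")
    raw = sum(WEIGHTS[name] * value for name, value in factors.items())
    return round(raw + containment_deduction, 4)  # unclamped, as in the formula


# Scarcity 0.7 contributes 0.30 * 0.7 = 0.21, matching the section 2.3 breakdown.
print(unknown_score(blast=0.5, scarcity=0.7, pressure=0.8, containment_deduction=-0.20))  # prints 0.55
```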

### Band Assignment

| Band | Score Range | Priority | SLA |
|------|-------------|----------|-----|
| HOT | ≥ 0.70 | Critical | 24 hours |
| WARM | ≥ 0.40 and < 0.70 | Normal | 7 days |
| COLD | < 0.40 | Low | 30 days |
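Band assignment amounts to a threshold lookup. This sketch assumes inclusive lower bounds and no gaps between bands, so a score of 0.695 lands in WARM. `assign_band` is a hypothetical helper, not part of the `stella` CLI.

```python
from datetime import timedelta

# (band, inclusive lower bound, SLA), checked from highest band to lowest.
BANDS = (
    ("HOT", 0.70, timedelta(hours=24)),
    ("WARM", 0.40, timedelta(days=7)),
    ("COLD", float("-inf"), timedelta(days=30)),
)


def assign_band(score: float) -> tuple[str, timedelta]:
    """Return (band, SLA) for a score using inclusive lower bounds."""
    for band, lower_bound, sla in BANDS:
        if score >= lower_bound:
            return band, sla
    raise ValueError(f"unscorable value: {score}")  # only reached for NaN
```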

---

## 2. Queue Operations

### 2.1 View Queue Status

```bash
# Get queue summary
stella unknowns summary

# Output:
# Total: 142 unknowns
# HOT:  12 (8%)  - Requires immediate attention
# WARM: 85 (60%) - Normal priority
# COLD: 45 (32%) - Low priority
#
# KEV items: 3
# Average score: 0.52

# Get queue summary via API
curl "https://scanner.example.com/api/v1/unknowns/summary" \
  -H "Authorization: Bearer $TOKEN"
```
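For scripted access, you can also call the summary endpoint from Python using only the standard library. This sketch uses the placeholder host from the curl example above and assumes the endpoint returns a JSON body, whose schema is not documented here. `build_summary_request` and `fetch_summary` are hypothetical names.

```python
import json
import urllib.request

# Placeholder host from the curl example; substitute your Scanner URL.
BASE_URL = "https://scanner.example.com"


def build_summary_request(token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an authenticated GET request for the Unknowns summary endpoint."""
    return urllib.request.Request(
        f"{base_url}/api/v1/unknowns/summary",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_summary(token: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the (assumed) JSON response body."""
    with urllib.request.urlopen(build_summary_request(token, base_url), timeout=30) as resp:
        return json.load(resp)
```

Usage: `fetch_summary(os.environ["TOKEN"])` returns the decoded body, assuming `TOKEN` holds a valid bearer token.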

### 2.2 List Unknowns

```bash
# List all HOT unknowns
stella unknowns list --band HOT

# List by score (highest first)
stella unknowns list --sort score --order desc --limit 20

# Filter by reason
stella unknowns list --reason missing_vex

# Filter by artifact
stella unknowns list --artifact sha256:abc123...

# Filter by KEV status
stella unknowns list --kev true
```

### 2.3 View Unknown Details

```bash
# Get detailed view
stella unknowns show unk-12345678-abcd-1234-5678-abcdef123456

# Output:
# ID: unk-12345678-...
# Artifact: pkg:oci/myapp@sha256:abc123
# Reasons: [missing_vex, ambiguous_indirect_call]
#
# Blast Radius:
#   Dependents: 15 services
#   Network: internet-facing
#   Privilege: user
#
# Evidence Scarcity: 0.7 (high)
#
# Exploit Pressure:
#   EPSS: 0.45
#   KEV: false
#
# Containment:
#   Seccomp: enforced (-0.10)
#   Filesystem: read-only (-0.10)
#
# Score: 0.62 (WARM band)
# Score Breakdown:
#   Blast component: +0.35
#   Scarcity component: +0.21
#   Pressure component: +0.26
#   Containment deduction: -0.20

# Show proof tree
stella unknowns proof unk-12345678-...
```

### 2.4 Export Queue Data

```bash
# Export for analysis
stella unknowns export --format json --output unknowns.json

# Export HOT items for daily review
stella unknowns export --band HOT --format csv --output hot-unknowns.csv

# Export with full details
stella unknowns export --verbose --include-proofs --output full-export.json
```
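You can post-process exported JSON offline. The runbook does not document the export schema, so this sketch assumes a JSON array of objects with `id`, `band`, and `score` keys. Adjust the key names to match a real export. All three function names are hypothetical.

```python
import json
from collections import Counter


def load_unknowns(path: str) -> list[dict]:
    """Read an export written with --format json (assumed to be a JSON array)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def band_counts(items: list[dict]) -> Counter:
    """Count unknowns per band, using the assumed "band" key."""
    return Counter(item["band"] for item in items)


def top_by_score(items: list[dict], n: int = 10) -> list[dict]:
    """Return the n highest-scored unknowns, highest first (assumed "score" key)."""
    return sorted(items, key=lambda item: item["score"], reverse=True)[:n]
```

For example, `top_by_score(load_unknowns("unknowns.json"))` gives a quick top-10 list for the daily review.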

---

## 3. Triage Procedures

### 3.1 Daily Triage Workflow

**Schedule**: Daily at 9:00 AM

**Duration**: 30 minutes

**Participants**: Security analyst, on-call engineer

**Process**:

```bash
# 1. Get today's queue snapshot
stella unknowns snapshot --output daily-$(date +%Y%m%d).json

# 2. Review all HOT items
stella unknowns list --band HOT --since 24h

# 3. For each HOT unknown, determine action:
#    - Escalate: Trigger immediate rescan
#    - Investigate: Needs manual analysis
#    - Defer: Move to WARM (with justification)
#    - Resolve: Evidence found, can close

# 4. Process each item
stella unknowns triage unk-12345678-... --action escalate
stella unknowns triage unk-87654321-... --action investigate --notes "Need VEX from vendor"
stella unknowns triage unk-11111111-... --action defer --reason "False positive suspected"
```

### 3.2 Triage Decision Matrix

| Reason Code | KEV | EPSS > 0.5 | Action |
|-------------|-----|------------|--------|
| `missing_vex` | Yes | Any | Escalate + Vendor outreach |
| `missing_vex` | No | Yes | Escalate |
| `missing_vex` | No | No | Request VEX |
| `ambiguous_indirect_call` | Any | Any | Manual code review |
| `incomplete_sbom` | Any | Any | Rescan with updated extractor |
| `conflicting_evidence` | Any | Any | Manual analysis |
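The matrix can be encoded as a small lookup, which is handy for pre-sorting an export before the daily review. This sketch covers only the table's rows. Reason codes the table does not list raise an error rather than getting a guessed action. `triage_action` is a hypothetical name.

```python
def triage_action(reason: str, kev: bool, epss: float) -> str:
    """Map a reason code plus KEV/EPSS signals to the Triage Decision Matrix action."""
    if reason == "missing_vex":
        if kev:
            return "Escalate + Vendor outreach"  # KEV wins regardless of EPSS
        if epss > 0.5:
            return "Escalate"
        return "Request VEX"
    # These reason codes map to one action regardless of KEV or EPSS.
    fixed_actions = {
        "ambiguous_indirect_call": "Manual code review",
        "incomplete_sbom": "Rescan with updated extractor",
        "conflicting_evidence": "Manual analysis",
    }
    try:
        return fixed_actions[reason]
    except KeyError:
        raise ValueError(f"reason code not covered by the matrix: {reason}") from None
```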

### 3.3 Triage Templates

```bash
# Quick escalate (HOT + KEV)
stella unknowns triage unk-... --action escalate \
  --priority P1 \
  --notes "KEV item, requires immediate attention"

# Request vendor VEX
stella unknowns triage unk-... --action investigate \
  --notes "Requested VEX from vendor via security@vendor.com" \
  --due-date 7d

# Mark for code review
stella unknowns triage unk-... --action investigate \
  --notes "Requires manual code review to resolve indirect call" \
  --assign @code-review-team

# Defer with justification
stella unknowns triage unk-... --action defer \
  --reason "Component not deployed to production" \
  --evidence "deployment-manifest.yaml shows staging-only"
```

---

## 4. Escalation Workflows

### 4.1 Automatic Escalation

Unknowns are automatically escalated when:

- Score rises to or above the HOT threshold (0.70)
- The related CVE is added to the KEV catalog
- EPSS score increases by more than 0.2
- Blast radius increases (new dependents detected)

**Configure auto-escalation**:

```yaml
# policy.unknowns.escalation.yaml
autoEscalation:
  enabled: true
  triggers:
    - condition: score >= 0.70
      action: escalate
      notify: [security-team]
    - condition: kev == true
      action: escalate
      priority: P1
      notify: [security-team, management]
    - condition: epss_delta > 0.2
      action: escalate
      notify: [security-team]
```
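To reason about which signals cause an escalation, the three triggers above can be mirrored as Python predicates. This sketch does not parse the YAML, because the condition grammar is not specified here. `Signals`, `Trigger`, and `evaluate` are hypothetical names, and merging the notify lists of all matching triggers is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Signals:
    """Hypothetical snapshot of an unknown's current signals."""
    score: float
    kev: bool
    epss_delta: float


@dataclass(frozen=True)
class Trigger:
    condition: Callable[[Signals], bool]
    notify: Tuple[str, ...]
    priority: Optional[str] = None


# Mirrors the triggers in policy.unknowns.escalation.yaml; not parsed from it.
TRIGGERS = (
    Trigger(lambda s: s.score >= 0.70, ("security-team",)),
    Trigger(lambda s: s.kev, ("security-team", "management"), priority="P1"),
    Trigger(lambda s: s.epss_delta > 0.2, ("security-team",)),
)


def evaluate(signals: Signals) -> Tuple[bool, Optional[str], set]:
    """Return (escalate?, first explicit priority, union of teams to notify)."""
    matched = [t for t in TRIGGERS if t.condition(signals)]
    priority = next((t.priority for t in matched if t.priority), None)
    notify = {team for t in matched for team in t.notify}
    return bool(matched), priority, notify
```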

### 4.2 Manual Escalation

```bash
# Escalate via CLI
stella unknowns escalate unk-12345678-...

# Escalate with reason
stella unknowns escalate unk-12345678-... \
  --reason "Customer reported potential exploit"

# Escalate to trigger rescan
stella unknowns escalate unk-12345678-... --rescan

# Output:
# Escalated: unk-12345678-...
# Rescan job: rescan-job-001
# Status: queued
# ETA: 5 minutes
```

### 4.3 Bulk Escalation

```bash
# Escalate all KEV items
stella unknowns escalate --filter "kev=true" --reason "KEV bulk escalation"

# Escalate high-score items
stella unknowns escalate --filter "score>=0.8" --rescan

# Escalate by artifact
stella unknowns escalate --artifact sha256:abc123... --reason "Production incident"
```

### 4.4 Escalation SLA Tracking

```bash
# Check SLA status
stella unknowns sla-status

# Output:
# HOT unknowns SLA (24h):
#   In SLA: 10 (83%)
#   Breached: 2 (17%)
#
# Breached items:
#   unk-111... (26h old) - missing_vex
#   unk-222... (30h old) - conflicting_evidence

# List items that have breached their SLA
stella unknowns list --sla-breached
```
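To cross-check `sla-status`, you can recompute SLA compliance from an export. This sketch makes two assumptions. First, the SLA clock starts at the unknown's creation time. Second, each item carries `id`, `band`, and a comparable `created_at` datetime. These field names and both helpers are hypothetical. With 10 of 12 HOT items in SLA, the percentage rounds to 83, as in the sample output.

```python
from datetime import datetime, timedelta

# SLA windows from the Band Assignment table.
SLA = {"HOT": timedelta(hours=24), "WARM": timedelta(days=7), "COLD": timedelta(days=30)}


def sla_split(items: list[dict], now: datetime) -> tuple[list[str], list[str]]:
    """Return (in_sla_ids, breached_ids), assuming the clock starts at created_at."""
    in_sla, breached = [], []
    for item in items:
        age = now - item["created_at"]
        target = breached if age > SLA[item["band"]] else in_sla
        target.append(item["id"])
    return in_sla, breached


def compliance_pct(in_sla: list[str], breached: list[str]) -> int:
    """Percentage of items still within SLA, rounded to a whole number."""
    total = len(in_sla) + len(breached)
    return round(100 * len(in_sla) / total) if total else 100
```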

---

## 5. Resolution Procedures

### 5.1 Resolution Types

| Resolution | Description | Evidence Required |
|------------|-------------|-------------------|
| `not_affected` | Vulnerability doesn't apply | VEX statement or manual analysis |
| `fixed` | Vulnerability patched | Version upgrade confirmation |
| `mitigated` | Controls in place | Mitigation documentation |
| `false_positive` | Incorrect classification | Analysis report |
| `wont_fix` | Accepted risk | Risk acceptance form |

### 5.2 Resolve Unknown

```bash
# Resolve as not affected
stella unknowns resolve unk-12345678-... \
  --resolution not_affected \
  --justification "vulnerable_code_not_present" \
  --notes "Manual code review confirmed function not used"

# Resolve as fixed
stella unknowns resolve unk-12345678-... \
  --resolution fixed \
  --justification "version_upgraded" \
  --evidence "Upgraded lodash to 4.17.21, CVE patched"

# Resolve as mitigated
stella unknowns resolve unk-12345678-... \
  --resolution mitigated \
  --justification "inline_mitigations_exist" \
  --evidence "WAF rule WAF-001 blocks exploit pattern"

# Resolve as won't fix (risk accepted)
stella unknowns resolve unk-12345678-... \
  --resolution wont_fix \
  --justification "risk_accepted" \
  --evidence "Risk acceptance ticket RISK-123" \
  --expires 90d  # Re-evaluate in 90 days
```

### 5.3 Bulk Resolution

```bash
# Resolve all items for a fixed package version
stella unknowns resolve-batch \
  --filter "purl=pkg:npm/lodash@4.17.20" \
  --resolution fixed \
  --justification "version_upgraded" \
  --evidence "Upgraded to 4.17.21 fleet-wide (ticket FLEET-456)"

# Resolve false positives from analysis
stella unknowns resolve-batch \
  --file false-positives.json \
  --resolution false_positive
```

### 5.4 Resolution Audit Trail

```bash
# View resolution history
stella unknowns history unk-12345678-...

# Output:
# 2025-12-15 10:00:00 - Created (score: 0.62)
# 2025-12-16 09:30:00 - Triaged by analyst@example.com
# 2025-12-17 14:00:00 - Escalated (KEV added)
# 2025-12-18 11:00:00 - Resolved by security@example.com
#   Resolution: not_affected
#   Justification: vulnerable_code_not_present
#   Notes: Manual code review confirmed function not used

# Export audit trail
stella unknowns audit-export --from 2025-01-01 --to 2025-12-31 --output audit.json
```

---

## 6. Troubleshooting

### 6.1 Score Seems Wrong

**Symptom**: Unknown scored too high or too low.

**Diagnosis**:

```bash
# View score breakdown
stella unknowns show unk-... --score-details

# View proof tree
stella unknowns proof unk-... --verbose
```

**Common causes**:

1. **Stale EPSS data**: EPSS feed not updated
2. **Incorrect blast radius**: Dependency data outdated
3. **Missing containment data**: Seccomp/filesystem status unknown

**Resolution**:

```bash
# Trigger score recalculation
stella unknowns recalculate unk-...

# Force refresh of all input signals
stella unknowns refresh unk-... --force
```

### 6.2 Duplicate Unknowns

**Symptom**: Same issue appears multiple times.

**Diagnosis**:

```bash
# Find potential duplicates
stella unknowns duplicates --scan

# Output shows items with same CVE+PURL but different artifacts
```
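The grouping rule described above (same CVE and PURL, different artifacts) can be sketched as follows. This is useful for sanity-checking `duplicates --scan` against an export. The field names `id`, `cve`, `purl`, and `artifact` are assumptions about the export shape, and `find_duplicates` is a hypothetical helper.

```python
from collections import defaultdict


def find_duplicates(items: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group unknowns by (CVE, PURL); keep groups that span more than one artifact."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for item in items:
        groups[(item["cve"], item["purl"])].append(item)
    return {
        key: [i["id"] for i in group]
        for key, group in groups.items()
        if len({i["artifact"] for i in group}) > 1
    }
```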

**Resolution**:

```bash
# Merge duplicates
stella unknowns merge \
  --primary unk-111... \
  --secondary unk-222... \
  --reason "Same CVE across artifact versions"
```

### 6.3 Escalation Not Working

**Symptom**: Escalation doesn't trigger rescan.

**Diagnosis**:

```bash
# Check escalation status
stella unknowns escalation-status unk-...

# Check Scheduler connectivity
stella health check --service scheduler

# Check job queue
stella scheduler queue status rescan
```

**Resolution**:

```bash
# Retry escalation
stella unknowns escalate unk-... --force

# Manual rescan trigger
stella scan trigger --artifact sha256:abc123... --priority high
```

### 6.4 Resolution Rejected

**Symptom**: Resolution attempt fails validation.

**Diagnosis**:

```bash
# Check resolution requirements
stella unknowns resolution-requirements unk-...

# Output:
# Resolution requirements for unk-12345678-...
# - Justification: required
# - Evidence: required (reason: KEV item)
# - Approver: required (band: HOT)
```
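The sample output suggests three rules:

- A justification is always required.
- Evidence is required for KEV items.
- An approver is required for HOT items.

A pre-flight check could encode only those inferred rules, as sketched here. The real rule set may include more rules. Both function names are hypothetical.

```python
def resolution_requirements(band: str, kev: bool) -> dict[str, bool]:
    """Requirements inferred only from the sample output; the real rules may differ."""
    return {
        "justification": True,      # always required
        "evidence": kev,            # required for KEV items
        "approver": band == "HOT",  # required for HOT-band items
    }


def missing_fields(request: dict, band: str, kev: bool) -> list[str]:
    """List required fields that are absent or empty in a resolve request."""
    required = resolution_requirements(band, kev)
    return [name for name, needed in required.items() if needed and not request.get(name)]
```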

**Resolution**:

```bash
# Provide required evidence
stella unknowns resolve unk-... \
  --resolution not_affected \
  --justification "vulnerable_code_not_present" \
  --evidence "Code review: CRV-123" \
  --approver security-lead@example.com
```

---

## 7. Monitoring & Alerting

### 7.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `unknowns_total` | Total unknowns in queue | > 500 |
| `unknowns_hot_count` | HOT band count | > 20 |
| `unknowns_sla_breached` | SLA breaches | > 0 |
| `unknowns_resolution_rate` | Daily resolutions | < 5 |
| `unknowns_escalation_failures` | Failed escalations | > 0 |
| `unknowns_avg_age_hours` | Average unknown age | > 168 (1 week) |
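For an ad-hoc check outside Prometheus, you can apply the table's thresholds to a one-off sample of metric values. `breached_thresholds` is a hypothetical helper, and the alerting rules in section 7.3 remain the authoritative alert source. For example, a sample with `unknowns_sla_breached = 2` and `unknowns_resolution_rate = 3` flags exactly those two metrics.

```python
import operator

# Alert thresholds from the Key Metrics table: (comparison, threshold).
THRESHOLDS = {
    "unknowns_total": (operator.gt, 500),
    "unknowns_hot_count": (operator.gt, 20),
    "unknowns_sla_breached": (operator.gt, 0),
    "unknowns_resolution_rate": (operator.lt, 5),
    "unknowns_escalation_failures": (operator.gt, 0),
    "unknowns_avg_age_hours": (operator.gt, 168),
}


def breached_thresholds(sample: dict[str, float]) -> list[str]:
    """Return names of metrics in the sample whose value crosses its alert threshold."""
    return [
        name
        for name, (compare, limit) in THRESHOLDS.items()
        if name in sample and compare(sample[name], limit)
    ]
```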

### 7.2 Grafana Dashboard

```
Dashboard: Unknowns Queue Health
Panels:
- Queue size by band (HOT/WARM/COLD)
- SLA compliance rate
- Unknowns by reason code
- Resolution velocity
- Escalation success rate
- Queue age distribution
- KEV item tracking
```

### 7.3 Alerting Rules

```yaml
groups:
  - name: unknowns-queue
    rules:
      - alert: UnknownsHotBandHigh
        expr: unknowns_hot_count > 20
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HOT unknowns queue is high ({{ $value }} items)"

      - alert: UnknownsSLABreach
        expr: unknowns_sla_breached > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $value }} unknowns have breached SLA"

      - alert: UnknownsQueueGrowing
        expr: delta(unknowns_total[1h]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unknowns queue grew by more than 10 items in the last hour"

      - alert: UnknownsKEVPending
        expr: unknowns_kev_count > 0 and unknowns_kev_unresolved_age_hours > 24
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "KEV unknown pending for over 24 hours"
```

### 7.4 Daily Report

```bash
# Generate daily report
stella unknowns report --format email --send-to security-team@example.com

# Report includes:
# - Queue summary (total, by band, by reason)
# - SLA status (in compliance, breaches)
# - Top 10 highest-scored items
# - Newly added items (last 24h)
# - Resolved items (last 24h)
# - KEV item status
# - Trends (7-day, 30-day)
```

---

## Related Documentation

- [Unknowns API Reference](../api/score-proofs-reachability-api-reference.md#5-unknowns-api)
- [Triage Technical Reference](../product-advisories/14-Dec-2025%20-%20Triage%20and%20Unknowns%20Technical%20Reference.md)
- [Score Proofs Runbook](./score-proofs-runbook.md)
- [Policy Engine](../modules/policy/architecture.md)
