# Unknowns Queue Management Runbook
> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for managing the Unknowns queue: triage, escalation, resolution, and queue health maintenance.

---
## Table of Contents
1. [Overview](#1-overview)
2. [Queue Operations](#2-queue-operations)
3. [Triage Procedures](#3-triage-procedures)
4. [Escalation Workflows](#4-escalation-workflows)
5. [Resolution Procedures](#5-resolution-procedures)
6. [Troubleshooting](#6-troubleshooting)
7. [Monitoring & Alerting](#7-monitoring--alerting)
---
## 1. Overview
### What are Unknowns?
Unknowns are items that could not be fully classified during scanning due to:
- Missing VEX statements
- Ambiguous indirect calls in call graphs
- Incomplete SBOM data
- Missing advisory information
- Conflicting evidence from multiple sources
### Unknown Ranking
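Unknowns are ranked by a weighted score that adds three factors (blast radius, evidence scarcity, exploit pressure) and applies a containment deduction: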
```
score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction
```
| Factor | Weight | Description |
|--------|--------|-------------|
| Blast Radius | 0.60 | Impact scope (dependents, network exposure) |
| Evidence Scarcity | 0.30 | How much data is missing |
| Exploit Pressure | 0.30 | EPSS score, KEV status |
| Containment | -0.20 | Mitigation factors (seccomp, read-only FS) |
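
As a worked example of the formula, the following sketch computes a score from the weights in the table. It is an illustration only, not the scanner's implementation: the function name `unknown_score` is hypothetical, the three factor inputs are assumed to be normalized to the range 0 to 1, and the containment argument is assumed to be a deduction that is zero or negative.

```bash
# Sketch of the ranking formula above (hypothetical helper, not Stella code).
# Args: blast scarcity pressure containment_deduction
# Assumes factor inputs in [0, 1] and a zero-or-negative deduction.
unknown_score() {
  awk -v b="$1" -v s="$2" -v p="$3" -v c="$4" \
    'BEGIN { printf "%.2f\n", 0.60 * b + 0.30 * s + 0.30 * p + c }'
}

# blast 0.5, scarcity 0.7, pressure 0.8, containment -0.20:
# 0.30 + 0.21 + 0.24 - 0.20
unknown_score 0.5 0.7 0.8 -0.20   # prints 0.55
```

Because the three positive weights sum to 1.20, an uncontained unknown with all factors at 1.0 would exceed 1.0 under this sketch; whether the real scorer clamps the result is not stated here.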
### Band Assignment
| Band | Score Range | Priority | SLA |
|------|-------------|----------|-----|
| HOT | ≥ 0.70 | Critical | 24 hours |
| WARM | ≥ 0.40 and < 0.70 | Normal | 7 days |
| COLD | < 0.40 | Low | 30 days |
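
The band boundaries can be expressed as a small helper. This is a sketch under the assumption that WARM covers every score from 0.40 up to, but not including, 0.70; `band_for_score` is an illustrative name, not a `stella` subcommand.

```bash
# Map a score to its band using the thresholds in the table above.
# Hypothetical helper for scripts and spot checks.
band_for_score() {
  awk -v s="$1" 'BEGIN {
    if (s >= 0.70) { print "HOT" }
    else if (s >= 0.40) { print "WARM" }
    else { print "COLD" }
  }'
}

band_for_score 0.62   # prints WARM
band_for_score 0.70   # prints HOT
```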
---
## 2. Queue Operations
### 2.1 View Queue Status
```bash
# Get queue summary
stella unknowns summary
# Output:
# Total: 142 unknowns
# HOT: 12 (8%) - Requires immediate attention
# WARM: 85 (60%) - Normal priority
# COLD: 45 (32%) - Low priority
#
# KEV items: 3
# Average score: 0.52
# Get queue summary via API
curl "https://scanner.example.com/api/v1/unknowns/summary" \
-H "Authorization: Bearer $TOKEN"
```
### 2.2 List Unknowns
```bash
# List all HOT unknowns
stella unknowns list --band HOT
# List by score (highest first)
stella unknowns list --sort score --order desc --limit 20
# Filter by reason
stella unknowns list --reason missing_vex
# Filter by artifact
stella unknowns list --artifact sha256:abc123...
# Filter by KEV status
stella unknowns list --kev true
```
### 2.3 View Unknown Details
```bash
# Get detailed view
stella unknowns show unk-12345678-abcd-1234-5678-abcdef123456
# Output:
# ID: unk-12345678-...
# Artifact: pkg:oci/myapp@sha256:abc123
# Reasons: [missing_vex, ambiguous_indirect_call]
#
# Blast Radius:
#   Dependents: 15 services
#   Network: internet-facing
#   Privilege: user
#
# Evidence Scarcity: 0.7 (high)
#
# Exploit Pressure:
#   EPSS: 0.45
#   KEV: false
#
# Containment:
#   Seccomp: enforced (-0.10)
#   Filesystem: read-only (-0.10)
#
# Score: 0.62 (WARM band)
# Score Breakdown:
#   Blast component: +0.35
#   Scarcity component: +0.21
#   Pressure component: +0.26
#   Containment deduction: -0.20

# Show proof tree
stella unknowns proof unk-12345678-...
```
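
When reviewing a detail view, it can help to confirm that the breakdown components add up to the reported score. The sketch below sums the four components from the example output; `sum_components` is a hypothetical helper, and the values are typed in by hand rather than parsed from the CLI.

```bash
# Sum score components passed as arguments (hypothetical helper).
sum_components() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# Blast, scarcity, pressure, containment from the example above
sum_components 0.35 0.21 0.26 -0.20   # prints 0.62
```

The scarcity component also matches the formula directly: an Evidence Scarcity of 0.7 at weight 0.30 gives 0.21.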
### 2.4 Export Queue Data
```bash
# Export for analysis
stella unknowns export --format json --output unknowns.json
# Export HOT items for daily review
stella unknowns export --band HOT --format csv --output hot-unknowns.csv
# Export with full details
stella unknowns export --verbose --include-proofs --output full-export.json
```
---
## 3. Triage Procedures
### 3.1 Daily Triage Workflow
- **Schedule**: Daily at 9:00 AM
- **Duration**: 30 minutes
- **Participants**: Security analyst, on-call engineer

**Process**:
```bash
# 1. Get today's queue snapshot
stella unknowns snapshot --output daily-$(date +%Y%m%d).json
# 2. Review all HOT items
stella unknowns list --band HOT --since 24h
# 3. For each HOT unknown, determine action:
# - Escalate: Trigger immediate rescan
# - Investigate: Needs manual analysis
# - Defer: Move to WARM (with justification)
# - Resolve: Evidence found, can close
# 4. Process each item
stella unknowns triage unk-12345678-... --action escalate
stella unknowns triage unk-87654321-... --action investigate --notes "Need VEX from vendor"
stella unknowns triage unk-11111111-... --action defer --reason "False positive suspected"
```
### 3.2 Triage Decision Matrix
| Reason Code | KEV | EPSS > 0.5 | Action |
|-------------|-----|------------|--------|
| `missing_vex` | Yes | Any | Escalate + Vendor outreach |
| `missing_vex` | No | Yes | Escalate |
| `missing_vex` | No | No | Request VEX |
| `ambiguous_indirect_call` | Any | Any | Manual code review |
| `incomplete_sbom` | Any | Any | Rescan with updated extractor |
| `conflicting_evidence` | Any | Any | Manual analysis |
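
The matrix can be encoded as a lookup for use in triage scripts. This is a minimal sketch: `triage_action` is a hypothetical function, the KEV flag is assumed to be the literal string `true` or `false`, and reason codes that do not appear in the matrix fall through to a placeholder message.

```bash
# Return the suggested action for a reason code, KEV flag, and EPSS score,
# following the decision matrix above. Hypothetical helper.
triage_action() {
  local reason="$1" kev="$2" epss="$3"
  case "$reason" in
    missing_vex)
      if [ "$kev" = "true" ]; then
        echo "Escalate + Vendor outreach"
      elif awk -v e="$epss" 'BEGIN { exit !(e > 0.5) }'; then
        echo "Escalate"
      else
        echo "Request VEX"
      fi ;;
    ambiguous_indirect_call) echo "Manual code review" ;;
    incomplete_sbom)         echo "Rescan with updated extractor" ;;
    conflicting_evidence)    echo "Manual analysis" ;;
    *)                       echo "No matrix entry" ;;
  esac
}

triage_action missing_vex false 0.6   # prints Escalate
triage_action missing_vex false 0.5   # prints Request VEX (threshold is strict)
```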
### 3.3 Triage Templates
```bash
# Quick escalate (HOT + KEV)
stella unknowns triage unk-... --action escalate \
--priority P1 \
--notes "KEV item, requires immediate attention"
# Request vendor VEX
stella unknowns triage unk-... --action investigate \
--notes "Requested VEX from vendor via security@vendor.com" \
--due-date 7d
# Mark for code review
stella unknowns triage unk-... --action investigate \
--notes "Requires manual code review to resolve indirect call" \
--assign @code-review-team
# Defer with justification
stella unknowns triage unk-... --action defer \
--reason "Component not deployed to production" \
--evidence "deployment-manifest.yaml shows staging-only"
```
---
## 4. Escalation Workflows
### 4.1 Automatic Escalation
Unknowns are automatically escalated when:
- The score reaches the HOT threshold (≥ 0.70)
- KEV status is added to a related CVE
- The EPSS score increases significantly (delta > 0.2)
- The blast radius increases (new dependents detected)
**Configure auto-escalation**:
```yaml
# policy.unknowns.escalation.yaml
autoEscalation:
  enabled: true
  triggers:
    - condition: score >= 0.70
      action: escalate
      notify: [security-team]
    - condition: kev == true
      action: escalate
      priority: P1
      notify: [security-team, management]
    - condition: epss_delta > 0.2
      action: escalate
      notify: [security-team]
```
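
To reason about which triggers a given unknown would match, the three conditions in the example policy can be evaluated by hand. The sketch below is not the policy engine: `matching_triggers` and its argument order are assumptions, and the blast-radius trigger is omitted because the example policy does not define a condition for it.

```bash
# Print each trigger condition from the example policy that matches.
# Args: score kev(true|false) epss_delta. Hypothetical helper.
matching_triggers() {
  awk -v s="$1" -v k="$2" -v d="$3" 'BEGIN {
    if (s >= 0.70)   { print "score >= 0.70" }
    if (k == "true") { print "kev == true" }
    if (d > 0.2)     { print "epss_delta > 0.2" }
  }'
}

# A WARM unknown that just gained KEV status and a large EPSS jump
matching_triggers 0.65 true 0.25
# prints:
#   kev == true
#   epss_delta > 0.2
```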
### 4.2 Manual Escalation
```bash
# Escalate via CLI
stella unknowns escalate unk-12345678-...
# Escalate with reason
stella unknowns escalate unk-12345678-... \
--reason "Customer reported potential exploit"
# Escalate to trigger rescan
stella unknowns escalate unk-12345678-... --rescan
# Output:
# Escalated: unk-12345678-...
# Rescan job: rescan-job-001
# Status: queued
# ETA: 5 minutes
```
### 4.3 Bulk Escalation
```bash
# Escalate all KEV items
stella unknowns escalate --filter "kev=true" --reason "KEV bulk escalation"
# Escalate high-score items
stella unknowns escalate --filter "score>=0.8" --rescan
# Escalate by artifact
stella unknowns escalate --artifact sha256:abc123... --reason "Production incident"
```
### 4.4 Escalation SLA Tracking
```bash
# Check SLA status
stella unknowns sla-status
# Output:
# HOT unknowns SLA (24h):
# In SLA: 10 (83%)
# Breached: 2 (17%)
#
# Breached items:
# unk-111... (26h old) - missing_vex
# unk-222... (30h old) - conflicting_evidence
# Get SLA breach notifications
stella unknowns list --sla-breached
```
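
The breach check itself is a comparison of an unknown's age against its band SLA (24 hours, 7 days, 30 days). The following sketch uses hypothetical helper names, takes age in whole hours, and assumes an item exactly at the limit still counts as in SLA.

```bash
# SLA limit in hours for each band, per the Band Assignment table.
sla_hours() {
  case "$1" in
    HOT)  echo 24 ;;
    WARM) echo 168 ;;   # 7 days
    COLD) echo 720 ;;   # 30 days
    *)    return 1 ;;
  esac
}

# Report whether an unknown of the given band and age (whole hours) is in SLA.
sla_status() {
  local band="$1" age_hours="$2" limit
  limit="$(sla_hours "$band")" || { echo "unknown band: $band" >&2; return 2; }
  if [ "$age_hours" -gt "$limit" ]; then
    echo "breached"
  else
    echo "in SLA"
  fi
}

sla_status HOT 26   # prints breached
sla_status HOT 10   # prints in SLA
```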
---
## 5. Resolution Procedures
### 5.1 Resolution Types
| Resolution | Description | Evidence Required |
|------------|-------------|-------------------|
| `not_affected` | Vulnerability doesn't apply | VEX statement or manual analysis |
| `fixed` | Vulnerability patched | Version upgrade confirmation |
| `mitigated` | Controls in place | Mitigation documentation |
| `false_positive` | Incorrect classification | Analysis report |
| `wont_fix` | Accepted risk | Risk acceptance form |
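
A pre-flight check in a wrapper script can catch mistyped resolution codes before a resolve attempt. This sketch mirrors the table above; `evidence_for_resolution` is a hypothetical helper and does not query the service.

```bash
# Print the evidence expected for a resolution code, per the table above.
# Returns non-zero for codes that are not listed. Hypothetical helper.
evidence_for_resolution() {
  case "$1" in
    not_affected)   echo "VEX statement or manual analysis" ;;
    fixed)          echo "Version upgrade confirmation" ;;
    mitigated)      echo "Mitigation documentation" ;;
    false_positive) echo "Analysis report" ;;
    wont_fix)       echo "Risk acceptance form" ;;
    *)              echo "unknown resolution: $1" >&2; return 1 ;;
  esac
}

evidence_for_resolution wont_fix   # prints Risk acceptance form
```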
### 5.2 Resolve Unknown
```bash
# Resolve as not affected
stella unknowns resolve unk-12345678-... \
--resolution not_affected \
--justification "vulnerable_code_not_present" \
--notes "Manual code review confirmed function not used"
# Resolve as fixed
stella unknowns resolve unk-12345678-... \
--resolution fixed \
--justification "version_upgraded" \
--evidence "Upgraded lodash to 4.17.21, CVE patched"
# Resolve as mitigated
stella unknowns resolve unk-12345678-... \
--resolution mitigated \
--justification "inline_mitigations_exist" \
--evidence "WAF rule WAF-001 blocks exploit pattern"
# Resolve as won't fix (risk accepted)
stella unknowns resolve unk-12345678-... \
--resolution wont_fix \
--justification "risk_accepted" \
--evidence "Risk acceptance ticket RISK-123" \
--expires 90d # Re-evaluate in 90 days
```
### 5.3 Bulk Resolution
```bash
# Resolve all items for a fixed package version
stella unknowns resolve-batch \
--filter "purl=pkg:npm/lodash@4.17.20" \
--resolution fixed \
--justification "Upgraded to 4.17.21 fleet-wide" \
--evidence "Fleet upgrade ticket FLEET-456"
# Resolve false positives from analysis
stella unknowns resolve-batch \
--file false-positives.json \
--resolution false_positive
```
### 5.4 Resolution Audit Trail
```bash
# View resolution history
stella unknowns history unk-12345678-...
# Output:
# 2025-12-15 10:00:00 - Created (score: 0.62)
# 2025-12-16 09:30:00 - Triaged by analyst@example.com
# 2025-12-17 14:00:00 - Escalated (KEV added)
# 2025-12-18 11:00:00 - Resolved by security@example.com
# Resolution: not_affected
# Justification: vulnerable_code_not_present
# Notes: Manual code review confirmed function not used
# Export audit trail
stella unknowns audit-export --from 2025-01-01 --to 2025-12-31 --output audit.json
```
---
## 6. Troubleshooting
### 6.1 Score Seems Wrong
**Symptom**: Unknown scored too high or too low.
**Diagnosis**:
```bash
# View score breakdown
stella unknowns show unk-... --score-details
# View proof tree
stella unknowns proof unk-... --verbose
```
**Common causes**:
1. **Stale EPSS data**: EPSS feed not updated
2. **Incorrect blast radius**: Dependency data outdated
3. **Missing containment data**: Seccomp/filesystem status unknown
**Resolution**:
```bash
# Trigger score recalculation
stella unknowns recalculate unk-...
# Force refresh of all input signals
stella unknowns refresh unk-... --force
```
### 6.2 Duplicate Unknowns
**Symptom**: Same issue appears multiple times.
**Diagnosis**:
```bash
# Find potential duplicates
stella unknowns duplicates --scan
# Output shows items with same CVE+PURL but different artifacts
```
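
The duplicate criterion (same CVE and PURL, different artifacts) can also be checked offline against exported data. The sketch below assumes a tab-separated input with the columns ID, CVE, PURL, and artifact digest; that layout is an assumption for illustration, not a documented export format, and the sample rows are made up.

```bash
# Group rows by CVE + PURL and print groups that contain more than one ID.
# Input (stdin): tab-separated columns id, cve, purl, artifact (assumed layout).
find_duplicates() {
  awk -F '\t' '
    {
      key = $2 "\t" $3
      if (key in ids) { ids[key] = ids[key] "," $1 } else { ids[key] = $1 }
      count[key]++
    }
    END {
      for (k in count) {
        if (count[k] > 1) { print k "\t" ids[k] }
      }
    }
  '
}

# Made-up sample rows: the first two share a CVE and PURL
printf '%s\t%s\t%s\t%s\n' \
  unk-111 CVE-EXAMPLE-1 pkg:npm/lodash@4.17.20 sha256:aaa \
  unk-222 CVE-EXAMPLE-1 pkg:npm/lodash@4.17.20 sha256:bbb \
  unk-333 CVE-EXAMPLE-2 pkg:npm/other@1.0.0 sha256:ccc |
  find_duplicates
```

Each output line lists the shared CVE and PURL followed by the comma-separated IDs that are candidates for `stella unknowns merge`.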
**Resolution**:
```bash
# Merge duplicates
stella unknowns merge \
--primary unk-111... \
--secondary unk-222... \
--reason "Same CVE across artifact versions"
```
### 6.3 Escalation Not Working
**Symptom**: Escalation doesn't trigger rescan.
**Diagnosis**:
```bash
# Check escalation status
stella unknowns escalation-status unk-...
# Check Scheduler connectivity
stella health check --service scheduler
# Check job queue
stella scheduler queue status rescan
```
**Resolution**:
```bash
# Retry escalation
stella unknowns escalate unk-... --force
# Manual rescan trigger
stella scan trigger --artifact sha256:abc123... --priority high
```
### 6.4 Resolution Rejected
**Symptom**: Resolution attempt fails validation.
**Diagnosis**:
```bash
# Check resolution requirements
stella unknowns resolution-requirements unk-...
# Output:
# Resolution requirements for unk-12345678-...
# - Justification: required
# - Evidence: required (reason: KEV item)
# - Approver: required (band: HOT)
```
**Resolution**:
```bash
# Provide required evidence
stella unknowns resolve unk-... \
--resolution not_affected \
--justification "vulnerable_code_not_present" \
--evidence "Code review: CRV-123" \
--approver security-lead@example.com
```
---
## 7. Monitoring & Alerting
### 7.1 Key Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `unknowns_total` | Total unknowns in queue | > 500 |
| `unknowns_hot_count` | HOT band count | > 20 |
| `unknowns_sla_breached` | SLA breaches | > 0 |
| `unknowns_resolution_rate` | Daily resolutions | < 5 |
| `unknowns_escalation_failures` | Failed escalations | > 0 |
| `unknowns_avg_age_hours` | Average unknown age | > 168 (1 week) |
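
For ad hoc checks outside the alerting stack, the thresholds in the table can be encoded in a small function. This is a sketch only: `metric_breaches_threshold` is a hypothetical helper, and the metric value is supplied by the caller rather than scraped.

```bash
# Exit 0 if the value breaches the alert threshold for the named metric,
# 1 if it does not, 2 if the metric is not in the table. Hypothetical helper.
metric_breaches_threshold() {
  local name="$1" value="$2" op limit
  case "$name" in
    unknowns_total)               op=">"; limit=500 ;;
    unknowns_hot_count)           op=">"; limit=20 ;;
    unknowns_sla_breached)        op=">"; limit=0 ;;
    unknowns_resolution_rate)     op="<"; limit=5 ;;
    unknowns_escalation_failures) op=">"; limit=0 ;;
    unknowns_avg_age_hours)       op=">"; limit=168 ;;
    *) echo "unknown metric: $name" >&2; return 2 ;;
  esac
  awk -v v="$value" -v op="$op" -v l="$limit" \
    'BEGIN { exit !((op == ">" && v > l) || (op == "<" && v < l)) }'
}

if metric_breaches_threshold unknowns_hot_count 21; then
  echo "unknowns_hot_count is over its alert threshold"
fi
```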
### 7.2 Grafana Dashboard
```
Dashboard: Unknowns Queue Health
Panels:
- Queue size by band (HOT/WARM/COLD)
- SLA compliance rate
- Unknowns by reason code
- Resolution velocity
- Escalation success rate
- Queue age distribution
- KEV item tracking
```
### 7.3 Alerting Rules
```yaml
groups:
  - name: unknowns-queue
    rules:
      - alert: UnknownsHotBandHigh
        expr: unknowns_hot_count > 20
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HOT unknowns queue is high ({{ $value }} items)"
      - alert: UnknownsSLABreach
        expr: unknowns_sla_breached > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $value }} unknowns have breached SLA"
      - alert: UnknownsQueueGrowing
        # unknowns_total is a gauge (current queue size), so use delta();
        # rate() is only meaningful for counters.
        expr: delta(unknowns_total[1h]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unknowns queue is growing rapidly"
      - alert: UnknownsKEVPending
        expr: unknowns_kev_count > 0 and unknowns_kev_unresolved_age_hours > 24
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "KEV unknown pending for over 24 hours"
```
### 7.4 Daily Report
```bash
# Generate daily report
stella unknowns report --format email --send-to security-team@example.com
# Report includes:
# - Queue summary (total, by band, by reason)
# - SLA status (in compliance, breaches)
# - Top 10 highest-scored items
# - Newly added items (last 24h)
# - Resolved items (last 24h)
# - KEV item status
# - Trends (7-day, 30-day)
```
---
## Related Documentation
- [Unknowns API Reference](../api/score-proofs-reachability-api-reference.md#5-unknowns-api)
- [Triage Technical Reference](../product-advisories/14-Dec-2025%20-%20Triage%20and%20Unknowns%20Technical%20Reference.md)
- [Score Proofs Runbook](./score-proofs-runbook.md)
- [Policy Engine](../modules/policy/architecture.md)