feat: Add operations runbooks and UI API models for Sprint 3500.0004.x

Operations documentation:
- docs/operations/reachability-runbook.md - Reachability troubleshooting guide
- docs/operations/unknowns-queue-runbook.md - Unknowns queue management guide

UI TypeScript models:
- src/Web/StellaOps.Web/src/app/core/api/proof.models.ts - Proof ledger types
- src/Web/StellaOps.Web/src/app/core/api/reachability.models.ts - Reachability types
- src/Web/StellaOps.Web/src/app/core/api/unknowns.models.ts - Unknowns queue types

Sprint: SPRINT_3500_0004_0002 (UI), SPRINT_3500_0004_0004 (Docs)
2025-12-20 22:22:09 +02:00
parent efe9bd8cfe
commit da315965ff
5 changed files with 1719 additions and 0 deletions
--- a/docs/operations/unknowns-queue-runbook.md
+++ b/docs/operations/unknowns-queue-runbook.md
@@ -0,0 +1,590 @@
+# Unknowns Queue Management Runbook
+
+> **Version**: 1.0.0  
+> **Sprint**: 3500.0004.0004  
+> **Last Updated**: 2025-12-20
+
+This runbook covers operational procedures for managing the Unknowns queue, including triage, escalation, resolution, and queue health maintenance.
+
+---
+
+## Table of Contents
+
+1. [Overview](#1-overview)
+2. [Queue Operations](#2-queue-operations)
+3. [Triage Procedures](#3-triage-procedures)
+4. [Escalation Workflows](#4-escalation-workflows)
+5. [Resolution Procedures](#5-resolution-procedures)
+6. [Troubleshooting](#6-troubleshooting)
+7. [Monitoring & Alerting](#7-monitoring--alerting)
+
+---
+
+## 1. Overview
+
+### What are Unknowns?
+
+Unknowns are items that could not be fully classified during scanning due to:
+
+- Missing VEX statements
+- Ambiguous indirect calls in call graphs
+- Incomplete SBOM data
+- Missing advisory information
+- Conflicting evidence from multiple sources
+
+### Unknown Ranking
+
+Unknowns are ranked using a weighted three-factor scoring model with a containment deduction:
+
+```
+score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction
+```
+
+| Factor | Weight | Description |
+|--------|--------|-------------|
+| Blast Radius | 0.60 | Impact scope (dependents, network exposure) |
+| Evidence Scarcity | 0.30 | How much data is missing |
+| Exploit Pressure | 0.30 | EPSS score, KEV status |
+| Containment | -0.20 | Mitigation factors (seccomp, read-only FS), added directly as the negative `containment_deduction` |
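
To sanity-check a reported score by hand, the weighted sum can be reproduced with a small shell function. This is a minimal sketch: `unknown_score` is a hypothetical helper, not a `stella` command. It assumes each factor is already normalized to the 0 to 1 range and that the containment deduction is passed in as a negative number (for example -0.20 when seccomp is enforced and the filesystem is read-only). It does not clamp the result, because the model above does not specify clamping.

```bash
# Hypothetical helper (not part of the stella CLI): recompute an Unknowns score.
# Usage: unknown_score <blast> <scarcity> <pressure> <containment_deduction>
# Assumes blast/scarcity/pressure are normalized to [0, 1] and the
# containment deduction is already negative (e.g. -0.20).
unknown_score() {
  awk -v b="$1" -v s="$2" -v p="$3" -v c="$4" 'BEGIN {
    # Weights from the ranking table: 0.60 blast, 0.30 scarcity, 0.30 pressure;
    # the containment deduction is added as-is
    printf "%.2f\n", 0.60 * b + 0.30 * s + 0.30 * p + c
  }'
}
```

For example, `unknown_score 0.5 0.7 0.5 -0.20` prints `0.46` (0.30 + 0.21 + 0.15 - 0.20).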
+
+### Band Assignment
+
+| Band | Score Range | Priority | SLA |
+|------|-------------|----------|-----|
+| HOT | ≥ 0.70 | Critical | 24 hours |
+| WARM | ≥ 0.40 and < 0.70 | Normal | 7 days |
+| COLD | < 0.40 | Low | 30 days |
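
The band thresholds translate into a simple lookup. In this sketch `unknown_band` is a hypothetical helper; it treats each band as a half-open interval, so a score such as 0.695 lands in WARM instead of falling between the WARM and HOT rows.

```bash
# Hypothetical helper (not part of the stella CLI): map a score to its band.
# HOT: score >= 0.70, WARM: 0.40 <= score < 0.70, COLD: score < 0.40
unknown_band() {
  awk -v s="$1" 'BEGIN {
    s += 0  # force numeric comparison
    if (s >= 0.70)      print "HOT"
    else if (s >= 0.40) print "WARM"
    else                print "COLD"
  }'
}
```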
+
+---
+
+## 2. Queue Operations
+
+### 2.1 View Queue Status
+
+```bash
+# Get queue summary
+stella unknowns summary
+
+# Output:
+# Total: 142 unknowns
+# HOT:  12 (8%)  - Requires immediate attention
+# WARM: 85 (60%) - Normal priority
+# COLD: 45 (32%) - Low priority
+# 
+# KEV items: 3
+# Average score: 0.52
+
+# Get queue summary via API
+curl "https://scanner.example.com/api/v1/unknowns/summary" \
+  -H "Authorization: Bearer $TOKEN"
+```
+
+### 2.2 List Unknowns
+
+```bash
+# List all HOT unknowns
+stella unknowns list --band HOT
+
+# List by score (highest first)
+stella unknowns list --sort score --order desc --limit 20
+
+# Filter by reason
+stella unknowns list --reason missing_vex
+
+# Filter by artifact
+stella unknowns list --artifact sha256:abc123...
+
+# Filter by KEV status
+stella unknowns list --kev true
+```
+
+### 2.3 View Unknown Details
+
+```bash
+# Get detailed view
+stella unknowns show unk-12345678-abcd-1234-5678-abcdef123456
+
+# Output:
+# ID: unk-12345678-...
+# Artifact: pkg:oci/myapp@sha256:abc123
+# Reasons: [missing_vex, ambiguous_indirect_call]
+# 
+# Blast Radius:
+#   Dependents: 15 services
+#   Network: internet-facing
+#   Privilege: user
+# 
+# Evidence Scarcity: 0.7 (high)
+# 
+# Exploit Pressure:
+#   EPSS: 0.45
+#   KEV: false
+# 
+# Containment:
+#   Seccomp: enforced (-0.10)
+#   Filesystem: read-only (-0.10)
+# 
+# Score: 0.62 (WARM band)
+# Score Breakdown:
+#   Blast component: +0.35
+#   Scarcity component: +0.21
+#   Pressure component: +0.26
+#   Containment deduction: -0.20
+
+# Show proof tree
+stella unknowns proof unk-12345678-...
+```
+
+### 2.4 Export Queue Data
+
+```bash
+# Export for analysis
+stella unknowns export --format json --output unknowns.json
+
+# Export HOT items for daily review
+stella unknowns export --band HOT --format csv --output hot-unknowns.csv
+
+# Export with full details
+stella unknowns export --verbose --include-proofs --output full-export.json
+```
+
+---
+
+## 3. Triage Procedures
+
+### 3.1 Daily Triage Workflow
+
+**Schedule**: Daily at 9:00 AM
+
+**Duration**: 30 minutes
+
+**Participants**: Security analyst, on-call engineer
+
+**Process**:
+
+```bash
+# 1. Get today's queue snapshot
+stella unknowns snapshot --output daily-$(date +%Y%m%d).json
+
+# 2. Review all HOT items
+stella unknowns list --band HOT --since 24h
+
+# 3. For each HOT unknown, determine action:
+#    - Escalate: Trigger immediate rescan
+#    - Investigate: Needs manual analysis
+#    - Defer: Move to WARM (with justification)
+#    - Resolve: Evidence found, can close
+
+# 4. Process each item
+stella unknowns triage unk-12345678-... --action escalate
+stella unknowns triage unk-87654321-... --action investigate --notes "Need VEX from vendor"
+stella unknowns triage unk-11111111-... --action defer --reason "False positive suspected"
+```
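
When the day's decisions are collected in a file during review, step 4 can be applied in one pass. This is a sketch only: `triage_from_file` and its tab-separated input format are hypothetical, and it calls `stella unknowns triage` with only the `--action`, `--notes`, and `--reason` options used in the examples above.

```bash
# Hypothetical helper (not part of the stella CLI): apply triage decisions
# recorded during the daily review. Each input line is tab-separated:
#   <unknown-id> <action> [notes]
triage_from_file() {
  local file="$1" id action notes
  local -a extra
  # "|| [ -n "$id" ]" also processes a final line that lacks a trailing newline
  while IFS=$'\t' read -r id action notes || [ -n "$id" ]; do
    # Skip blank lines and comment lines
    [ -z "$id" ] && continue
    case "$id" in \#*) continue ;; esac
    extra=()
    if [ -n "$notes" ]; then
      # The examples above pass defer context via --reason, everything else via --notes
      if [ "$action" = "defer" ]; then
        extra=(--reason "$notes")
      else
        extra=(--notes "$notes")
      fi
    fi
    stella unknowns triage "$id" --action "$action" "${extra[@]}"
  done < "$file"
}
```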
+
+### 3.2 Triage Decision Matrix
+
+| Reason Code | KEV | EPSS > 0.5 | Action |
+|-------------|-----|------------|--------|
+| `missing_vex` | Yes | Any | Escalate + Vendor outreach |
+| `missing_vex` | No | Yes | Escalate |
+| `missing_vex` | No | No | Request VEX |
+| `ambiguous_indirect_call` | Any | Any | Manual code review |
+| `incomplete_sbom` | Any | Any | Rescan with updated extractor |
+| `conflicting_evidence` | Any | Any | Manual analysis |
+
+### 3.3 Triage Templates
+
+```bash
+# Quick escalate (HOT + KEV)
+stella unknowns triage unk-... --action escalate \
+  --priority P1 \
+  --notes "KEV item, requires immediate attention"
+
+# Request vendor VEX
+stella unknowns triage unk-... --action investigate \
+  --notes "Requested VEX from vendor via security@vendor.com" \
+  --due-date 7d
+
+# Mark for code review
+stella unknowns triage unk-... --action investigate \
+  --notes "Requires manual code review to resolve indirect call" \
+  --assign @code-review-team
+
+# Defer with justification
+stella unknowns triage unk-... --action defer \
+  --reason "Component not deployed to production" \
+  --evidence "deployment-manifest.yaml shows staging-only"
+```
+
+---
+
+## 4. Escalation Workflows
+
+### 4.1 Automatic Escalation
+
+Unknowns are automatically escalated when:
+
+- Score increases above HOT threshold (0.70)
+- KEV status added to related CVE
+- EPSS score increases significantly (> 0.2 delta)
+- Blast radius increases (new dependents detected)
+
+**Configure auto-escalation**:
+
+```yaml
+# policy.unknowns.escalation.yaml
+autoEscalation:
+  enabled: true
+  triggers:
+    - condition: score >= 0.70
+      action: escalate
+      notify: [security-team]
+    - condition: kev == true
+      action: escalate
+      priority: P1
+      notify: [security-team, management]
+    - condition: epss_delta > 0.2
+      action: escalate
+      notify: [security-team]
+```
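
When troubleshooting why an unknown was or was not auto-escalated, it can help to evaluate the example triggers by hand. This sketch, `matched_triggers`, is hypothetical and is not the Policy Engine's evaluator; it assumes the conditions mean exactly what they read as: `score >= 0.70`, `kev == true`, and `epss_delta > 0.2`.

```bash
# Hypothetical helper (not the Policy Engine): report which of the three
# example triggers would fire for an unknown, one line per trigger.
# Usage: matched_triggers <score> <kev: true|false> <epss_delta>
matched_triggers() {
  awk -v s="$1" -v k="$2" -v d="$3" 'BEGIN {
    if (s + 0 >= 0.70) print "score >= 0.70: escalate, notify security-team"
    if (k == "true")   print "kev == true: escalate P1, notify security-team, management"
    if (d + 0 > 0.2)   print "epss_delta > 0.2: escalate, notify security-team"
  }'
}
```

An empty result means none of the example triggers apply, so a missing escalation for that item is expected rather than a fault.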
+
+### 4.2 Manual Escalation
+
+```bash
+# Escalate via CLI
+stella unknowns escalate unk-12345678-...
+
+# Escalate with reason
+stella unknowns escalate unk-12345678-... \
+  --reason "Customer reported potential exploit"
+
+# Escalate to trigger rescan
+stella unknowns escalate unk-12345678-... --rescan
+
+# Output:
+# Escalated: unk-12345678-...
+# Rescan job: rescan-job-001
+# Status: queued
+# ETA: 5 minutes
+```
+
+### 4.3 Bulk Escalation
+
+```bash
+# Escalate all KEV items
+stella unknowns escalate --filter "kev=true" --reason "KEV bulk escalation"
+
+# Escalate high-score items
+stella unknowns escalate --filter "score>=0.8" --rescan
+
+# Escalate by artifact
+stella unknowns escalate --artifact sha256:abc123... --reason "Production incident"
+```
+
+### 4.4 Escalation SLA Tracking
+
+```bash
+# Check SLA status
+stella unknowns sla-status
+
+# Output:
+# HOT unknowns SLA (24h):
+#   In SLA: 10 (83%)
+#   Breached: 2 (17%)
+#   
+# Breached items:
+#   unk-111... (26h old) - missing_vex
+#   unk-222... (30h old) - conflicting_evidence
+
+# List items that have breached their SLA
+stella unknowns list --sla-breached
+```
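
The SLA windows from the band table can also be checked locally, for example against item ages taken from an export. `sla_status` is a hypothetical helper; it assumes an item is breached only once its age strictly exceeds the window, so a HOT item at exactly 24 hours still counts as in SLA.

```bash
# Hypothetical helper (not part of the stella CLI): classify an unknown
# against its band SLA (HOT 24 h, WARM 7 days = 168 h, COLD 30 days = 720 h).
# Usage: sla_status <band> <age_hours>
sla_status() {
  local band="$1" age_hours="$2" limit
  case "$band" in
    HOT)  limit=24 ;;
    WARM) limit=168 ;;
    COLD) limit=720 ;;
    *) echo "unknown band: $band" >&2; return 1 ;;
  esac
  # Breached once the age exceeds the SLA window
  awk -v a="$age_hours" -v l="$limit" 'BEGIN {
    if (a + 0 > l + 0) { print "breached" } else { print "in_sla" }
  }'
}
```

With the example above, `sla_status HOT 26` prints `breached`, matching `unk-111...`.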
+
+---
+
+## 5. Resolution Procedures
+
+### 5.1 Resolution Types
+
+| Resolution | Description | Evidence Required |
+|------------|-------------|-------------------|
+| `not_affected` | Vulnerability doesn't apply | VEX statement or manual analysis |
+| `fixed` | Vulnerability patched | Version upgrade confirmation |
+| `mitigated` | Controls in place | Mitigation documentation |
+| `false_positive` | Incorrect classification | Analysis report |
+| `wont_fix` | Accepted risk | Risk acceptance form |
+
+### 5.2 Resolve Unknown
+
+```bash
+# Resolve as not affected
+stella unknowns resolve unk-12345678-... \
+  --resolution not_affected \
+  --justification "vulnerable_code_not_present" \
+  --notes "Manual code review confirmed function not used"
+
+# Resolve as fixed
+stella unknowns resolve unk-12345678-... \
+  --resolution fixed \
+  --justification "version_upgraded" \
+  --evidence "Upgraded lodash to 4.17.21, CVE patched"
+
+# Resolve as mitigated
+stella unknowns resolve unk-12345678-... \
+  --resolution mitigated \
+  --justification "inline_mitigations_exist" \
+  --evidence "WAF rule WAF-001 blocks exploit pattern"
+
+# Resolve as won't fix (risk accepted)
+stella unknowns resolve unk-12345678-... \
+  --resolution wont_fix \
+  --justification "risk_accepted" \
+  --evidence "Risk acceptance ticket RISK-123" \
+  --expires 90d  # Re-evaluate in 90 days
+```
+
+### 5.3 Bulk Resolution
+
+```bash
+# Resolve all items for a fixed package version
+stella unknowns resolve-batch \
+  --filter "purl=pkg:npm/lodash@4.17.20" \
+  --resolution fixed \
+  --justification "version_upgraded" \
+  --notes "Upgraded to 4.17.21 fleet-wide" \
+  --evidence "Fleet upgrade ticket FLEET-456"
+
+# Resolve false positives from analysis
+stella unknowns resolve-batch \
+  --file false-positives.json \
+  --resolution false_positive
+```
+
+### 5.4 Resolution Audit Trail
+
+```bash
+# View resolution history
+stella unknowns history unk-12345678-...
+
+# Output:
+# 2025-12-15 10:00:00 - Created (score: 0.62)
+# 2025-12-16 09:30:00 - Triaged by analyst@example.com
+# 2025-12-17 14:00:00 - Escalated (KEV added)
+# 2025-12-18 11:00:00 - Resolved by security@example.com
+#   Resolution: not_affected
+#   Justification: vulnerable_code_not_present
+#   Notes: Manual code review confirmed function not used
+
+# Export audit trail
+stella unknowns audit-export --from 2025-01-01 --to 2025-12-31 --output audit.json
+```
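
Resolution velocity can be measured per item from the history output. This sketch assumes the line format shown above, UTC timestamps, and a `date` that accepts `-d "YYYY-MM-DD HH:MM:SS"` (GNU coreutils); `time_to_resolution_hours` is a hypothetical helper. For the history above it reports 73 hours (2025-12-15 10:00 to 2025-12-18 11:00).

```bash
# Hypothetical helper (not part of the stella CLI): whole hours from the
# "Created" entry to the "Resolved" entry of history output read on stdin.
# Assumes UTC timestamps and GNU date(1) for -d parsing.
time_to_resolution_hours() {
  local history created resolved
  history=$(cat)
  # The first 19 characters of each entry are the "YYYY-MM-DD HH:MM:SS" timestamp
  created=$(printf '%s\n' "$history" | grep -m1 ' - Created' | cut -c1-19)
  resolved=$(printf '%s\n' "$history" | grep -m1 ' - Resolved' | cut -c1-19)
  if [ -z "$created" ] || [ -z "$resolved" ]; then
    echo "missing Created or Resolved entry" >&2
    return 1
  fi
  # Integer division truncates to whole hours
  echo $(( ($(date -u -d "$resolved" +%s) - $(date -u -d "$created" +%s)) / 3600 ))
}
```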
+
+---
+
+## 6. Troubleshooting
+
+### 6.1 Score Seems Wrong
+
+**Symptom**: Unknown scored too high or too low.
+
+**Diagnosis**:
+
+```bash
+# View score breakdown
+stella unknowns show unk-... --score-details
+
+# View proof tree
+stella unknowns proof unk-... --verbose
+```
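
A quick consistency check is to confirm that the breakdown components add up to the reported score; in the example in section 2.3, 0.35 + 0.21 + 0.26 - 0.20 = 0.62. The sketch below assumes the `Score:` and `Score Breakdown:` line format shown in that example; `check_score_breakdown` is a hypothetical helper, and its 0.01 tolerance is an arbitrary allowance for two-decimal rounding.

```bash
# Hypothetical helper (not part of the stella CLI): read `stella unknowns show`
# output on stdin, sum the "... component:" and "Containment deduction:" lines,
# and compare the total with the reported "Score:" value.
check_score_breakdown() {
  awk '
    # The last field is a signed number such as +0.35 or -0.20
    /component:/ || /Containment deduction:/ { sum += $NF }
    # Matches "Score: 0.62 (WARM band)" but not "Score Breakdown:"
    /^ *Score:/ { reported = $2 }
    END {
      diff = sum - reported
      if (diff < 0) diff = -diff
      printf "components=%.2f reported=%.2f %s\n", sum, reported, (diff <= 0.01 ? "OK" : "MISMATCH")
    }'
}
```

A `MISMATCH` suggests the stored score predates the current inputs, which `stella unknowns recalculate` is meant to address.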
+
+**Common causes**:
+
+1. **Stale EPSS data**: EPSS feed not updated
+2. **Incorrect blast radius**: Dependency data outdated
+3. **Missing containment data**: Seccomp/filesystem status unknown
+
+**Resolution**:
+
+```bash
+# Trigger score recalculation
+stella unknowns recalculate unk-...
+
+# Force refresh of all input signals
+stella unknowns refresh unk-... --force
+```
+
+### 6.2 Duplicate Unknowns
+
+**Symptom**: Same issue appears multiple times.
+
+**Diagnosis**:
+
+```bash
+# Find potential duplicates
+stella unknowns duplicates --scan
+
+# Output shows items with same CVE+PURL but different artifacts
+```
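
To cross-check the `duplicates` output, or to look for duplicates in an export, group by CVE and PURL and count distinct artifacts. This sketch assumes a header-less CSV with the columns `id,cve,purl,artifact`; that layout is an assumption, not the documented `stella unknowns export` format, and `find_duplicate_groups` is hypothetical.

```bash
# Hypothetical helper (not part of the stella CLI): list CVE + PURL pairs that
# appear in more than one distinct artifact, with the unknown IDs involved.
# Input: header-less CSV lines of id,cve,purl,artifact (assumed layout).
find_duplicate_groups() {
  awk -F',' '
    NF >= 4 {
      key = $2 "," $3
      # Count each distinct artifact only once per CVE + PURL pair
      if (!((key, $4) in seen)) {
        seen[key, $4] = 1
        artifacts[key]++
      }
      if (key in ids) ids[key] = ids[key] " " $1
      else ids[key] = $1
    }
    END {
      for (k in artifacts)
        if (artifacts[k] > 1) print k ": " ids[k]
    }' | sort  # awk array order is unspecified, so sort for stable output
}
```

Each printed group is a candidate for `stella unknowns merge`, with the first ID as `--primary`.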
+
+**Resolution**:
+
+```bash
+# Merge duplicates
+stella unknowns merge \
+  --primary unk-111... \
+  --secondary unk-222... \
+  --reason "Same CVE across artifact versions"
+```
+
+### 6.3 Escalation Not Working
+
+**Symptom**: Escalation doesn't trigger rescan.
+
+**Diagnosis**:
+
+```bash
+# Check escalation status
+stella unknowns escalation-status unk-...
+
+# Check Scheduler connectivity
+stella health check --service scheduler
+
+# Check job queue
+stella scheduler queue status rescan
+```
+
+**Resolution**:
+
+```bash
+# Retry escalation
+stella unknowns escalate unk-... --force
+
+# Manual rescan trigger
+stella scan trigger --artifact sha256:abc123... --priority high
+```
+
+### 6.4 Resolution Rejected
+
+**Symptom**: Resolution attempt fails validation.
+
+**Diagnosis**:
+
+```bash
+# Check resolution requirements
+stella unknowns resolution-requirements unk-...
+
+# Output:
+# Resolution requirements for unk-12345678-...
+# - Justification: required
+# - Evidence: required (reason: KEV item)
+# - Approver: required (band: HOT)
+```
+
+**Resolution**:
+
+```bash
+# Provide required evidence
+stella unknowns resolve unk-... \
+  --resolution not_affected \
+  --justification "vulnerable_code_not_present" \
+  --evidence "Code review: CRV-123" \
+  --approver security-lead@example.com
+```
+
+---
+
+## 7. Monitoring & Alerting
+
+### 7.1 Key Metrics
+
+| Metric | Description | Alert Threshold |
+|--------|-------------|-----------------|
+| `unknowns_total` | Total unknowns in queue | > 500 |
+| `unknowns_hot_count` | HOT band count | > 20 |
+| `unknowns_sla_breached` | SLA breaches | > 0 |
+| `unknowns_resolution_rate` | Daily resolutions | < 5 |
+| `unknowns_escalation_failures` | Failed escalations | > 0 |
+| `unknowns_avg_age_hours` | Average unknown age | > 168 (1 week) |
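
The thresholds in the table can also be checked outside the alerting pipeline, for example against values copied from the dashboard during an incident review. `check_queue_metrics` is a hypothetical helper that reads `metric value` pairs, one per line, and prints each metric that crosses its threshold.

```bash
# Hypothetical helper (not part of the stella CLI): compare metric values with
# the alert thresholds above. Input: "metric value" pairs, one per line.
check_queue_metrics() {
  awk '
    $1 == "unknowns_total"               && $2 + 0 > 500 { print $1 " above 500" }
    $1 == "unknowns_hot_count"           && $2 + 0 > 20  { print $1 " above 20" }
    $1 == "unknowns_sla_breached"        && $2 + 0 > 0   { print $1 " above 0" }
    $1 == "unknowns_resolution_rate"     && $2 + 0 < 5   { print $1 " below 5" }
    $1 == "unknowns_escalation_failures" && $2 + 0 > 0   { print $1 " above 0" }
    $1 == "unknowns_avg_age_hours"       && $2 + 0 > 168 { print $1 " above 168" }
  '
}
```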
+
+### 7.2 Grafana Dashboard
+
+```
+Dashboard: Unknowns Queue Health
+Panels:
+- Queue size by band (HOT/WARM/COLD)
+- SLA compliance rate
+- Unknowns by reason code
+- Resolution velocity
+- Escalation success rate
+- Queue age distribution
+- KEV item tracking
+```
+
+### 7.3 Alerting Rules
+
+```yaml
+groups:
+  - name: unknowns-queue
+    rules:
+      - alert: UnknownsHotBandHigh
+        expr: unknowns_hot_count > 20
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "HOT unknowns queue is high ({{ $value }} items)"
+          
+      - alert: UnknownsSLABreach
+        expr: unknowns_sla_breached > 0
+        for: 1m
+        labels:
+          severity: critical
+        annotations:
+          summary: "{{ $value }} unknowns have breached SLA"
+          
+      - alert: UnknownsQueueGrowing
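+        # unknowns_total is a queue-size gauge, so measure net growth with delta()
+        expr: delta(unknowns_total[1h]) > 10
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Unknowns queue grew by {{ $value }} items in the last hour"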
+          
+      - alert: UnknownsKEVPending
+        expr: unknowns_kev_count > 0 and unknowns_kev_unresolved_age_hours > 24
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "KEV unknown pending for over 24 hours"
+```
+
+### 7.4 Daily Report
+
+```bash
+# Generate daily report
+stella unknowns report --format email --send-to security-team@example.com
+
+# Report includes:
+# - Queue summary (total, by band, by reason)
+# - SLA status (in compliance, breaches)
+# - Top 10 highest-scored items
+# - Newly added items (last 24h)
+# - Resolved items (last 24h)
+# - KEV item status
+# - Trends (7-day, 30-day)
+```
+
+---
+
+## Related Documentation
+
+- [Unknowns API Reference](../api/score-proofs-reachability-api-reference.md#5-unknowns-api)
+- [Triage Technical Reference](../product-advisories/14-Dec-2025%20-%20Triage%20and%20Unknowns%20Technical%20Reference.md)
+- [Score Proofs Runbook](./score-proofs-runbook.md)
+- [Policy Engine](../modules/policy/architecture.md)
+