Unknowns Queue Management Runbook

Version: 1.0.0
Sprint: 3500.0004.0004
Last Updated: 2025-12-20

This runbook covers operational procedures for managing the Unknowns queue, including triage, escalation, resolution, and queue health maintenance.


Table of Contents

  1. Overview
  2. Queue Operations
  3. Triage Procedures
  4. Escalation Workflows
  5. Resolution Procedures
  6. Troubleshooting
  7. Monitoring & Alerting

1. Overview

What are Unknowns?

Unknowns are items that could not be fully classified during scanning due to:

  • Missing VEX statements
  • Ambiguous indirect calls in call graphs
  • Incomplete SBOM data
  • Missing advisory information
  • Conflicting evidence from multiple sources

Unknown Ranking

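Unknowns are ranked using a weighted scoring model with three additive factors and a containment deduction:

score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction

| Factor | Weight | Description |
|---|---|---|
| Blast Radius | 0.60 | Impact scope (dependents, network exposure) |
| Evidence Scarcity | 0.30 | How much data is missing |
| Exploit Pressure | 0.30 | EPSS score, KEV status |
| Containment | -0.20 | Mitigation factors (seccomp, read-only FS) |

The containment term is a deduction (zero or negative) that is added directly rather than multiplied by a weight. The example in section 2.3 shows it as the sum of -0.10 for enforced seccomp and -0.10 for a read-only filesystem.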
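The formula above can be checked by hand with a few lines of Python. This is a minimal sketch, not the scanner's implementation: the function name `unknown_score`, the assumption that each factor is normalised to the range 0 to 1, and the rounding to four decimals are all illustrative choices.

```python
# Weights taken from the factor table; everything else is illustrative.
WEIGHTS = {"blast": 0.60, "scarcity": 0.30, "pressure": 0.30}


def unknown_score(blast: float, scarcity: float, pressure: float,
                  containment_deduction: float = 0.0) -> float:
    """Combine factor values (assumed to lie in [0, 1]) into one score.

    containment_deduction is zero or negative and is added as-is.
    """
    if containment_deduction > 0:
        raise ValueError("containment_deduction must be zero or negative")
    raw = (WEIGHTS["blast"] * blast
           + WEIGHTS["scarcity"] * scarcity
           + WEIGHTS["pressure"] * pressure
           + containment_deduction)
    # Round to suppress floating-point noise in the weighted sum.
    return round(raw, 4)


# Hypothetical inputs: 0.30 + 0.21 + 0.12 - 0.20
print(unknown_score(blast=0.5, scarcity=0.7, pressure=0.4,
                    containment_deduction=-0.20))
```

With every factor at 1.0 and the full -0.20 deduction, the weights sum to 1.0, which is why they can add up to more than 1 on their own.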

Band Assignment

| Band | Score Range | Priority | SLA |
|---|---|---|---|
| HOT | ≥ 0.70 | Critical | 24 hours |
| WARM | ≥ 0.40 and < 0.70 | Normal | 7 days |
| COLD | < 0.40 | Low | 30 days |
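The band thresholds map onto a simple lookup. The sketch below assumes the boundaries are inclusive at the lower end (0.70 is HOT, 0.40 is WARM); `assign_band` is a hypothetical helper, not part of the `stella` CLI.

```python
from datetime import timedelta

# (lower bound, band, SLA) from the band table, highest band first.
BANDS = [
    (0.70, "HOT", timedelta(hours=24)),
    (0.40, "WARM", timedelta(days=7)),
    (0.00, "COLD", timedelta(days=30)),
]


def assign_band(score: float):
    """Return (band, sla) for a score; anything below 0.40 is COLD."""
    for lower, band, sla in BANDS:
        if score >= lower:
            return band, sla
    # Scores pushed below zero by the containment deduction stay COLD.
    return "COLD", timedelta(days=30)


band, sla = assign_band(0.62)
print(band, sla)
```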

2. Queue Operations

2.1 View Queue Status

# Get queue summary
stella unknowns summary

# Output:
# Total: 142 unknowns
# HOT:  12 (8%)  - Requires immediate attention
# WARM: 85 (60%) - Normal priority
# COLD: 45 (32%) - Low priority
# 
# KEV items: 3
# Average score: 0.52

# Get queue summary via API
curl "https://scanner.example.com/api/v1/unknowns/summary" \
  -H "Authorization: Bearer $TOKEN"

2.2 List Unknowns

# List all HOT unknowns
stella unknowns list --band HOT

# List by score (highest first)
stella unknowns list --sort score --order desc --limit 20

# Filter by reason
stella unknowns list --reason missing_vex

# Filter by artifact
stella unknowns list --artifact sha256:abc123...

# Filter by KEV status
stella unknowns list --kev true

2.3 View Unknown Details

# Get detailed view
stella unknowns show unk-12345678-abcd-1234-5678-abcdef123456

# Output:
# ID: unk-12345678-...
# Artifact: pkg:oci/myapp@sha256:abc123
# Reasons: [missing_vex, ambiguous_indirect_call]
# 
# Blast Radius:
#   Dependents: 15 services
#   Network: internet-facing
#   Privilege: user
# 
# Evidence Scarcity: 0.7 (high)
# 
# Exploit Pressure:
#   EPSS: 0.45
#   KEV: false
# 
# Containment:
#   Seccomp: enforced (-0.10)
#   Filesystem: read-only (-0.10)
# 
# Score: 0.62 (WARM band)
# Score Breakdown:
#   Blast component: +0.35
#   Scarcity component: +0.21
#   Pressure component: +0.26
#   Containment deduction: -0.20

# Show proof tree
stella unknowns proof unk-12345678-...

2.4 Export Queue Data

# Export for analysis
stella unknowns export --format json --output unknowns.json

# Export HOT items for daily review
stella unknowns export --band HOT --format csv --output hot-unknowns.csv

# Export with full details
stella unknowns export --verbose --include-proofs --output full-export.json

3. Triage Procedures

3.1 Daily Triage Workflow

Schedule: Daily at 9:00 AM

Duration: 30 minutes

Participants: Security analyst, on-call engineer

Process:

# 1. Get today's queue snapshot
stella unknowns snapshot --output daily-$(date +%Y%m%d).json

# 2. Review all HOT items
stella unknowns list --band HOT --since 24h

# 3. For each HOT unknown, determine action:
#    - Escalate: Trigger immediate rescan
#    - Investigate: Needs manual analysis
#    - Defer: Move to WARM (with justification)
#    - Resolve: Evidence found, can close

# 4. Process each item
stella unknowns triage unk-12345678-... --action escalate
stella unknowns triage unk-87654321-... --action investigate --notes "Need VEX from vendor"
stella unknowns triage unk-11111111-... --action defer --reason "False positive suspected"

3.2 Triage Decision Matrix

| Reason Code | KEV | EPSS > 0.5 | Action |
|---|---|---|---|
| missing_vex | Yes | Any | Escalate + Vendor outreach |
| missing_vex | No | Yes | Escalate |
| missing_vex | No | No | Request VEX |
| ambiguous_indirect_call | Any | Any | Manual code review |
| incomplete_sbom | Any | Any | Rescan with updated extractor |
| conflicting_evidence | Any | Any | Manual analysis |
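The matrix reads top to bottom: KEV status is checked before EPSS, and only `missing_vex` depends on either. A minimal sketch of that lookup follows. The function name `triage_action` is hypothetical, and raising an error for reason codes the matrix does not list is an assumption made here, not documented behaviour.

```python
def triage_action(reason: str, kev: bool, epss: float) -> str:
    """Return the recommended action from the triage decision matrix."""
    if reason == "missing_vex":
        if kev:
            return "Escalate + Vendor outreach"
        if epss > 0.5:
            return "Escalate"
        return "Request VEX"

    # These reason codes map to one action regardless of KEV or EPSS.
    fixed_actions = {
        "ambiguous_indirect_call": "Manual code review",
        "incomplete_sbom": "Rescan with updated extractor",
        "conflicting_evidence": "Manual analysis",
    }
    if reason not in fixed_actions:
        # Assumption: unlisted reasons need a human decision.
        raise ValueError(f"no matrix entry for reason code {reason!r}")
    return fixed_actions[reason]


print(triage_action("missing_vex", kev=False, epss=0.45))
```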

3.3 Triage Templates

# Quick escalate (HOT + KEV)
stella unknowns triage unk-... --action escalate \
  --priority P1 \
  --notes "KEV item, requires immediate attention"

# Request vendor VEX
stella unknowns triage unk-... --action investigate \
  --notes "Requested VEX from vendor via security@vendor.com" \
  --due-date 7d

# Mark for code review
stella unknowns triage unk-... --action investigate \
  --notes "Requires manual code review to resolve indirect call" \
  --assign @code-review-team

# Defer with justification
stella unknowns triage unk-... --action defer \
  --reason "Component not deployed to production" \
  --evidence "deployment-manifest.yaml shows staging-only"

4. Escalation Workflows

4.1 Automatic Escalation

Unknowns are automatically escalated when:

  • Score increases above HOT threshold (0.70)
  • KEV status added to related CVE
  • EPSS score increases significantly (> 0.2 delta)
  • Blast radius increases (new dependents detected)

Configure auto-escalation:

# policy.unknowns.escalation.yaml
autoEscalation:
  enabled: true
  triggers:
    - condition: score >= 0.70
      action: escalate
      notify: [security-team]
    - condition: kev == true
      action: escalate
      priority: P1
      notify: [security-team, management]
    - condition: epss_delta > 0.2
      action: escalate
      notify: [security-team]
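To reason about which triggers would fire for a given unknown, the three conditions in the policy file can be modelled as predicates. This is a sketch under stated assumptions: `UnknownState`, `Trigger` and `evaluate` are hypothetical names, and merging the notification lists of all matching triggers is a guess at the behaviour rather than documented semantics. The blast-radius trigger is omitted because the policy file above does not configure it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UnknownState:
    score: float
    kev: bool
    epss_delta: float


@dataclass
class Trigger:
    condition: str
    matches: Callable[[UnknownState], bool]
    notify: List[str]
    priority: Optional[str] = None


# Mirrors the three entries under autoEscalation.triggers.
TRIGGERS = [
    Trigger("score >= 0.70", lambda u: u.score >= 0.70, ["security-team"]),
    Trigger("kev == true", lambda u: u.kev,
            ["security-team", "management"], "P1"),
    Trigger("epss_delta > 0.2", lambda u: u.epss_delta > 0.2,
            ["security-team"]),
]


def evaluate(u: UnknownState) -> dict:
    """Report which triggers match and who would be notified."""
    fired = [t for t in TRIGGERS if t.matches(u)]
    return {
        "escalate": bool(fired),
        "conditions": [t.condition for t in fired],
        "priority": next((t.priority for t in fired if t.priority), None),
        "notify": sorted({team for t in fired for team in t.notify}),
    }


print(evaluate(UnknownState(score=0.62, kev=True, epss_delta=0.05)))
```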

4.2 Manual Escalation

# Escalate via CLI
stella unknowns escalate unk-12345678-...

# Escalate with reason
stella unknowns escalate unk-12345678-... \
  --reason "Customer reported potential exploit"

# Escalate to trigger rescan
stella unknowns escalate unk-12345678-... --rescan

# Output:
# Escalated: unk-12345678-...
# Rescan job: rescan-job-001
# Status: queued
# ETA: 5 minutes

4.3 Bulk Escalation

# Escalate all KEV items
stella unknowns escalate --filter "kev=true" --reason "KEV bulk escalation"

# Escalate high-score items
stella unknowns escalate --filter "score>=0.8" --rescan

# Escalate by artifact
stella unknowns escalate --artifact sha256:abc123... --reason "Production incident"

4.4 Escalation SLA Tracking

# Check SLA status
stella unknowns sla-status

# Output:
# HOT unknowns SLA (24h):
#   In SLA: 10 (83%)
#   Breached: 2 (17%)
#   
# Breached items:
#   unk-111... (26h old) - missing_vex
#   unk-222... (30h old) - conflicting_evidence

# Get SLA breach notifications
stella unknowns list --sla-breached
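The breach check itself is simple date arithmetic against the per-band SLA. The sketch below assumes the SLA clock starts when the unknown is created. The runbook does not say whether triage or a band change resets it, so treat `sla_breached` as an illustration only.

```python
from datetime import datetime, timedelta, timezone

# SLA windows from the band assignment table.
SLA = {
    "HOT": timedelta(hours=24),
    "WARM": timedelta(days=7),
    "COLD": timedelta(days=30),
}


def sla_breached(band: str, created_at: datetime, now: datetime) -> bool:
    """True when the unknown has been open longer than its band's SLA."""
    return now - created_at > SLA[band]


now = datetime(2025, 12, 20, 12, 0, tzinfo=timezone.utc)
# A HOT item that is 26 hours old, like unk-111... in the output above.
print(sla_breached("HOT", now - timedelta(hours=26), now))
```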

5. Resolution Procedures

5.1 Resolution Types

Resolution Description Evidence Required
not_affected Vulnerability doesn't apply VEX statement or manual analysis
fixed Vulnerability patched Version upgrade confirmation
mitigated Controls in place Mitigation documentation
false_positive Incorrect classification Analysis report
wont_fix Accepted risk Risk acceptance form

5.2 Resolve Unknown

# Resolve as not affected
stella unknowns resolve unk-12345678-... \
  --resolution not_affected \
  --justification "vulnerable_code_not_present" \
  --notes "Manual code review confirmed function not used"

# Resolve as fixed
stella unknowns resolve unk-12345678-... \
  --resolution fixed \
  --justification "version_upgraded" \
  --evidence "Upgraded lodash to 4.17.21, CVE patched"

# Resolve as mitigated
stella unknowns resolve unk-12345678-... \
  --resolution mitigated \
  --justification "inline_mitigations_exist" \
  --evidence "WAF rule WAF-001 blocks exploit pattern"

# Resolve as won't fix (risk accepted)
stella unknowns resolve unk-12345678-... \
  --resolution wont_fix \
  --justification "risk_accepted" \
  --evidence "Risk acceptance ticket RISK-123" \
  --expires 90d  # Re-evaluate in 90 days

5.3 Bulk Resolution

# Resolve all items for a fixed package version
stella unknowns resolve-batch \
  --filter "purl=pkg:npm/lodash@4.17.20" \
  --resolution fixed \
  --justification "Upgraded to 4.17.21 fleet-wide" \
  --evidence "Fleet upgrade ticket FLEET-456"

# Resolve false positives from analysis
stella unknowns resolve-batch \
  --file false-positives.json \
  --resolution false_positive

5.4 Resolution Audit Trail

# View resolution history
stella unknowns history unk-12345678-...

# Output:
# 2025-12-15 10:00:00 - Created (score: 0.62)
# 2025-12-16 09:30:00 - Triaged by analyst@example.com
# 2025-12-17 14:00:00 - Escalated (KEV added)
# 2025-12-18 11:00:00 - Resolved by security@example.com
#   Resolution: not_affected
#   Justification: vulnerable_code_not_present
#   Notes: Manual code review confirmed function not used

# Export audit trail
stella unknowns audit-export --from 2025-01-01 --to 2025-12-31 --output audit.json

6. Troubleshooting

6.1 Score Seems Wrong

Symptom: Unknown scored too high or too low.

Diagnosis:

# View score breakdown
stella unknowns show unk-... --score-details

# View proof tree
stella unknowns proof unk-... --verbose

Common causes:

  1. Stale EPSS data: EPSS feed not updated
  2. Incorrect blast radius: Dependency data outdated
  3. Missing containment data: Seccomp/filesystem status unknown

Resolution:

# Trigger score recalculation
stella unknowns recalculate unk-...

# Force refresh of all input signals
stella unknowns refresh unk-... --force

6.2 Duplicate Unknowns

Symptom: Same issue appears multiple times.

Diagnosis:

# Find potential duplicates
stella unknowns duplicates --scan

# Output shows items with same CVE+PURL but different artifacts
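The same grouping can be reproduced offline against exported queue data. This is a sketch only: the field names `cve`, `purl`, `artifact` and `id` are assumptions about the export schema, which this runbook does not define.

```python
from collections import defaultdict


def find_duplicates(unknowns):
    """Group unknowns by (CVE, PURL); keep groups spanning several artifacts."""
    groups = defaultdict(list)
    for item in unknowns:
        groups[(item["cve"], item["purl"])].append(item)
    return {
        key: [item["id"] for item in items]
        for key, items in groups.items()
        if len({item["artifact"] for item in items}) > 1
    }


# Hypothetical records for illustration.
sample = [
    {"id": "unk-111", "cve": "CVE-2021-23337",
     "purl": "pkg:npm/lodash@4.17.20", "artifact": "sha256:aaa"},
    {"id": "unk-222", "cve": "CVE-2021-23337",
     "purl": "pkg:npm/lodash@4.17.20", "artifact": "sha256:bbb"},
    {"id": "unk-333", "cve": "CVE-2021-23337",
     "purl": "pkg:npm/lodash@4.17.21", "artifact": "sha256:ccc"},
]
print(find_duplicates(sample))
```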

Resolution:

# Merge duplicates
stella unknowns merge \
  --primary unk-111... \
  --secondary unk-222... \
  --reason "Same CVE across artifact versions"

6.3 Escalation Not Working

Symptom: Escalation doesn't trigger rescan.

Diagnosis:

# Check escalation status
stella unknowns escalation-status unk-...

# Check Scheduler connectivity
stella health check --service scheduler

# Check job queue
stella scheduler queue status rescan

Resolution:

# Retry escalation
stella unknowns escalate unk-... --force

# Manual rescan trigger
stella scan trigger --artifact sha256:abc123... --priority high

6.4 Resolution Rejected

Symptom: Resolution attempt fails validation.

Diagnosis:

# Check resolution requirements
stella unknowns resolution-requirements unk-...

# Output:
# Resolution requirements for unk-12345678-...
# - Justification: required
# - Evidence: required (reason: KEV item)
# - Approver: required (band: HOT)

Resolution:

# Provide required evidence
stella unknowns resolve unk-... \
  --resolution not_affected \
  --justification "vulnerable_code_not_present" \
  --evidence "Code review: CRV-123" \
  --approver security-lead@example.com
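A pre-flight check can catch missing fields before the CLI rejects the resolution. The rules below are generalised from the sample output in this section (justification always, evidence for KEV items, approver for the HOT band). The real validation rules may differ, and `missing_resolution_fields` is a hypothetical helper.

```python
def missing_resolution_fields(kev: bool, band: str, provided: dict) -> list:
    """Return the required fields that are absent or empty."""
    required = {"justification"}
    if kev:
        required.add("evidence")   # assumed: KEV items need evidence
    if band == "HOT":
        required.add("approver")   # assumed: HOT items need an approver
    supplied = {name for name, value in provided.items() if value}
    return sorted(required - supplied)


print(missing_resolution_fields(
    kev=True,
    band="HOT",
    provided={"justification": "vulnerable_code_not_present"},
))
```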

7. Monitoring & Alerting

7.1 Key Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| unknowns_total | Total unknowns in queue | > 500 |
| unknowns_hot_count | HOT band count | > 20 |
| unknowns_sla_breached | SLA breaches | > 0 |
| unknowns_resolution_rate | Daily resolutions | < 5 |
| unknowns_escalation_failures | Failed escalations | > 0 |
| unknowns_avg_age_hours | Average unknown age | > 168 (1 week) |
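For an ad hoc check outside Prometheus, the thresholds in the table can be applied to a metrics snapshot. This is a minimal sketch: `firing_alerts` and the snapshot dictionary are hypothetical, and only the metric names and thresholds come from the table.

```python
import operator

# (comparison, threshold) per metric, copied from the table.
THRESHOLDS = {
    "unknowns_total": (operator.gt, 500),
    "unknowns_hot_count": (operator.gt, 20),
    "unknowns_sla_breached": (operator.gt, 0),
    "unknowns_resolution_rate": (operator.lt, 5),
    "unknowns_escalation_failures": (operator.gt, 0),
    "unknowns_avg_age_hours": (operator.gt, 168),
}


def firing_alerts(metrics: dict) -> list:
    """Return the names of metrics that cross their alert threshold."""
    firing = []
    for name, value in metrics.items():
        if name not in THRESHOLDS:
            continue  # ignore metrics the table does not cover
        compare, limit = THRESHOLDS[name]
        if compare(value, limit):
            firing.append(name)
    return sorted(firing)


# Hypothetical snapshot values.
snapshot = {
    "unknowns_total": 142,
    "unknowns_hot_count": 12,
    "unknowns_sla_breached": 2,
    "unknowns_resolution_rate": 3,
}
print(firing_alerts(snapshot))
```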

7.2 Grafana Dashboard

Dashboard: Unknowns Queue Health
Panels:
- Queue size by band (HOT/WARM/COLD)
- SLA compliance rate
- Unknowns by reason code
- Resolution velocity
- Escalation success rate
- Queue age distribution
- KEV item tracking

7.3 Alerting Rules

groups:
  - name: unknowns-queue
    rules:
      - alert: UnknownsHotBandHigh
        expr: unknowns_hot_count > 20
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HOT unknowns queue is high ({{ $value }} items)"
          
      - alert: UnknownsSLABreach
        expr: unknowns_sla_breached > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $value }} unknowns have breached SLA"
          
      - alert: UnknownsQueueGrowing
        expr: delta(unknowns_total[1h]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unknowns queue is growing rapidly"
          
      - alert: UnknownsKEVPending
        expr: unknowns_kev_count > 0 and unknowns_kev_unresolved_age_hours > 24
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "KEV unknown pending for over 24 hours"

7.4 Daily Report

# Generate daily report
stella unknowns report --format email --send-to security-team@example.com

# Report includes:
# - Queue summary (total, by band, by reason)
# - SLA status (in compliance, breaches)
# - Top 10 highest-scored items
# - Newly added items (last 24h)
# - Resolved items (last 24h)
# - KEV item status
# - Trends (7-day, 30-day)

