
Unknowns Queue Management Runbook

Version: 1.0.0
Sprint: 3500.0004.0004
Last Updated: 2025-12-20

This runbook covers operational procedures for managing the Unknowns queue, including triage, escalation, resolution, and queue health maintenance.


Table of Contents

  1. Overview
  2. Queue Operations
  3. Triage Procedures
  4. Escalation Workflows
  5. Resolution Procedures
  6. Troubleshooting
  7. Monitoring & Alerting
  8. Unknown Budgets
  9. Grey Queue Operations

1. Overview

What are Unknowns?

Unknowns are items that could not be fully classified during scanning due to:

  • Missing VEX statements
  • Ambiguous indirect calls in call graphs
  • Incomplete SBOM data
  • Missing advisory information
  • Conflicting evidence from multiple sources

Unknown Ranking

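Unknowns are ranked using a multi-factor scoring model: three weighted factors (blast radius, evidence scarcity, exploit pressure) plus a containment deduction: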

score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction
Factor            | Weight | Description
Blast Radius      | 0.60   | Impact scope (dependents, network exposure)
Evidence Scarcity | 0.30   | How much data is missing
Exploit Pressure  | 0.30   | EPSS score, KEV status
Containment       | -0.20  | Mitigation factors (seccomp, read-only FS)
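The scoring formula above can be evaluated directly. This is a minimal sketch, not the scanner's implementation: it assumes each factor is already normalized to the range 0.0 to 1.0 and that the containment deduction is passed as a non-positive number (for example -0.20 when both seccomp and a read-only filesystem apply, as in section 2.3). The function and constant names are hypothetical, and no clamping is applied because the runbook does not describe one.

```python
# Hypothetical sketch of the documented ranking formula, not the scanner's own code.
BLAST_WEIGHT = 0.60
SCARCITY_WEIGHT = 0.30
PRESSURE_WEIGHT = 0.30


def unknown_score(blast: float, scarcity: float, pressure: float,
                  containment_deduction: float = 0.0) -> float:
    """Weighted sum of normalized factors (0.0 to 1.0) plus a non-positive containment deduction."""
    if containment_deduction > 0:
        raise ValueError("containment_deduction must be zero or negative")
    return (BLAST_WEIGHT * blast
            + SCARCITY_WEIGHT * scarcity
            + PRESSURE_WEIGHT * pressure
            + containment_deduction)


# Illustrative inputs; scarcity 0.7 contributes 0.30 * 0.7 = 0.21, as in section 2.3.
print(round(unknown_score(blast=0.5, scarcity=0.7, pressure=0.5,
                          containment_deduction=-0.20), 2))  # prints 0.46
```

As a cross-check, the breakdown shown in section 2.3 (+0.35, +0.21, +0.26, -0.20) sums to the displayed score of 0.62.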

Band Assignment

Band | Score Range    | Priority | SLA
HOT  | ≥ 0.70         | Critical | 24 hours
WARM | 0.40 to < 0.70 | Normal   | 7 days
COLD | < 0.40         | Low      | 30 days
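Band assignment is a pair of threshold checks. In this sketch (helper names are hypothetical; thresholds and SLA windows come from the table above), HOT starts at exactly 0.70, so WARM covers every score from 0.40 up to, but not including, 0.70.

```python
# Hypothetical helper mirroring the band table; thresholds and SLAs are taken from the runbook.
from datetime import timedelta

HOT_THRESHOLD = 0.70
WARM_THRESHOLD = 0.40

SLA_BY_BAND = {
    "HOT": timedelta(hours=24),
    "WARM": timedelta(days=7),
    "COLD": timedelta(days=30),
}


def assign_band(score: float) -> str:
    """HOT for score >= 0.70, WARM for 0.40 <= score < 0.70, COLD below 0.40."""
    if score >= HOT_THRESHOLD:
        return "HOT"
    if score >= WARM_THRESHOLD:
        return "WARM"
    return "COLD"


for s in (0.85, 0.62, 0.39):
    band = assign_band(s)
    print(f"{s:.2f} -> {band} (SLA {SLA_BY_BAND[band]})")
```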

2. Queue Operations

2.1 View Queue Status

# Get queue summary
stella unknowns summary

# Output:
# Total: 142 unknowns
# HOT:  12 (8%)  - Requires immediate attention
# WARM: 85 (60%) - Normal priority
# COLD: 45 (32%) - Low priority
# 
# KEV items: 3
# Average score: 0.52

# Get queue summary via API
curl "https://scanner.example.com/api/v1/unknowns/summary" \
  -H "Authorization: Bearer $TOKEN"

2.2 List Unknowns

# List all HOT unknowns
stella unknowns list --band HOT

# List by score (highest first)
stella unknowns list --sort score --order desc --limit 20

# Filter by reason
stella unknowns list --reason missing_vex

# Filter by artifact
stella unknowns list --artifact sha256:abc123...

# Filter by KEV status
stella unknowns list --kev true

2.3 View Unknown Details

# Get detailed view
stella unknowns show unk-12345678-abcd-1234-5678-abcdef123456

# Output:
# ID: unk-12345678-...
# Artifact: pkg:oci/myapp@sha256:abc123
# Reasons: [missing_vex, ambiguous_indirect_call]
# 
# Blast Radius:
#   Dependents: 15 services
#   Network: internet-facing
#   Privilege: user
# 
# Evidence Scarcity: 0.7 (high)
# 
# Exploit Pressure:
#   EPSS: 0.45
#   KEV: false
# 
# Containment:
#   Seccomp: enforced (-0.10)
#   Filesystem: read-only (-0.10)
# 
# Score: 0.62 (WARM band)
# Score Breakdown:
#   Blast component: +0.35
#   Scarcity component: +0.21
#   Pressure component: +0.26
#   Containment deduction: -0.20

# Show proof tree
stella unknowns proof unk-12345678-...

2.4 Export Queue Data

# Export for analysis
stella unknowns export --format json --output unknowns.json

# Export HOT items for daily review
stella unknowns export --band HOT --format csv --output hot-unknowns.csv

# Export with full details
stella unknowns export --verbose --include-proofs --output full-export.json
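For offline analysis, an export can be summarized the same way stella unknowns summary reports bands. This is a sketch under a loudly stated assumption: the export schema is not documented here, so the code assumes a JSON array of objects that each carry a "band" field. The function names are hypothetical; verify the field name against a real export first.

```python
# Sketch: summarize a JSON export by band, similar to `stella unknowns summary`.
# ASSUMPTION (hypothetical schema): the export is a JSON array of objects that each
# carry a "band" field with the value HOT, WARM or COLD. Verify against a real export.
import json
from collections import Counter


def summarize_by_band(records: list[dict]) -> dict[str, tuple[int, int]]:
    """Return {band: (count, whole-number percentage of all records)}."""
    counts = Counter(record["band"] for record in records)
    total = len(records)
    return {
        band: (counts[band], round(100 * counts[band] / total) if total else 0)
        for band in ("HOT", "WARM", "COLD")
    }


def summarize_export(path: str) -> dict[str, tuple[int, int]]:
    """Load a file written by `stella unknowns export --format json` and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_by_band(json.load(fh))
```

For example, summarize_export("unknowns.json") on a queue shaped like the one in section 2.1 would give HOT (12, 8), WARM (85, 60) and COLD (45, 32).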

3. Triage Procedures

3.1 Daily Triage Workflow

Schedule: Daily at 9:00 AM

Duration: 30 minutes

Participants: Security analyst, on-call engineer

Process:

# 1. Get today's queue snapshot
stella unknowns snapshot --output daily-$(date +%Y%m%d).json

# 2. Review all HOT items
stella unknowns list --band HOT --since 24h

# 3. For each HOT unknown, determine action:
#    - Escalate: Trigger immediate rescan
#    - Investigate: Needs manual analysis
#    - Defer: Move to WARM (with justification)
#    - Resolve: Evidence found, can close

# 4. Process each item
stella unknowns triage unk-12345678-... --action escalate
stella unknowns triage unk-87654321-... --action investigate --notes "Need VEX from vendor"
stella unknowns triage unk-11111111-... --action defer --reason "False positive suspected"

3.2 Triage Decision Matrix

Reason Code             | KEV | EPSS > 0.5 | Action
missing_vex             | Yes | Any        | Escalate + Vendor outreach
missing_vex             | No  | Yes        | Escalate
missing_vex             | No  | No         | Request VEX
ambiguous_indirect_call | Any | Any        | Manual code review
incomplete_sbom         | Any | Any        | Rescan with updated extractor
conflicting_evidence    | Any | Any        | Manual analysis
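The decision matrix can be read as a small function. The sketch below is a hypothetical encoding of the table only, not a stella command; note that the EPSS column uses a strict "> 0.5", so an EPSS of exactly 0.5 falls into the "No" row.

```python
# Hypothetical encoding of the triage decision matrix; reason codes and actions come from the table.
def triage_action(reason: str, kev: bool, epss: float) -> str:
    if reason == "missing_vex":
        if kev:
            return "Escalate + Vendor outreach"
        if epss > 0.5:  # strict comparison, as in the "EPSS > 0.5" column
            return "Escalate"
        return "Request VEX"
    # For the remaining reason codes, KEV and EPSS do not change the action ("Any").
    fixed_actions = {
        "ambiguous_indirect_call": "Manual code review",
        "incomplete_sbom": "Rescan with updated extractor",
        "conflicting_evidence": "Manual analysis",
    }
    try:
        return fixed_actions[reason]
    except KeyError:
        raise ValueError(f"reason code not covered by the matrix: {reason}") from None
```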

3.3 Triage Templates

# Quick escalate (HOT + KEV)
stella unknowns triage unk-... --action escalate \
  --priority P1 \
  --notes "KEV item, requires immediate attention"

# Request vendor VEX
stella unknowns triage unk-... --action investigate \
  --notes "Requested VEX from vendor via security@vendor.com" \
  --due-date 7d

# Mark for code review
stella unknowns triage unk-... --action investigate \
  --notes "Requires manual code review to resolve indirect call" \
  --assign @code-review-team

# Defer with justification
stella unknowns triage unk-... --action defer \
  --reason "Component not deployed to production" \
  --evidence "deployment-manifest.yaml shows staging-only"

4. Escalation Workflows

4.1 Automatic Escalation

Unknowns are automatically escalated when:

  • Score rises to or above the HOT threshold (0.70)
  • KEV status is added to a related CVE
  • EPSS score increases by more than 0.2
  • Blast radius increases (new dependents detected)

Configure auto-escalation:

# policy.unknowns.escalation.yaml
autoEscalation:
  enabled: true
  triggers:
    - condition: score >= 0.70
      action: escalate
      notify: [security-team]
    - condition: kev == true
      action: escalate
      priority: P1
      notify: [security-team, management]
    - condition: epss_delta > 0.2
      action: escalate
      notify: [security-team]
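To reason about which triggers in the policy above would fire for a given change, the conditions can be evaluated against a previous and a current snapshot. This is a sketch only: the triggers are restated in code instead of being parsed from the YAML, the Snapshot and Trigger types are hypothetical, epss_delta is assumed to mean the current EPSS minus the previous EPSS, and the blast-radius trigger is omitted because the sample configuration has no entry for it. The conditions are evaluated literally, so "kev == true" holds on every evaluation while KEV is set.

```python
# Sketch: evaluate the three auto-escalation triggers from policy.unknowns.escalation.yaml.
# The triggers are hard-coded rather than parsed from YAML, to keep the example dependency-free.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Snapshot:
    score: float
    kev: bool
    epss: float


@dataclass
class Trigger:
    condition: str
    predicate: Callable[[Snapshot, Snapshot], bool]
    notify: list[str]
    priority: Optional[str] = None


TRIGGERS = [
    Trigger("score >= 0.70", lambda prev, cur: cur.score >= 0.70, ["security-team"]),
    Trigger("kev == true", lambda prev, cur: cur.kev, ["security-team", "management"], "P1"),
    # ASSUMPTION: epss_delta is the increase from the previous snapshot.
    Trigger("epss_delta > 0.2", lambda prev, cur: cur.epss - prev.epss > 0.2, ["security-team"]),
]


def fired_triggers(prev: Snapshot, cur: Snapshot) -> list[Trigger]:
    """Return every trigger whose condition holds for the previous/current pair."""
    return [t for t in TRIGGERS if t.predicate(prev, cur)]


def notify_targets(prev: Snapshot, cur: Snapshot) -> list[str]:
    """Union of the notify lists of all fired triggers, sorted for stable output."""
    return sorted({team for t in fired_triggers(prev, cur) for team in t.notify})
```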

4.2 Manual Escalation

# Escalate via CLI
stella unknowns escalate unk-12345678-...

# Escalate with reason
stella unknowns escalate unk-12345678-... \
  --reason "Customer reported potential exploit"

# Escalate to trigger rescan
stella unknowns escalate unk-12345678-... --rescan

# Output:
# Escalated: unk-12345678-...
# Rescan job: rescan-job-001
# Status: queued
# ETA: 5 minutes

4.3 Bulk Escalation

# Escalate all KEV items
stella unknowns escalate --filter "kev=true" --reason "KEV bulk escalation"

# Escalate high-score items
stella unknowns escalate --filter "score>=0.8" --rescan

# Escalate by artifact
stella unknowns escalate --artifact sha256:abc123... --reason "Production incident"

4.4 Escalation SLA Tracking

# Check SLA status
stella unknowns sla-status

# Output:
# HOT unknowns SLA (24h):
#   In SLA: 10 (83%)
#   Breached: 2 (17%)
#   
# Breached items:
#   unk-111... (26h old) - missing_vex
#   unk-222... (30h old) - conflicting_evidence

# List items that have breached their SLA
stella unknowns list --sla-breached
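The SLA figures above follow from comparing each item's age with its band's window. This sketch assumes an item breaches once its age exceeds the window (24 hours for HOT, 7 days for WARM, 30 days for COLD); the function name and the (id, band, age_hours) input shape are hypothetical.

```python
# Sketch: compute SLA compliance similar to the `stella unknowns sla-status` output.
# ASSUMPTION: an item breaches once its age exceeds its band's SLA window (24h / 7d / 30d).
SLA_HOURS = {"HOT": 24, "WARM": 7 * 24, "COLD": 30 * 24}


def sla_status(items: list[tuple[str, str, float]]) -> dict:
    """items are (id, band, age_hours) tuples; returns counts, percentages and breached ids."""
    breached = [item_id for item_id, band, age_hours in items if age_hours > SLA_HOURS[band]]
    total = len(items)

    def pct(n: int) -> int:
        return round(100 * n / total) if total else 0

    in_sla = total - len(breached)
    return {
        "in_sla": (in_sla, pct(in_sla)),
        "breached": (len(breached), pct(len(breached))),
        "breached_ids": breached,
    }
```

With ten HOT items inside the window plus two HOT items aged 26 and 30 hours, this yields 10 (83%) in SLA and 2 (17%) breached, matching the sample output.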

5. Resolution Procedures

5.1 Resolution Types

Resolution     | Description                 | Evidence Required
not_affected   | Vulnerability doesn't apply | VEX statement or manual analysis
fixed          | Vulnerability patched       | Version upgrade confirmation
mitigated      | Controls in place           | Mitigation documentation
false_positive | Incorrect classification    | Analysis report
wont_fix       | Accepted risk               | Risk acceptance form

5.2 Resolve Unknown

# Resolve as not affected
stella unknowns resolve unk-12345678-... \
  --resolution not_affected \
  --justification "vulnerable_code_not_present" \
  --notes "Manual code review confirmed function not used"

# Resolve as fixed
stella unknowns resolve unk-12345678-... \
  --resolution fixed \
  --justification "version_upgraded" \
  --evidence "Upgraded lodash to 4.17.21, CVE patched"

# Resolve as mitigated
stella unknowns resolve unk-12345678-... \
  --resolution mitigated \
  --justification "inline_mitigations_exist" \
  --evidence "WAF rule WAF-001 blocks exploit pattern"

# Resolve as won't fix (risk accepted)
stella unknowns resolve unk-12345678-... \
  --resolution wont_fix \
  --justification "risk_accepted" \
  --evidence "Risk acceptance ticket RISK-123" \
  --expires 90d  # Re-evaluate in 90 days

5.3 Bulk Resolution

# Resolve all items for a fixed package version
stella unknowns resolve-batch \
  --filter "purl=pkg:npm/lodash@4.17.20" \
  --resolution fixed \
  --justification "version_upgraded" \
  --evidence "Upgraded to 4.17.21 fleet-wide; fleet upgrade ticket FLEET-456"

# Resolve false positives from analysis
stella unknowns resolve-batch \
  --file false-positives.json \
  --resolution false_positive

5.4 Resolution Audit Trail

# View resolution history
stella unknowns history unk-12345678-...

# Output:
# 2025-12-15 10:00:00 - Created (score: 0.62)
# 2025-12-16 09:30:00 - Triaged by analyst@example.com
# 2025-12-17 14:00:00 - Escalated (KEV added)
# 2025-12-18 11:00:00 - Resolved by security@example.com
#   Resolution: not_affected
#   Justification: vulnerable_code_not_present
#   Notes: Manual code review confirmed function not used

# Export audit trail
stella unknowns audit-export --from 2025-01-01 --to 2025-12-31 --output audit.json
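The history output can also be used to measure how long an item stayed open. This sketch assumes event lines have the "YYYY-MM-DD HH:MM:SS - <event>" shape shown in the sample and that all timestamps share one (unstated) time zone; the function name is hypothetical. For the sample history above, Created at 2025-12-15 10:00:00 and Resolved at 2025-12-18 11:00:00 are 73 hours apart.

```python
# Sketch: hours from "Created" to "Resolved" in `stella unknowns history` output.
# ASSUMPTION: event lines look like "YYYY-MM-DD HH:MM:SS - <event>" as in the sample,
# and all timestamps share one (unstated) time zone.
from datetime import datetime
from typing import Optional


def hours_to_resolve(history: str) -> Optional[float]:
    """Return hours between the Created and Resolved events, or None if either is missing."""
    created = resolved = None
    for line in history.splitlines():
        stamp, sep, event = line.strip().partition(" - ")
        if not sep:
            continue  # detail lines such as "Resolution: not_affected"
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # a " - " inside free text, not an event line
        if event.startswith("Created"):
            created = when
        elif event.startswith("Resolved"):
            resolved = when
    if created is None or resolved is None:
        return None
    return (resolved - created).total_seconds() / 3600
```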

6. Troubleshooting

6.1 Score Seems Wrong

Symptom: An unknown is scored higher or lower than expected.

Diagnosis:

# View score breakdown
stella unknowns show unk-... --score-details

# View proof tree
stella unknowns proof unk-... --verbose

Common causes:

  1. Stale EPSS data: EPSS feed not updated
  2. Incorrect blast radius: Dependency data outdated
  3. Missing containment data: Seccomp/filesystem status unknown

Resolution:

# Trigger score recalculation
stella unknowns recalculate unk-...

# Force refresh of all input signals
stella unknowns refresh unk-... --force

6.2 Duplicate Unknowns

Symptom: Same issue appears multiple times.

Diagnosis:

# Find potential duplicates
stella unknowns duplicates --scan

# Output shows items with same CVE+PURL but different artifacts
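The duplicate check described above groups unknowns by CVE and PURL and flags groups that span more than one artifact. This is a sketch over a hypothetical record shape (id, cve, purl, artifact fields); it is not the logic of stella unknowns duplicates itself. The first id in each group is a natural candidate for --primary when merging.

```python
# Sketch: group unknowns that share a CVE and PURL but come from different artifacts.
# ASSUMPTION: records carry "id", "cve", "purl" and "artifact" fields (hypothetical schema).
from collections import defaultdict


def find_duplicates(records: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Return {(cve, purl): [ids]} for groups that span more than one distinct artifact."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for rec in records:
        groups[(rec["cve"], rec["purl"])].append(rec)
    return {
        key: [r["id"] for r in recs]
        for key, recs in groups.items()
        if len({r["artifact"] for r in recs}) > 1
    }
```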

Resolution:

# Merge duplicates
stella unknowns merge \
  --primary unk-111... \
  --secondary unk-222... \
  --reason "Same CVE across artifact versions"

6.3 Escalation Not Working

Symptom: Escalation does not trigger a rescan.

Diagnosis:

# Check escalation status
stella unknowns escalation-status unk-...

# Check Scheduler connectivity
stella health check --service scheduler

# Check job queue
stella scheduler queue status rescan

Resolution:

# Retry escalation
stella unknowns escalate unk-... --force

# Manual rescan trigger
stella scan trigger --artifact sha256:abc123... --priority high

6.4 Resolution Rejected

Symptom: Resolution attempt fails validation.

Diagnosis:

# Check resolution requirements
stella unknowns resolution-requirements unk-...

# Output:
# Resolution requirements for unk-12345678-...
# - Justification: required
# - Evidence: required (reason: KEV item)
# - Approver: required (band: HOT)

Resolution:

# Provide required evidence
stella unknowns resolve unk-... \
  --resolution not_affected \
  --justification "vulnerable_code_not_present" \
  --evidence "Code review: CRV-123" \
  --approver security-lead@example.com
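Before running resolve, it can help to check which required fields are missing. The rules in this sketch are inferred only from the sample resolution-requirements output above (justification always required, evidence for KEV items, an approver for HOT-band items); the real policy may impose more, so treat stella unknowns resolution-requirements as authoritative. Function names are hypothetical.

```python
# Sketch inferred only from the sample `resolution-requirements` output:
# justification is always required, evidence is required for KEV items,
# and an approver is required for HOT-band items. The real policy may add more rules.
def resolution_requirements(kev: bool, band: str) -> dict[str, str]:
    reqs = {"justification": "required"}
    if kev:
        reqs["evidence"] = "required (reason: KEV item)"
    if band == "HOT":
        reqs["approver"] = "required (band: HOT)"
    return reqs


def missing_fields(kev: bool, band: str, provided: dict[str, str]) -> list[str]:
    """Return the requirement names that the planned resolve command does not supply."""
    return [name for name in resolution_requirements(kev, band) if not provided.get(name)]
```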

7. Monitoring & Alerting

7.1 Key Metrics

Metric                       | Description             | Alert Threshold
unknowns_total               | Total unknowns in queue | > 500
unknowns_hot_count           | HOT band count          | > 20
unknowns_sla_breached        | SLA breaches            | > 0
unknowns_resolution_rate     | Daily resolutions       | < 5
unknowns_escalation_failures | Failed escalations      | > 0
unknowns_avg_age_hours       | Average unknown age     | > 168 (1 week)
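The thresholds above mix directions: every metric alerts when it rises above its limit except unknowns_resolution_rate, which alerts when it falls below 5. This sketch evaluates a metrics snapshot against the table; the snapshot dictionary and function name are hypothetical, and metrics missing from the snapshot are skipped.

```python
# Sketch: compare a metrics snapshot with the alert thresholds from the table above.
# Note the direction: unknowns_resolution_rate alerts when it falls *below* 5.
import operator

ALERT_THRESHOLDS = {
    "unknowns_total": (operator.gt, 500),
    "unknowns_hot_count": (operator.gt, 20),
    "unknowns_sla_breached": (operator.gt, 0),
    "unknowns_resolution_rate": (operator.lt, 5),
    "unknowns_escalation_failures": (operator.gt, 0),
    "unknowns_avg_age_hours": (operator.gt, 168),
}


def breached_thresholds(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that cross their alert threshold (absent metrics are skipped)."""
    return [name for name, (cmp, limit) in ALERT_THRESHOLDS.items()
            if name in metrics and cmp(metrics[name], limit)]
```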

7.2 Grafana Dashboard

Dashboard: Unknowns Queue Health
Panels:
- Queue size by band (HOT/WARM/COLD)
- SLA compliance rate
- Unknowns by reason code
- Resolution velocity
- Escalation success rate
- Queue age distribution
- KEV item tracking

7.3 Alerting Rules

groups:
  - name: unknowns-queue
    rules:
      - alert: UnknownsHotBandHigh
        expr: unknowns_hot_count > 20
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HOT unknowns queue is high ({{ $value }} items)"
          
      - alert: UnknownsSLABreach
        expr: unknowns_sla_breached > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $value }} unknowns have breached SLA"
          
      - alert: UnknownsQueueGrowing
        expr: delta(unknowns_total[1h]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unknowns queue grew by more than 10 items in the past hour"
          
      - alert: UnknownsKEVPending
        expr: unknowns_kev_count > 0 and unknowns_kev_unresolved_age_hours > 24
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "KEV unknown pending for over 24 hours"

7.4 Daily Report

# Generate daily report
stella unknowns report --format email --send-to security-team@example.com

# Report includes:
# - Queue summary (total, by band, by reason)
# - SLA status (in compliance, breaches)
# - Top 10 highest-scored items
# - Newly added items (last 24h)
# - Resolved items (last 24h)
# - KEV item status
# - Trends (7-day, 30-day)

8. Unknown Budgets

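Unknown budgets cap the number of unknowns per environment, both in total (totalLimit) and per reason code (reasonLimits). When a budget is exceeded, its action determines the outcome: Warn, WarnUnlessException, or Block.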

Configuration:

# etc/policy.unknowns.budgets.yaml
unknownBudgets:
  enforceBudgets: true
  budgets:
    prod:
      environment: prod
      totalLimit: 3
      reasonLimits:
        Reachability: 0
        Provenance: 0
        VexConflict: 1
      action: Block
      exceededMessage: "Production requires zero reachability unknowns"

    stage:
      environment: stage
      totalLimit: 10
      reasonLimits:
        Reachability: 1
      action: WarnUnlessException

    dev:
      environment: dev
      totalLimit: null
      action: Warn

    default:
      environment: default
      totalLimit: 5
      action: Warn
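To see how a budget would be evaluated, the configuration can be restated as data and checked against per-reason counts. This is a sketch under stated assumptions, not the policy engine: totalLimit null is read as "no cap", environments without a budget fall back to the default entry, reason codes absent from reasonLimits are bound only by totalLimit, and enforceBudgets is ignored. The function name is hypothetical.

```python
# Sketch: check unknown counts against the budgets in etc/policy.unknowns.budgets.yaml.
# The YAML is restated as a dict to stay dependency-free. ASSUMPTIONS: totalLimit None
# means "no cap", unknown environments fall back to "default", enforceBudgets is ignored.
from typing import Optional

BUDGETS = {
    "prod": {"totalLimit": 3,
             "reasonLimits": {"Reachability": 0, "Provenance": 0, "VexConflict": 1},
             "action": "Block"},
    "stage": {"totalLimit": 10, "reasonLimits": {"Reachability": 1},
              "action": "WarnUnlessException"},
    "dev": {"totalLimit": None, "reasonLimits": {}, "action": "Warn"},
    "default": {"totalLimit": 5, "reasonLimits": {}, "action": "Warn"},
}


def check_budget(environment: str,
                 counts_by_reason: dict[str, int]) -> tuple[list[str], Optional[str]]:
    """Return (violations, action); action is None when the budget is not exceeded."""
    budget = BUDGETS.get(environment, BUDGETS["default"])
    violations = []
    total = sum(counts_by_reason.values())
    limit = budget["totalLimit"]
    if limit is not None and total > limit:
        violations.append(f"total {total} > {limit}")
    for reason, cap in budget["reasonLimits"].items():
        count = counts_by_reason.get(reason, 0)
        if count > cap:
            violations.append(f"{reason} {count} > {cap}")
    return violations, (budget["action"] if violations else None)
```

For example, a single Reachability unknown in prod already exceeds its limit of 0, so check_budget("prod", {"Reachability": 1}) reports a violation with the Block action.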

Exception coverage:

To let approved exceptions cover specific unknown reason codes, set the exception metadata field unknown_reason_codes to a comma-separated list of codes, for example: Reachability, U-VEX.
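Reading the unknown_reason_codes metadata amounts to splitting a comma-separated string. This sketch assumes surrounding whitespace is ignored (so "Reachability, U-VEX" and "Reachability,U-VEX" are equivalent) and that matching is case-sensitive; both are assumptions, and the function names are hypothetical.

```python
# Sketch: decide whether an approved exception covers an unknown's reason code.
# ASSUMPTIONS: codes are split on commas, surrounding whitespace is ignored,
# and matching is case-sensitive.
def covered_reason_codes(metadata: dict[str, str]) -> set[str]:
    raw = metadata.get("unknown_reason_codes", "")
    return {code.strip() for code in raw.split(",") if code.strip()}


def exception_covers(metadata: dict[str, str], reason_code: str) -> bool:
    return reason_code in covered_reason_codes(metadata)
```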



9. Grey Queue Operations

Sprint: SPRINT_20260112_010_CLI_unknowns_grey_queue_cli

The Grey Queue handles observations with uncertain status requiring operator attention or additional evidence. These are distinct from standard HOT/WARM/COLD band unknowns.

9.1 Grey Queue Overview

Grey Queue items have:

  • Observation state: PendingDeterminization, Disputed, or GuardedPass
  • Reanalysis fingerprint: Deterministic ID for reproducible replays
  • Triggers: Events that caused reanalysis
  • Conflicts: Detected evidence disagreements
  • Next actions: Suggested resolution paths
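A deterministic fingerprint such as the reanalysis fingerprint above is commonly built by hashing a canonical encoding of its inputs, so the same inputs always replay to the same ID. The sketch below shows that general technique only; the actual fingerprint inputs and encoding are not documented here, and the field names in the example are hypothetical.

```python
# Sketch of a deterministic fingerprint: SHA-256 over a canonical JSON encoding.
# The input fields used below are hypothetical; the real fingerprint inputs are not listed here.
import hashlib
import json


def reanalysis_fingerprint(inputs: dict) -> str:
    """Keys are sorted so dict order does not matter; list order still does, so sort lists first."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


example = {
    "unknown_id": "unk-example",  # hypothetical field and value
    "policy_config_hash": "sha256:example-policy-hash",  # hypothetical field and value
    "triggers": sorted(["vex.updated@1", "epss.updated@1"]),  # sorted for stable ordering
}
print(reanalysis_fingerprint(example))
```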

9.2 List Grey Queue Items

# List all grey queue items
stella unknowns list --state grey

# List by observation state
stella unknowns list --observation-state pending-determinization
stella unknowns list --observation-state disputed
stella unknowns list --observation-state guarded-pass

# List with fingerprint details
stella unknowns list --state grey --show-fingerprint

# List with conflict summary
stella unknowns list --state grey --show-conflicts

9.3 View Grey Queue Details

# Show grey queue item with full details
stella unknowns show unk-12345678-... --grey

# Output:
# ID: unk-12345678-...
# Observation State: Disputed
# 
# Reanalysis Fingerprint:
#   ID: sha256:abc123...
#   Computed At: 2026-01-15T10:00:00Z
#   Policy Config Hash: sha256:def456...
# 
# Triggers (2):
#   - epss.updated@1 (2026-01-15T09:55:00Z) delta=0.15
#   - vex.updated@1 (2026-01-15T09:50:00Z)
# 
# Conflicts (1):
#   - VexStatusConflict: vendor-a reports 'not_affected', vendor-b reports 'affected'
#     Severity: high
#     Adjudication: manual_review
# 
# Next Actions:
#   - trust_resolution: Resolve issuer trust conflict
#   - manual_review: Escalate to security team

# Show fingerprint only
stella unknowns fingerprint unk-12345678-...

# Show triggers only
stella unknowns triggers unk-12345678-...

9.4 Grey Queue Triage Actions

# Resolve a grey queue item (operator determination)
stella unknowns resolve unk-12345678-... \
  --status not_affected \
  --justification "Verified vendor VEX is authoritative" \
  --evidence-ref "vex-observation-id-123"

# Escalate for manual review
stella unknowns escalate unk-12345678-... \
  --priority P1 \
  --reason "Conflicting VEX requires security team decision"

# Defer pending additional evidence
stella unknowns defer unk-12345678-... \
  --await vex \
  --reason "Waiting for upstream vendor VEX statement"

9.5 Grey Queue Conflict Resolution

# List items with conflicts
stella unknowns list --has-conflicts

# Filter by conflict type
stella unknowns list --conflict-type vex-status-conflict
stella unknowns list --conflict-type vex-reachability-contradiction
stella unknowns list --conflict-type trust-tie

# Resolve a conflict manually
stella unknowns resolve-conflict unk-12345678-... \
  --winner vendor-a \
  --reason "vendor-a is the upstream maintainer"
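A VexStatusConflict, as in the sample output where vendor-a reports not_affected and vendor-b reports affected, arises when issuers disagree on status; resolve-conflict --winner then selects one issuer's statement. This sketch models only that idea. The issuer-to-status mapping is a hypothetical shape, not the actual evidence format, and it does not model trust weighting or the other conflict types.

```python
# Sketch: detect a VexStatusConflict (issuers disagree) and apply a manual winner,
# mirroring the intent of `stella unknowns resolve-conflict --winner`.
# The issuer -> status mapping is an illustrative shape, not the real evidence format.
def vex_status_conflict(statements: dict[str, str]) -> bool:
    """statements maps issuer -> VEX status; a conflict exists when issuers disagree."""
    return len(set(statements.values())) > 1


def resolve_with_winner(statements: dict[str, str], winner: str) -> str:
    """Return the winning issuer's status; the winner must have issued a statement."""
    if winner not in statements:
        raise KeyError(f"{winner} issued no statement for this item")
    return statements[winner]
```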

9.6 Grey Queue Summary

# Get grey queue summary
stella unknowns summary --grey

# Output:
# Grey Queue: 23 items
# 
# By State:
#   PendingDeterminization: 15 (65%)
#   Disputed: 5 (22%)
#   GuardedPass: 3 (13%)
# 
# Conflicts: 8 items have conflicts
# Avg. Triggers: 2.3 per item
# Oldest: 7 days

9.7 Grey Queue Export

# Export grey queue for analysis
stella unknowns export --state grey --format json --output grey-queue.json

# Export with full fingerprints and triggers
stella unknowns export --state grey --verbose --output grey-full.json

# Export conflicts only
stella unknowns export --has-conflicts --format csv --output conflicts.csv

Last Updated: 2026-01-16
Version: 1.1.0
Sprint: SPRINT_20260112_010_CLI_unknowns_grey_queue_cli