Unknowns Queue Management Runbook
Version: 1.0.0
Sprint: 3500.0004.0004
Last Updated: 2025-12-20
This runbook covers operational procedures for managing the Unknowns queue, including triage, escalation, resolution, and queue health maintenance.
Table of Contents
- Overview
- Queue Operations
- Triage Procedures
- Escalation Workflows
- Resolution Procedures
- Troubleshooting
- Monitoring & Alerting
- Unknown Budgets
- Grey Queue Operations
1. Overview
What are Unknowns?
Unknowns are items that could not be fully classified during scanning due to:
- Missing VEX statements
- Ambiguous indirect calls in call graphs
- Incomplete SBOM data
- Missing advisory information
- Conflicting evidence from multiple sources
Unknown Ranking
Unknowns are ranked using a weighted scoring model with three weighted factors and a containment deduction:
score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction
| Factor | Weight | Description |
|---|---|---|
| Blast Radius | 0.60 | Impact scope (dependents, network exposure) |
| Evidence Scarcity | 0.30 | How much data is missing |
| Exploit Pressure | 0.30 | EPSS score, KEV status |
| Containment | -0.20 | Mitigation factors (seccomp, read-only FS) |
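As a worked illustration of the formula above, the following minimal Python sketch computes the weighted sum. It assumes each factor is normalized to the range 0 to 1 and that the containment deduction is passed as a non-positive number no larger in magnitude than 0.20, per the table. The function and argument names are hypothetical, and this runbook does not state whether the production scorer clamps the result.

```python
# Illustrative sketch only: names are hypothetical, weights come from the table above.
WEIGHT_BLAST = 0.60
WEIGHT_SCARCITY = 0.30
WEIGHT_PRESSURE = 0.30
MAX_CONTAINMENT_DEDUCTION = -0.20


def unknown_score(blast, scarcity, pressure, containment_deduction=0.0):
    """Return the weighted score for one unknown.

    blast, scarcity, and pressure are assumed to be normalized to [0, 1].
    containment_deduction is assumed to lie in [-0.20, 0].
    """
    for name, value in (("blast", blast), ("scarcity", scarcity), ("pressure", pressure)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if not MAX_CONTAINMENT_DEDUCTION <= containment_deduction <= 0.0:
        raise ValueError("containment_deduction must be in [-0.20, 0]")
    return (
        WEIGHT_BLAST * blast
        + WEIGHT_SCARCITY * scarcity
        + WEIGHT_PRESSURE * pressure
        + containment_deduction
    )


if __name__ == "__main__":
    # Scarcity 0.7 contributes 0.30 * 0.7 = 0.21 to the total.
    score = unknown_score(blast=0.5, scarcity=0.7, pressure=0.4, containment_deduction=-0.20)
    print(round(score, 2))
```

Because the three weights sum to 1.20, an uncontained unknown with every factor at 1.0 scores above 1.0 under this sketch.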
Band Assignment
| Band | Score Range | Priority | SLA |
|---|---|---|---|
| HOT | ≥ 0.70 | Critical | 24 hours |
| WARM | 0.40 - 0.69 | Normal | 7 days |
| COLD | < 0.40 | Low | 30 days |
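The band table can be read as a threshold lookup. The sketch below is a minimal Python illustration, assuming that any score below 0.70 and at or above 0.40 counts as WARM (the table lists the range to two decimals only). The names `BANDS` and `assign_band` are hypothetical.

```python
# Illustrative sketch: thresholds and SLAs are taken from the band table above.
from datetime import timedelta

BANDS = (
    # (name, minimum score, SLA)
    ("HOT", 0.70, timedelta(hours=24)),
    ("WARM", 0.40, timedelta(days=7)),
    ("COLD", float("-inf"), timedelta(days=30)),
)


def assign_band(score):
    """Return (band, sla) for a score, checking the highest band first."""
    for name, minimum, sla in BANDS:
        if score >= minimum:
            return name, sla
    # Only reachable for values that fail every comparison, such as NaN.
    raise ValueError(f"score is not a number: {score!r}")
```

For example, `assign_band(0.62)` returns the WARM band with a 7-day SLA, matching the sample item in section 2.3.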
2. Queue Operations
2.1 View Queue Status
# Get queue summary
stella unknowns summary
# Output:
# Total: 142 unknowns
# HOT: 12 (8%) - Requires immediate attention
# WARM: 85 (60%) - Normal priority
# COLD: 45 (32%) - Low priority
#
# KEV items: 3
# Average score: 0.52
# Get queue summary via API
curl "https://scanner.example.com/api/v1/unknowns/summary" \
-H "Authorization: Bearer $TOKEN"
2.2 List Unknowns
# List all HOT unknowns
stella unknowns list --band HOT
# List by score (highest first)
stella unknowns list --sort score --order desc --limit 20
# Filter by reason
stella unknowns list --reason missing_vex
# Filter by artifact
stella unknowns list --artifact sha256:abc123...
# Filter by KEV status
stella unknowns list --kev true
2.3 View Unknown Details
# Get detailed view
stella unknowns show unk-12345678-abcd-1234-5678-abcdef123456
# Output:
# ID: unk-12345678-...
# Artifact: pkg:oci/myapp@sha256:abc123
# Reasons: [missing_vex, ambiguous_indirect_call]
#
# Blast Radius:
# Dependents: 15 services
# Network: internet-facing
# Privilege: user
#
# Evidence Scarcity: 0.7 (high)
#
# Exploit Pressure:
# EPSS: 0.45
# KEV: false
#
# Containment:
# Seccomp: enforced (-0.10)
# Filesystem: read-only (-0.10)
#
# Score: 0.62 (WARM band)
# Score Breakdown:
# Blast component: +0.35
# Scarcity component: +0.21
# Pressure component: +0.26
# Containment deduction: -0.20
# Show proof tree
stella unknowns proof unk-12345678-...
2.4 Export Queue Data
# Export for analysis
stella unknowns export --format json --output unknowns.json
# Export HOT items for daily review
stella unknowns export --band HOT --format csv --output hot-unknowns.csv
# Export with full details
stella unknowns export --verbose --include-proofs --output full-export.json
3. Triage Procedures
3.1 Daily Triage Workflow
Schedule: Daily at 9:00 AM
Duration: 30 minutes
Participants: Security analyst, on-call engineer
Process:
# 1. Get today's queue snapshot
stella unknowns snapshot --output daily-$(date +%Y%m%d).json
# 2. Review all HOT items
stella unknowns list --band HOT --since 24h
# 3. For each HOT unknown, determine action:
# - Escalate: Trigger immediate rescan
# - Investigate: Needs manual analysis
# - Defer: Move to WARM (with justification)
# - Resolve: Evidence found, can close
# 4. Process each item
stella unknowns triage unk-12345678-... --action escalate
stella unknowns triage unk-87654321-... --action investigate --notes "Need VEX from vendor"
stella unknowns triage unk-11111111-... --action defer --reason "False positive suspected"
3.2 Triage Decision Matrix
| Reason Code | KEV | EPSS > 0.5 | Action |
|---|---|---|---|
| missing_vex | Yes | Any | Escalate + Vendor outreach |
| missing_vex | No | Yes | Escalate |
| missing_vex | No | No | Request VEX |
| ambiguous_indirect_call | Any | Any | Manual code review |
| incomplete_sbom | Any | Any | Rescan with updated extractor |
| conflicting_evidence | Any | Any | Manual analysis |
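The decision matrix can also be expressed as a small lookup, which may help when scripting a first pass over exported queue data. This is a minimal Python sketch; `triage_action` is a hypothetical helper, not part of the stella CLI, and reason codes that the matrix does not list are deliberately rejected rather than guessed.

```python
# Illustrative sketch of the decision matrix above; the function name is hypothetical.
def triage_action(reason, kev, epss):
    """Map a reason code, KEV flag, and EPSS score to the matrix action."""
    if reason == "missing_vex":
        if kev:
            return "Escalate + Vendor outreach"
        if epss > 0.5:
            return "Escalate"
        return "Request VEX"
    fixed_actions = {
        "ambiguous_indirect_call": "Manual code review",
        "incomplete_sbom": "Rescan with updated extractor",
        "conflicting_evidence": "Manual analysis",
    }
    if reason in fixed_actions:
        return fixed_actions[reason]
    # Reason codes not listed in the matrix are left to the analyst.
    raise KeyError(f"no matrix entry for reason code {reason!r}")
```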
3.3 Triage Templates
# Quick escalate (HOT + KEV)
stella unknowns triage unk-... --action escalate \
--priority P1 \
--notes "KEV item, requires immediate attention"
# Request vendor VEX
stella unknowns triage unk-... --action investigate \
--notes "Requested VEX from vendor via security@vendor.com" \
--due-date 7d
# Mark for code review
stella unknowns triage unk-... --action investigate \
--notes "Requires manual code review to resolve indirect call" \
--assign @code-review-team
# Defer with justification
stella unknowns triage unk-... --action defer \
--reason "Component not deployed to production" \
--evidence "deployment-manifest.yaml shows staging-only"
4. Escalation Workflows
4.1 Automatic Escalation
Unknowns are automatically escalated when:
- Score increases above HOT threshold (0.70)
- KEV status added to related CVE
- EPSS score increases significantly (> 0.2 delta)
- Blast radius increases (new dependents detected)
Configure auto-escalation:
# policy.unknowns.escalation.yaml
autoEscalation:
enabled: true
triggers:
- condition: score >= 0.70
action: escalate
notify: [security-team]
- condition: kev == true
action: escalate
priority: P1
notify: [security-team, management]
- condition: epss_delta > 0.2
action: escalate
notify: [security-team]
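To make the trigger conditions concrete, the following Python sketch evaluates the three conditions from the YAML above against two snapshots of the same unknown. It is an illustration under assumptions: the dictionary keys `score`, `kev`, and `epss` are hypothetical field names, and the conditions are evaluated exactly as written in the config, without modeling how the policy engine detects changes.

```python
# Illustrative sketch: mirrors the three trigger conditions in the YAML above.
# The dictionary keys (score, kev, epss) are hypothetical field names.
HOT_THRESHOLD = 0.70
EPSS_DELTA_THRESHOLD = 0.2


def fired_triggers(previous, current):
    """Return the trigger conditions that the current snapshot satisfies."""
    fired = []
    if current["score"] >= HOT_THRESHOLD:
        fired.append("score >= 0.70")
    if current["kev"]:
        fired.append("kev == true")
    if current["epss"] - previous["epss"] > EPSS_DELTA_THRESHOLD:
        fired.append("epss_delta > 0.2")
    return fired
```

An unknown whose score moves from 0.62 to 0.74 while EPSS rises from 0.20 to 0.45 would fire both the score and the EPSS delta triggers under this reading.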
4.2 Manual Escalation
# Escalate via CLI
stella unknowns escalate unk-12345678-...
# Escalate with reason
stella unknowns escalate unk-12345678-... \
--reason "Customer reported potential exploit"
# Escalate to trigger rescan
stella unknowns escalate unk-12345678-... --rescan
# Output:
# Escalated: unk-12345678-...
# Rescan job: rescan-job-001
# Status: queued
# ETA: 5 minutes
4.3 Bulk Escalation
# Escalate all KEV items
stella unknowns escalate --filter "kev=true" --reason "KEV bulk escalation"
# Escalate high-score items
stella unknowns escalate --filter "score>=0.8" --rescan
# Escalate by artifact
stella unknowns escalate --artifact sha256:abc123... --reason "Production incident"
4.4 Escalation SLA Tracking
# Check SLA status
stella unknowns sla-status
# Output:
# HOT unknowns SLA (24h):
# In SLA: 10 (83%)
# Breached: 2 (17%)
#
# Breached items:
# unk-111... (26h old) - missing_vex
# unk-222... (30h old) - conflicting_evidence
# Get SLA breach notifications
stella unknowns list --sla-breached
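The SLA figures in the sample output follow from comparing each item's age with its band's SLA window. The Python sketch below shows one way to reproduce that check offline, assuming the SLA clock starts when the unknown is created (the runbook reports ages but does not state the reference point). Function names are hypothetical.

```python
# Illustrative sketch: SLA windows come from the band table in the Overview.
from datetime import datetime, timedelta, timezone

SLA_BY_BAND = {
    "HOT": timedelta(hours=24),
    "WARM": timedelta(days=7),
    "COLD": timedelta(days=30),
}


def is_sla_breached(band, created_at, now):
    """True when the item has been open longer than its band's SLA."""
    return now - created_at > SLA_BY_BAND[band]


def sla_compliance(items, now):
    """Fraction of items still within SLA; items are (band, created_at) pairs."""
    if not items:
        return 1.0
    in_sla = sum(1 for band, created in items if not is_sla_breached(band, created, now))
    return in_sla / len(items)
```

With ten HOT items inside the 24-hour window and two outside it, `sla_compliance` returns 10/12, which rounds to the 83% shown above.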
5. Resolution Procedures
5.1 Resolution Types
| Resolution | Description | Evidence Required |
|---|---|---|
| not_affected | Vulnerability doesn't apply | VEX statement or manual analysis |
| fixed | Vulnerability patched | Version upgrade confirmation |
| mitigated | Controls in place | Mitigation documentation |
| false_positive | Incorrect classification | Analysis report |
| wont_fix | Accepted risk | Risk acceptance form |
5.2 Resolve Unknown
# Resolve as not affected
stella unknowns resolve unk-12345678-... \
--resolution not_affected \
--justification "vulnerable_code_not_present" \
--notes "Manual code review confirmed function not used"
# Resolve as fixed
stella unknowns resolve unk-12345678-... \
--resolution fixed \
--justification "version_upgraded" \
--evidence "Upgraded lodash to 4.17.21, CVE patched"
# Resolve as mitigated
stella unknowns resolve unk-12345678-... \
--resolution mitigated \
--justification "inline_mitigations_exist" \
--evidence "WAF rule WAF-001 blocks exploit pattern"
# Resolve as won't fix (risk accepted)
stella unknowns resolve unk-12345678-... \
--resolution wont_fix \
--justification "risk_accepted" \
--evidence "Risk acceptance ticket RISK-123" \
--expires 90d # Re-evaluate in 90 days
5.3 Bulk Resolution
# Resolve all items for a fixed package version
stella unknowns resolve-batch \
--filter "purl=pkg:npm/lodash@4.17.20" \
--resolution fixed \
--justification "Upgraded to 4.17.21 fleet-wide" \
--evidence "Fleet upgrade ticket FLEET-456"
# Resolve false positives from analysis
stella unknowns resolve-batch \
--file false-positives.json \
--resolution false_positive
5.4 Resolution Audit Trail
# View resolution history
stella unknowns history unk-12345678-...
# Output:
# 2025-12-15 10:00:00 - Created (score: 0.62)
# 2025-12-16 09:30:00 - Triaged by analyst@example.com
# 2025-12-17 14:00:00 - Escalated (KEV added)
# 2025-12-18 11:00:00 - Resolved by security@example.com
# Resolution: not_affected
# Justification: vulnerable_code_not_present
# Notes: Manual code review confirmed function not used
# Export audit trail
stella unknowns audit-export --from 2025-01-01 --to 2025-12-31 --output audit.json
6. Troubleshooting
6.1 Score Seems Wrong
Symptom: Unknown scored too high or too low.
Diagnosis:
# View score breakdown
stella unknowns show unk-... --score-details
# View proof tree
stella unknowns proof unk-... --verbose
Common causes:
- Stale EPSS data: EPSS feed not updated
- Incorrect blast radius: Dependency data outdated
- Missing containment data: Seccomp/filesystem status unknown
Resolution:
# Trigger score recalculation
stella unknowns recalculate unk-...
# Force refresh of all input signals
stella unknowns refresh unk-... --force
6.2 Duplicate Unknowns
Symptom: Same issue appears multiple times.
Diagnosis:
# Find potential duplicates
stella unknowns duplicates --scan
# Output shows items with same CVE+PURL but different artifacts
Resolution:
# Merge duplicates
stella unknowns merge \
--primary unk-111... \
--secondary unk-222... \
--reason "Same CVE across artifact versions"
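The duplicate criterion described above (same CVE and PURL, different artifacts) can be checked against an exported queue file. This Python sketch is only an illustration: the record keys `id`, `cve`, `purl`, and `artifact` are hypothetical and may not match the actual export schema.

```python
# Illustrative sketch: groups records by (CVE, PURL) and reports groups that
# span more than one artifact. Record keys are hypothetical.
from collections import defaultdict


def find_duplicates(unknowns):
    """Return {(cve, purl): [ids]} for groups covering several artifacts."""
    groups = defaultdict(list)
    for item in unknowns:
        groups[(item["cve"], item["purl"])].append(item)
    return {
        key: [item["id"] for item in members]
        for key, members in groups.items()
        if len({item["artifact"] for item in members}) > 1
    }
```

Each returned group is a candidate for a merge, with one ID chosen as the primary.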
6.3 Escalation Not Working
Symptom: Escalation doesn't trigger rescan.
Diagnosis:
# Check escalation status
stella unknowns escalation-status unk-...
# Check Scheduler connectivity
stella health check --service scheduler
# Check job queue
stella scheduler queue status rescan
Resolution:
# Retry escalation
stella unknowns escalate unk-... --force
# Manual rescan trigger
stella scan trigger --artifact sha256:abc123... --priority high
6.4 Resolution Rejected
Symptom: Resolution attempt fails validation.
Diagnosis:
# Check resolution requirements
stella unknowns resolution-requirements unk-...
# Output:
# Resolution requirements for unk-12345678-...
# - Justification: required
# - Evidence: required (reason: KEV item)
# - Approver: required (band: HOT)
Resolution:
# Provide required evidence
stella unknowns resolve unk-... \
--resolution not_affected \
--justification "vulnerable_code_not_present" \
--evidence "Code review: CRV-123" \
--approver security-lead@example.com
7. Monitoring & Alerting
Updated: Sprint SPRINT_20260118_018_Unknowns_queue_enhancement (UQ-007)
7.1 Key Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| unknowns_queue_depth_hot | HOT band queue depth | > 5 critical, > 0 for 1h warning |
| unknowns_queue_depth_warm | WARM band queue depth | > 25 warning |
| unknowns_queue_depth_cold | COLD band queue depth | > 100 warning |
| unknowns_sla_compliance | SLA compliance rate (0-1) | < 0.80 critical, < 0.95 warning |
| unknowns_sla_breach_total | Total SLA breaches (counter) | increase > 0 |
| unknowns_escalated_total | Escalations (counter) | rate > 10/hour |
| unknowns_demoted_total | Demotions (counter) | - |
| unknowns_expired_total | Expirations (counter) | - |
| unknowns_processing_time_seconds | Processing time histogram | p95 > 30s |
| unknowns_resolution_time_hours | Resolution time by band | p95 > SLA |
| unknowns_state_transitions_total | State transitions (by from/to) | - |
| greyqueue_stuck_total | Stuck processing entries | > 0 |
| greyqueue_timeout_total | Processing timeouts | > 5/hour |
| greyqueue_processing_count | Currently processing | > 10 for 30m |
7.2 Grafana Dashboard
Import dashboard from: devops/observability/grafana/dashboards/unknowns-queue-dashboard.json
Dashboard Panels:
| Panel | Description |
|---|---|
| Total Queue Depth | Stat showing total across all bands |
| HOT/WARM/COLD Unknowns | Individual band stats with thresholds |
| SLA Compliance | Gauge showing compliance percentage |
| Queue Depth Over Time | Time series by band |
| SLA Compliance Over Time | Trending compliance |
| State Transitions | Rate of state changes |
| Processing Time (p95) | Performance histogram |
| Escalations & Failures | Lifecycle events |
| Resolution Time by Band | Time-to-resolution |
| Stuck & Timeout Events | Watchdog metrics |
| SLA Breaches Today | 24h breach counter |
7.3 Alerting Rules
Alert rules deployed from: devops/observability/prometheus/rules/unknowns-queue-alerts.yaml
Critical Alerts:
| Alert | Condition | Response |
|---|---|---|
| UnknownsSlaBreachCritical | compliance < 80% | Immediate escalation to security team |
| UnknownsHotQueueHigh | HOT > 5 for 10m | Prioritize resolution |
| UnknownsProcessingFailures | Failed entries in 1h | Manual intervention required |
| UnknownsSlaMonitorDown | No metrics for 5m | Check service health |
| UnknownsHealthCheckUnhealthy | Health check failing | Check SLA breaches |
Warning Alerts:
| Alert | Condition | Response |
|---|---|---|
| UnknownsSlaBreachWarning | 80% ≤ compliance < 95% | Review queue health |
| UnknownsHotQueuePresent | HOT > 0 for 1h | Check progress |
| UnknownsQueueBacklog | Total > 100 for 30m | Scale processing |
| UnknownsStuckProcessing | Processing > 10 for 30m | Check bottlenecks |
| UnknownsProcessingTimeout | Timeouts > 5/hour | Review automation |
| UnknownsEscalationRate | Escalations > 10/hour | Review criteria |
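The two SLA compliance alerts partition the compliance range into three zones. The sketch below restates those boundaries in Python for quick checks against a value read from `unknowns_sla_compliance`; the function name is hypothetical, and the deployed Prometheus rules remain the authoritative definition.

```python
# Illustrative sketch: boundaries are taken from the SLA alert conditions above.
def sla_alert_level(compliance):
    """Classify an SLA compliance ratio (0-1) as critical, warning, or None."""
    if not 0.0 <= compliance <= 1.0:
        raise ValueError(f"compliance must be in [0, 1], got {compliance}")
    if compliance < 0.80:
        return "critical"  # corresponds to UnknownsSlaBreachCritical
    if compliance < 0.95:
        return "warning"  # corresponds to UnknownsSlaBreachWarning
    return None
```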
7.4 Metric-Based Troubleshooting
SLA Breach Investigation
# 1. Check current breach status
curl -s "http://prometheus:9090/api/v1/query?query=unknowns_sla_compliance" | jq
# 2. Identify breached entries
curl -s "$UNKNOWNS_API/grey-queue?status=pending" | \
jq '.items[] | select(.sla_breached == true)'
# 3. Check SLA health endpoint
curl -s "$UNKNOWNS_API/health/sla" | jq
# 4. Review breach timeline
# In Grafana: SLA Compliance Over Time panel, last 24h
Stuck Processing Investigation
# 1. Check processing count
curl -s "http://prometheus:9090/api/v1/query?query=greyqueue_processing_count" | jq
# 2. List stuck entries
curl -s "$UNKNOWNS_API/grey-queue?status=Processing" | \
jq '.items[] | select((.last_processed_at | fromdateiso8601) < (now - 3600))'
# 3. Check watchdog metrics
curl -s "http://prometheus:9090/api/v1/query?query=rate(greyqueue_stuck_total[1h])" | jq
# 4. Force retry if needed
curl -X POST "$UNKNOWNS_API/grey-queue/{id}/retry"
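The jq filter in step 2 selects entries whose last processing timestamp is more than an hour old. For operators who prefer to post-process a saved API response, the same selection could be sketched in Python as follows. It assumes each entry carries an ISO 8601 `last_processed_at` string, as the jq filter implies; the function name is hypothetical.

```python
# Illustrative sketch: same selection as the jq filter above, on parsed JSON items.
from datetime import datetime, timedelta, timezone


def stuck_entries(items, now, max_age=timedelta(hours=1)):
    """Return entries last processed more than max_age before now."""
    stuck = []
    for item in items:
        # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
        raw = item["last_processed_at"].replace("Z", "+00:00")
        last_processed = datetime.fromisoformat(raw)
        if now - last_processed > max_age:
            stuck.append(item)
    return stuck
```

Pass `datetime.now(timezone.utc)` as `now` when running against live data.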
High Escalation Rate
# 1. Check escalation rate
curl -s "http://prometheus:9090/api/v1/query?query=rate(unknowns_escalated_total[1h])" | jq
# 2. Review escalation reasons
curl -s "$UNKNOWNS_API/grey-queue?status=Escalated" | \
jq '.items | group_by(.escalation_reason) | map({reason: .[0].escalation_reason, count: length})'
# 3. Check for EPSS/KEV spikes
# Events triggering escalations:
# - epss.updated with score increase
# - kev.added events
# - deployment.created with affected components
Queue Growth Analysis
# 1. Check inflow rate
curl -s "http://prometheus:9090/api/v1/query?query=rate(unknowns_enqueued_total[1h])" | jq
# 2. Check resolution rate
curl -s "http://prometheus:9090/api/v1/query?query=rate(unknowns_resolved_total[1h])" | jq
# 3. Calculate net growth
# growth_rate = inflow_rate - resolution_rate
# 4. Review reasons for new unknowns
curl -s "$UNKNOWNS_API/grey-queue/summary" | jq '.by_reason'
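Step 3 can be automated by parsing the two Prometheus responses. The following Python sketch assumes the standard instant-query JSON shape returned by the Prometheus HTTP API and sums across any label sets; the function names are hypothetical. Because `rate()` yields a per-second value, the result is in unknowns per second.

```python
# Illustrative sketch: parses Prometheus instant-query responses and applies
# growth_rate = inflow_rate - resolution_rate from step 3 above.
def instant_value(response):
    """Sum the sample values in a Prometheus instant-vector response."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    results = response["data"]["result"]
    # Each vector sample is {"metric": {...}, "value": [timestamp, "string value"]}.
    return sum(float(sample["value"][1]) for sample in results)


def net_growth_rate(inflow_response, resolution_response):
    """Unknowns per second added to the queue (negative means it is draining)."""
    return instant_value(inflow_response) - instant_value(resolution_response)
```

Multiply the result by 3600 to express the growth as unknowns per hour.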
7.5 Daily Report
# Generate daily report
stella unknowns report --format email --send-to security-team@example.com
# Report includes:
# - Queue summary (total, by band, by reason)
# - SLA status (in compliance, breaches)
# - Top 10 highest-scored items
# - Newly added items (last 24h)
# - Resolved items (last 24h)
# - KEV item status
# - Trends (7-day, 30-day)
8. Unknown Budgets
Unknown budgets enforce per-environment caps on unknowns by reason code. Budgets can warn or block when exceeded.
Configuration:
# etc/policy.unknowns.budgets.yaml
unknownBudgets:
enforceBudgets: true
budgets:
prod:
environment: prod
totalLimit: 3
reasonLimits:
Reachability: 0
Provenance: 0
VexConflict: 1
action: Block
exceededMessage: "Production requires zero reachability unknowns"
stage:
environment: stage
totalLimit: 10
reasonLimits:
Reachability: 1
action: WarnUnlessException
dev:
environment: dev
totalLimit: null
action: Warn
default:
environment: default
totalLimit: 5
action: Warn
Exception coverage:
To allow approved exceptions to cover specific unknown reason codes, set the exception metadata key unknown_reason_codes to a comma-separated list of reason codes. Example: Reachability, U-VEX.
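To illustrate how a budget might be evaluated, the Python sketch below checks a set of per-reason counts against one budget entry. The semantics are assumptions rather than documented behavior: a limit is treated as exceeded only when the count is strictly greater, a `totalLimit` of null means no cap on the total, and exception coverage is not modeled. The function name is hypothetical.

```python
# Illustrative sketch of budget evaluation; the semantics below are assumptions:
#   - a count exceeds a limit only when it is strictly greater than the limit
#   - totalLimit: null (None here) means no cap on the total
def check_budget(budget, counts_by_reason):
    """Return the budget's action if any limit is exceeded, else None."""
    total = sum(counts_by_reason.values())
    total_limit = budget.get("totalLimit")
    if total_limit is not None and total > total_limit:
        return budget["action"]
    for reason, limit in budget.get("reasonLimits", {}).items():
        if counts_by_reason.get(reason, 0) > limit:
            return budget["action"]
    return None


# Mirrors the prod entry from the configuration above.
PROD = {
    "totalLimit": 3,
    "reasonLimits": {"Reachability": 0, "Provenance": 0, "VexConflict": 1},
    "action": "Block",
}
```

Under these assumptions, a single Reachability unknown in prod returns `Block`, while a single VexConflict unknown stays within budget.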
Related Documentation
- Unknowns API Reference
- Triage Technical Reference
- Score Proofs Runbook
- Policy Engine
- Determinization API
- VEX Consensus Guide
8. Grey Queue Operations
Sprint: SPRINT_20260112_010_CLI_unknowns_grey_queue_cli
The Grey Queue handles observations with uncertain status requiring operator attention or additional evidence. These are distinct from standard HOT/WARM/COLD band unknowns.
8.1 Grey Queue Overview
Grey Queue items have:
- Observation state: PendingDeterminization, Disputed, or GuardedPass
- Reanalysis fingerprint: Deterministic ID for reproducible replays
- Triggers: Events that caused reanalysis
- Conflicts: Detected evidence disagreements
- Next actions: Suggested resolution paths
8.2 List Grey Queue Items
# List all grey queue items
stella unknowns list --state grey
# List by observation state
stella unknowns list --observation-state pending-determinization
stella unknowns list --observation-state disputed
stella unknowns list --observation-state guarded-pass
# List with fingerprint details
stella unknowns list --state grey --show-fingerprint
# List with conflict summary
stella unknowns list --state grey --show-conflicts
8.3 View Grey Queue Details
# Show grey queue item with full details
stella unknowns show unk-12345678-... --grey
# Output:
# ID: unk-12345678-...
# Observation State: Disputed
#
# Reanalysis Fingerprint:
# ID: sha256:abc123...
# Computed At: 2026-01-15T10:00:00Z
# Policy Config Hash: sha256:def456...
#
# Triggers (2):
# - epss.updated@1 (2026-01-15T09:55:00Z) delta=0.15
# - vex.updated@1 (2026-01-15T09:50:00Z)
#
# Conflicts (1):
# - VexStatusConflict: vendor-a reports 'not_affected', vendor-b reports 'affected'
# Severity: high
# Adjudication: manual_review
#
# Next Actions:
# - trust_resolution: Resolve issuer trust conflict
# - manual_review: Escalate to security team
# Show fingerprint only
stella unknowns fingerprint unk-12345678-...
# Show triggers only
stella unknowns triggers unk-12345678-...
8.4 Grey Queue Triage Actions
# Resolve a grey queue item (operator determination)
stella unknowns resolve unk-12345678-... \
--status not_affected \
--justification "Verified vendor VEX is authoritative" \
--evidence-ref "vex-observation-id-123"
# Escalate for manual review
stella unknowns escalate unk-12345678-... \
--priority P1 \
--reason "Conflicting VEX requires security team decision"
# Defer pending additional evidence
stella unknowns defer unk-12345678-... \
--await vex \
--reason "Waiting for upstream vendor VEX statement"
8.5 Grey Queue Conflict Resolution
# List items with conflicts
stella unknowns list --has-conflicts
# Filter by conflict type
stella unknowns list --conflict-type vex-status-conflict
stella unknowns list --conflict-type vex-reachability-contradiction
stella unknowns list --conflict-type trust-tie
# Resolve a conflict manually
stella unknowns resolve-conflict unk-12345678-... \
--winner vendor-a \
--reason "vendor-a is the upstream maintainer"
8.6 Grey Queue Summary
# Get grey queue summary
stella unknowns summary --grey
# Output:
# Grey Queue: 23 items
#
# By State:
# PendingDeterminization: 15 (65%)
# Disputed: 5 (22%)
# GuardedPass: 3 (13%)
#
# Conflicts: 8 items have conflicts
# Avg. Triggers: 2.3 per item
# Oldest: 7 days
8.7 Grey Queue Export
# Export grey queue for analysis
stella unknowns export --state grey --format json --output grey-queue.json
# Export with full fingerprints and triggers
stella unknowns export --state grey --verbose --output grey-full.json
# Export conflicts only
stella unknowns export --has-conflicts --format csv --output conflicts.csv
Last Updated: 2026-01-16
Version: 1.1.0
Sprint: SPRINT_20260112_010_CLI_unknowns_grey_queue_cli