# Unknowns Queue Management Runbook

> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for managing the Unknowns queue, including triage, escalation, resolution, and queue health maintenance.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Queue Operations](#2-queue-operations)
3. [Triage Procedures](#3-triage-procedures)
4. [Escalation Workflows](#4-escalation-workflows)
5. [Resolution Procedures](#5-resolution-procedures)
6. [Troubleshooting](#6-troubleshooting)
7. [Monitoring & Alerting](#7-monitoring--alerting)
8. [Unknown Budgets](#8-unknown-budgets)
9. [Grey Queue Operations](#9-grey-queue-operations)

---

## 1. Overview

### What are Unknowns?

Unknowns are items that could not be fully classified during scanning due to:

- Missing VEX statements
- Ambiguous indirect calls in call graphs
- Incomplete SBOM data
- Missing advisory information
- Conflicting evidence from multiple sources

### Unknown Ranking

Unknowns are ranked using a weighted scoring model with three factors and a containment deduction:

```
score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction
```

| Factor | Weight | Description |
|--------|--------|-------------|
| Blast Radius | 0.60 | Impact scope (dependents, network exposure) |
| Evidence Scarcity | 0.30 | How much data is missing |
| Exploit Pressure | 0.30 | EPSS score, KEV status |
| Containment | -0.20 | Mitigation factors (seccomp, read-only FS) |

`containment_deduction` is zero or negative: each mitigation in place lowers the score (for example, -0.10 for enforced seccomp and -0.10 for a read-only filesystem).

### Band Assignment

| Band | Score Range | Priority | SLA |
|------|-------------|----------|-----|
| HOT | ≥ 0.70 | Critical | 24 hours |
| WARM | 0.40 - 0.69 | Normal | 7 days |
| COLD | < 0.40 | Low | 30 days |

---

## 2. Queue Operations

### 2.1 View Queue Status

```bash
# Get queue summary
stella unknowns summary

# Output:
# Total: 142 unknowns
#   HOT:  12 (8%)  - Requires immediate attention
#   WARM: 85 (60%) - Normal priority
#   COLD: 45 (32%) - Low priority
#
# KEV items: 3
# Average score: 0.52

# Get queue summary via API
curl "https://scanner.example.com/api/v1/unknowns/summary" \
  -H "Authorization: Bearer $TOKEN"
```

### 2.2 List Unknowns

```bash
# List all HOT unknowns
stella unknowns list --band HOT

# List by score (highest first)
stella unknowns list --sort score --order desc --limit 20

# Filter by reason
stella unknowns list --reason missing_vex

# Filter by artifact
stella unknowns list --artifact sha256:abc123...

# Filter by KEV status
stella unknowns list --kev true
```

### 2.3 View Unknown Details

```bash
# Get detailed view
stella unknowns show unk-12345678-abcd-1234-5678-abcdef123456

# Output:
# ID: unk-12345678-...
# Artifact: pkg:oci/myapp@sha256:abc123
# Reasons: [missing_vex, ambiguous_indirect_call]
#
# Blast Radius:
#   Dependents: 15 services
#   Network: internet-facing
#   Privilege: user
#
# Evidence Scarcity: 0.7 (high)
#
# Exploit Pressure:
#   EPSS: 0.45
#   KEV: false
#
# Containment:
#   Seccomp: enforced (-0.10)
#   Filesystem: read-only (-0.10)
#
# Score: 0.62 (WARM band)
# Score Breakdown:
#   Blast component: +0.35
#   Scarcity component: +0.21
#   Pressure component: +0.26
#   Containment deduction: -0.20

# Show proof tree
stella unknowns proof unk-12345678-...
```

### 2.4 Export Queue Data

```bash
# Export for analysis
stella unknowns export --format json --output unknowns.json

# Export HOT items for daily review
stella unknowns export --band HOT --format csv --output hot-unknowns.csv

# Export with full details
stella unknowns export --verbose --include-proofs --output full-export.json
```

---

## 3. Triage Procedures

### 3.1 Daily Triage Workflow

- **Schedule**: Daily at 9:00 AM
- **Duration**: 30 minutes
- **Participants**: Security analyst, on-call engineer

**Process**:

```bash
# 1. Get today's queue snapshot
stella unknowns snapshot --output daily-$(date +%Y%m%d).json

# 2. Review all HOT items
stella unknowns list --band HOT --since 24h

# 3. For each HOT unknown, determine action:
#    - Escalate: Trigger immediate rescan
#    - Investigate: Needs manual analysis
#    - Defer: Move to WARM (with justification)
#    - Resolve: Evidence found, can close

# 4. Process each item
stella unknowns triage unk-12345678-... --action escalate
stella unknowns triage unk-87654321-... --action investigate --notes "Need VEX from vendor"
stella unknowns triage unk-11111111-... --action defer --reason "False positive suspected"
```

### 3.2 Triage Decision Matrix

| Reason Code | KEV | EPSS > 0.5 | Action |
|-------------|-----|------------|--------|
| `missing_vex` | Yes | Any | Escalate + Vendor outreach |
| `missing_vex` | No | Yes | Escalate |
| `missing_vex` | No | No | Request VEX |
| `ambiguous_indirect_call` | Any | Any | Manual code review |
| `incomplete_sbom` | Any | Any | Rescan with updated extractor |
| `conflicting_evidence` | Any | Any | Manual analysis |

### 3.3 Triage Templates

```bash
# Quick escalate (HOT + KEV)
stella unknowns triage unk-... --action escalate \
  --priority P1 \
  --notes "KEV item, requires immediate attention"

# Request vendor VEX
stella unknowns triage unk-... --action investigate \
  --notes "Requested VEX from vendor via security@vendor.com" \
  --due-date 7d

# Mark for code review
stella unknowns triage unk-... --action investigate \
  --notes "Requires manual code review to resolve indirect call" \
  --assign @code-review-team

# Defer with justification
stella unknowns triage unk-... --action defer \
  --reason "Component not deployed to production" \
  --evidence "deployment-manifest.yaml shows staging-only"
```

---

## 4. Escalation Workflows

### 4.1 Automatic Escalation

Unknowns are automatically escalated when:

- Score increases above HOT threshold (0.70)
- KEV status added to related CVE
- EPSS score increases significantly (> 0.2 delta)
- Blast radius increases (new dependents detected)

**Configure auto-escalation**:

```yaml
# policy.unknowns.escalation.yaml
autoEscalation:
  enabled: true
  triggers:
    - condition: score >= 0.70
      action: escalate
      notify: [security-team]
    - condition: kev == true
      action: escalate
      priority: P1
      notify: [security-team, management]
    - condition: epss_delta > 0.2
      action: escalate
      notify: [security-team]
```

### 4.2 Manual Escalation

```bash
# Escalate via CLI
stella unknowns escalate unk-12345678-...

# Escalate with reason
stella unknowns escalate unk-12345678-... \
  --reason "Customer reported potential exploit"

# Escalate to trigger rescan
stella unknowns escalate unk-12345678-... --rescan

# Output:
# Escalated: unk-12345678-...
# Rescan job: rescan-job-001
# Status: queued
# ETA: 5 minutes
```

### 4.3 Bulk Escalation

```bash
# Escalate all KEV items
stella unknowns escalate --filter "kev=true" --reason "KEV bulk escalation"

# Escalate high-score items
stella unknowns escalate --filter "score>=0.8" --rescan

# Escalate by artifact
stella unknowns escalate --artifact sha256:abc123... --reason "Production incident"
```

### 4.4 Escalation SLA Tracking

```bash
# Check SLA status
stella unknowns sla-status

# Output:
# HOT unknowns SLA (24h):
#   In SLA: 10 (83%)
#   Breached: 2 (17%)
#
# Breached items:
#   unk-111... (26h old) - missing_vex
#   unk-222... (30h old) - conflicting_evidence

# Get SLA breach notifications
stella unknowns list --sla-breached
```

---

## 5. Resolution Procedures

### 5.1 Resolution Types

| Resolution | Description | Evidence Required |
|------------|-------------|-------------------|
| `not_affected` | Vulnerability doesn't apply | VEX statement or manual analysis |
| `fixed` | Vulnerability patched | Version upgrade confirmation |
| `mitigated` | Controls in place | Mitigation documentation |
| `false_positive` | Incorrect classification | Analysis report |
| `wont_fix` | Accepted risk | Risk acceptance form |

### 5.2 Resolve Unknown

```bash
# Resolve as not affected
stella unknowns resolve unk-12345678-... \
  --resolution not_affected \
  --justification "vulnerable_code_not_present" \
  --notes "Manual code review confirmed function not used"

# Resolve as fixed
stella unknowns resolve unk-12345678-... \
  --resolution fixed \
  --justification "version_upgraded" \
  --evidence "Upgraded lodash to 4.17.21, CVE patched"

# Resolve as mitigated
stella unknowns resolve unk-12345678-... \
  --resolution mitigated \
  --justification "inline_mitigations_exist" \
  --evidence "WAF rule WAF-001 blocks exploit pattern"

# Resolve as won't fix (risk accepted)
stella unknowns resolve unk-12345678-... \
  --resolution wont_fix \
  --justification "risk_accepted" \
  --evidence "Risk acceptance ticket RISK-123" \
  --expires 90d  # Re-evaluate in 90 days
```

### 5.3 Bulk Resolution

```bash
# Resolve all items for a fixed package version
stella unknowns resolve-batch \
  --filter "purl=pkg:npm/lodash@4.17.20" \
  --resolution fixed \
  --justification "Upgraded to 4.17.21 fleet-wide" \
  --evidence "Fleet upgrade ticket FLEET-456"

# Resolve false positives from analysis
stella unknowns resolve-batch \
  --file false-positives.json \
  --resolution false_positive
```

### 5.4 Resolution Audit Trail

```bash
# View resolution history
stella unknowns history unk-12345678-...

# Output:
# 2025-12-15 10:00:00 - Created (score: 0.62)
# 2025-12-16 09:30:00 - Triaged by analyst@example.com
# 2025-12-17 14:00:00 - Escalated (KEV added)
# 2025-12-18 11:00:00 - Resolved by security@example.com
#   Resolution: not_affected
#   Justification: vulnerable_code_not_present
#   Notes: Manual code review confirmed function not used

# Export audit trail
stella unknowns audit-export --from 2025-01-01 --to 2025-12-31 --output audit.json
```

---

## 6. Troubleshooting

### 6.1 Score Seems Wrong

**Symptom**: Unknown scored too high or too low.

**Diagnosis**:

```bash
# View score breakdown
stella unknowns show unk-... --score-details

# View proof tree
stella unknowns proof unk-... --verbose
```

**Common causes**:

1. **Stale EPSS data**: EPSS feed not updated
2. **Incorrect blast radius**: Dependency data outdated
3. **Missing containment data**: Seccomp/filesystem status unknown

**Resolution**:

```bash
# Trigger score recalculation
stella unknowns recalculate unk-...

# Force refresh of all input signals
stella unknowns refresh unk-... --force
```

### 6.2 Duplicate Unknowns

**Symptom**: Same issue appears multiple times.

**Diagnosis**:

```bash
# Find potential duplicates
stella unknowns duplicates --scan

# Output shows items with same CVE+PURL but different artifacts
```

**Resolution**:

```bash
# Merge duplicates
stella unknowns merge \
  --primary unk-111... \
  --secondary unk-222... \
  --reason "Same CVE across artifact versions"
```

### 6.3 Escalation Not Working

**Symptom**: Escalation doesn't trigger rescan.

**Diagnosis**:

```bash
# Check escalation status
stella unknowns escalation-status unk-...

# Check Scheduler connectivity
stella health check --service scheduler

# Check job queue
stella scheduler queue status rescan
```

**Resolution**:

```bash
# Retry escalation
stella unknowns escalate unk-... --force

# Manual rescan trigger
stella scan trigger --artifact sha256:abc123... --priority high
```

### 6.4 Resolution Rejected

**Symptom**: Resolution attempt fails validation.

**Diagnosis**:

```bash
# Check resolution requirements
stella unknowns resolution-requirements unk-...

# Output:
# Resolution requirements for unk-12345678-...
#   - Justification: required
#   - Evidence: required (reason: KEV item)
#   - Approver: required (band: HOT)
```

**Resolution**:

```bash
# Provide required evidence
stella unknowns resolve unk-... \
  --resolution not_affected \
  --justification "vulnerable_code_not_present" \
  --evidence "Code review: CRV-123" \
  --approver security-lead@example.com
```

---

## 7. Monitoring & Alerting

> **Updated**: Sprint SPRINT_20260118_018_Unknowns_queue_enhancement (UQ-007)

### 7.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `unknowns_queue_depth_hot` | HOT band queue depth | > 5 critical, > 0 for 1h warning |
| `unknowns_queue_depth_warm` | WARM band queue depth | > 25 warning |
| `unknowns_queue_depth_cold` | COLD band queue depth | > 100 warning |
| `unknowns_sla_compliance` | SLA compliance rate (0-1) | < 0.80 critical, < 0.95 warning |
| `unknowns_sla_breach_total` | Total SLA breaches (counter) | increase > 0 |
| `unknowns_escalated_total` | Escalations (counter) | rate > 10/hour |
| `unknowns_demoted_total` | Demotions (counter) | - |
| `unknowns_expired_total` | Expirations (counter) | - |
| `unknowns_processing_time_seconds` | Processing time histogram | p95 > 30s |
| `unknowns_resolution_time_hours` | Resolution time by band | p95 > SLA |
| `unknowns_state_transitions_total` | State transitions (by from/to) | - |
| `greyqueue_stuck_total` | Stuck processing entries | > 0 |
| `greyqueue_timeout_total` | Processing timeouts | > 5/hour |
| `greyqueue_processing_count` | Currently processing | > 10 for 30m |

### 7.2 Grafana Dashboard

Import dashboard from: `devops/observability/grafana/dashboards/unknowns-queue-dashboard.json`

**Dashboard Panels:**

| Panel | Description |
|-------|-------------|
| Total Queue Depth | Stat showing total across all bands |
| HOT/WARM/COLD Unknowns | Individual band stats with thresholds |
| SLA Compliance | Gauge showing compliance percentage |
| Queue Depth Over Time | Time series by band |
| SLA Compliance Over Time | Trending compliance |
| State Transitions | Rate of state changes |
| Processing Time (p95) | Performance histogram |
| Escalations & Failures | Lifecycle events |
| Resolution Time by Band | Time-to-resolution |
| Stuck & Timeout Events | Watchdog metrics |
| SLA Breaches Today | 24h breach counter |

### 7.3 Alerting Rules

Alert rules deployed from: `devops/observability/prometheus/rules/unknowns-queue-alerts.yaml`

**Critical Alerts:**

| Alert | Condition | Response |
|-------|-----------|----------|
| `UnknownsSlaBreachCritical` | compliance < 80% | Immediate escalation to security team |
| `UnknownsHotQueueHigh` | HOT > 5 for 10m | Prioritize resolution |
| `UnknownsProcessingFailures` | Failed entries in 1h | Manual intervention required |
| `UnknownsSlaMonitorDown` | No metrics for 5m | Check service health |
| `UnknownsHealthCheckUnhealthy` | Health check failing | Check SLA breaches |

**Warning Alerts:**

| Alert | Condition | Response |
|-------|-----------|----------|
| `UnknownsSlaBreachWarning` | 80% ≤ compliance < 95% | Review queue health |
| `UnknownsHotQueuePresent` | HOT > 0 for 1h | Check progress |
| `UnknownsQueueBacklog` | Total > 100 for 30m | Scale processing |
| `UnknownsStuckProcessing` | Processing > 10 for 30m | Check bottlenecks |
| `UnknownsProcessingTimeout` | Timeouts > 5/hour | Review automation |
| `UnknownsEscalationRate` | Escalations > 10/hour | Review criteria |

### 7.4 Metric-Based Troubleshooting

#### SLA Breach Investigation

```bash
# 1. Check current breach status
curl -s "http://prometheus:9090/api/v1/query?query=unknowns_sla_compliance" | jq

# 2. Identify breached entries
curl -s "$UNKNOWNS_API/grey-queue?status=pending" | \
  jq '.items[] | select(.sla_breached == true)'

# 3. Check SLA health endpoint
curl -s "$UNKNOWNS_API/health/sla" | jq

# 4. Review breach timeline
# In Grafana: SLA Compliance Over Time panel, last 24h
```

#### Stuck Processing Investigation

```bash
# 1. Check processing count
curl -s "http://prometheus:9090/api/v1/query?query=greyqueue_processing_count" | jq

# 2. List stuck entries
curl -s "$UNKNOWNS_API/grey-queue?status=Processing" | \
  jq '.items[] | select((.last_processed_at | fromdateiso8601) < (now - 3600))'

# 3. Check watchdog metrics
#    (-G with --data-urlencode keeps curl from treating [1h] as a URL glob)
curl -sG "http://prometheus:9090/api/v1/query" \
  --data-urlencode 'query=rate(greyqueue_stuck_total[1h])' | jq

# 4. Force retry if needed (replace {id} with the entry ID)
curl -X POST "$UNKNOWNS_API/grey-queue/{id}/retry"
```

#### High Escalation Rate

```bash
# 1. Check escalation rate
curl -sG "http://prometheus:9090/api/v1/query" \
  --data-urlencode 'query=rate(unknowns_escalated_total[1h])' | jq

# 2. Review escalation reasons
curl -s "$UNKNOWNS_API/grey-queue?status=Escalated" | \
  jq '.items | group_by(.escalation_reason) | map({reason: .[0].escalation_reason, count: length})'

# 3. Check for EPSS/KEV spikes
# Events triggering escalations:
#   - epss.updated with score increase
#   - kev.added events
#   - deployment.created with affected components
```

#### Queue Growth Analysis

```bash
# 1. Check inflow rate
curl -sG "http://prometheus:9090/api/v1/query" \
  --data-urlencode 'query=rate(unknowns_enqueued_total[1h])' | jq

# 2. Check resolution rate
curl -sG "http://prometheus:9090/api/v1/query" \
  --data-urlencode 'query=rate(unknowns_resolved_total[1h])' | jq

# 3. Calculate net growth
# growth_rate = inflow_rate - resolution_rate

# 4. Review reasons for new unknowns
curl -s "$UNKNOWNS_API/grey-queue/summary" | jq '.by_reason'
```

### 7.5 Daily Report

```bash
# Generate daily report
stella unknowns report --format email --send-to security-team@example.com

# Report includes:
# - Queue summary (total, by band, by reason)
# - SLA status (in compliance, breaches)
# - Top 10 highest-scored items
# - Newly added items (last 24h)
# - Resolved items (last 24h)
# - KEV item status
# - Trends (7-day, 30-day)
```

---

## 8. Unknown Budgets

Unknown budgets enforce per-environment caps on unknowns by reason code. Budgets can warn or block when exceeded.

**Configuration**:

```yaml
# etc/policy.unknowns.budgets.yaml
unknownBudgets:
  enforceBudgets: true
  budgets:
    prod:
      environment: prod
      totalLimit: 3
      reasonLimits:
        Reachability: 0
        Provenance: 0
        VexConflict: 1
      action: Block
      exceededMessage: "Production requires zero reachability unknowns"
    stage:
      environment: stage
      totalLimit: 10
      reasonLimits:
        Reachability: 1
      action: WarnUnlessException
    dev:
      environment: dev
      totalLimit: null
      action: Warn
    default:
      environment: default
      totalLimit: 5
      action: Warn
```

**Exception coverage**:

To allow approved exceptions to cover specific unknown reason codes, set the exception metadata field `unknown_reason_codes` to a comma-separated list. Example: `Reachability, U-VEX`.

---

## Related Documentation

- [Unknowns API Reference](../api/score-proofs-reachability-api-reference.md#5-unknowns-api)
- [Triage Technical Reference](../product/advisories/14-Dec-2025%20-%20Triage%20and%20Unknowns%20Technical%20Reference.md)
- [Score Proofs Runbook](./score-proofs-runbook.md)
- [Policy Engine](../modules/policy/architecture.md)
- [Determinization API](../modules/policy/determinization-api.md)
- [VEX Consensus Guide](../VEX_CONSENSUS_GUIDE.md)

---

## 9. Grey Queue Operations

> **Sprint**: SPRINT_20260112_010_CLI_unknowns_grey_queue_cli

The Grey Queue handles observations with uncertain status that require operator attention or additional evidence. These are distinct from standard HOT/WARM/COLD band unknowns.

### 9.1 Grey Queue Overview

Grey Queue items have:

- **Observation state**: `PendingDeterminization`, `Disputed`, or `GuardedPass`
- **Reanalysis fingerprint**: Deterministic ID for reproducible replays
- **Triggers**: Events that caused reanalysis
- **Conflicts**: Detected evidence disagreements
- **Next actions**: Suggested resolution paths

### 9.2 List Grey Queue Items

```bash
# List all grey queue items
stella unknowns list --state grey

# List by observation state
stella unknowns list --observation-state pending-determinization
stella unknowns list --observation-state disputed
stella unknowns list --observation-state guarded-pass

# List with fingerprint details
stella unknowns list --state grey --show-fingerprint

# List with conflict summary
stella unknowns list --state grey --show-conflicts
```

### 9.3 View Grey Queue Details

```bash
# Show grey queue item with full details
stella unknowns show unk-12345678-... --grey

# Output:
# ID: unk-12345678-...
# Observation State: Disputed
#
# Reanalysis Fingerprint:
#   ID: sha256:abc123...
#   Computed At: 2026-01-15T10:00:00Z
#   Policy Config Hash: sha256:def456...
#
# Triggers (2):
#   - epss.updated@1 (2026-01-15T09:55:00Z) delta=0.15
#   - vex.updated@1 (2026-01-15T09:50:00Z)
#
# Conflicts (1):
#   - VexStatusConflict: vendor-a reports 'not_affected', vendor-b reports 'affected'
#     Severity: high
#     Adjudication: manual_review
#
# Next Actions:
#   - trust_resolution: Resolve issuer trust conflict
#   - manual_review: Escalate to security team

# Show fingerprint only
stella unknowns fingerprint unk-12345678-...

# Show triggers only
stella unknowns triggers unk-12345678-...
```

### 9.4 Grey Queue Triage Actions

```bash
# Resolve a grey queue item (operator determination)
stella unknowns resolve unk-12345678-... \
  --status not_affected \
  --justification "Verified vendor VEX is authoritative" \
  --evidence-ref "vex-observation-id-123"

# Escalate for manual review
stella unknowns escalate unk-12345678-... \
  --priority P1 \
  --reason "Conflicting VEX requires security team decision"

# Defer pending additional evidence
stella unknowns defer unk-12345678-... \
  --await vex \
  --reason "Waiting for upstream vendor VEX statement"
```

### 9.5 Grey Queue Conflict Resolution

```bash
# List items with conflicts
stella unknowns list --has-conflicts

# Filter by conflict type
stella unknowns list --conflict-type vex-status-conflict
stella unknowns list --conflict-type vex-reachability-contradiction
stella unknowns list --conflict-type trust-tie

# Resolve a conflict manually
stella unknowns resolve-conflict unk-12345678-... \
  --winner vendor-a \
  --reason "vendor-a is the upstream maintainer"
```

### 9.6 Grey Queue Summary

```bash
# Get grey queue summary
stella unknowns summary --grey

# Output:
# Grey Queue: 23 items
#
# By State:
#   PendingDeterminization: 15 (65%)
#   Disputed: 5 (22%)
#   GuardedPass: 3 (13%)
#
# Conflicts: 8 items have conflicts
# Avg. Triggers: 2.3 per item
# Oldest: 7 days
```

### 9.7 Grey Queue Export

```bash
# Export grey queue for analysis
stella unknowns export --state grey --format json --output grey-queue.json

# Export with full fingerprints and triggers
stella unknowns export --state grey --verbose --output grey-full.json

# Export conflicts only
stella unknowns export --has-conflicts --format csv --output conflicts.csv
```

---

**Last Updated**: 2026-01-16
**Version**: 1.1.0
**Sprint**: SPRINT_20260112_010_CLI_unknowns_grey_queue_cli
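
The ranking formula and band table from the Overview section can be approximated in a few lines of shell, which may help when sanity-checking a reported score by hand. This is only a sketch under stated assumptions: the function names `unknown_score`, `unknown_band`, and `unknown_sla_hours` are hypothetical and not part of the `stella` CLI, the inputs are assumed to be raw factor values between 0 and 1 plus a non-positive containment deduction, and rounding to two decimals before banding is a guess rather than documented behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for hand-checking scores; not part of the stella CLI.

# score = 0.60*blast + 0.30*scarcity + 0.30*pressure + containment_deduction
# Args: blast scarcity pressure containment_deduction (deduction is <= 0).
unknown_score() {
  LC_ALL=C awk -v b="$1" -v s="$2" -v p="$3" -v c="$4" \
    'BEGIN { printf "%.2f\n", 0.60 * b + 0.30 * s + 0.30 * p + c }'
}

# Map a score to its band: HOT >= 0.70, WARM 0.40-0.69, COLD < 0.40.
unknown_band() {
  LC_ALL=C awk -v x="$1" \
    'BEGIN { x += 0; if (x >= 0.70) print "HOT"; else if (x >= 0.40) print "WARM"; else print "COLD" }'
}

# SLA per band in hours: 24 hours, 7 days, 30 days.
unknown_sla_hours() {
  case "$1" in
    HOT)  echo 24 ;;
    WARM) echo 168 ;;
    COLD) echo 720 ;;
    *)    echo "unknown band: $1" >&2; return 1 ;;
  esac
}

# Illustrative raw inputs (assumed) chosen so the components land near the
# example breakdown of +0.35, +0.21, +0.26 with a -0.20 deduction.
score=$(unknown_score 0.58 0.7 0.87 -0.20)
band=$(unknown_band "$score")
echo "$score $band $(unknown_sla_hours "$band")h"   # prints: 0.62 WARM 168h
```

If a hand-computed value disagrees with `stella unknowns show ... --score-details`, the CLI output should be treated as authoritative, since the actual implementation may clamp or round differently.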