feat: Add operations runbooks and UI API models for Sprint 3500.0004.x

Operations documentation:
- docs/operations/reachability-runbook.md - Reachability troubleshooting guide
- docs/operations/unknowns-queue-runbook.md - Unknowns queue management guide

UI TypeScript models:
- src/Web/StellaOps.Web/src/app/core/api/proof.models.ts - Proof ledger types
- src/Web/StellaOps.Web/src/app/core/api/reachability.models.ts - Reachability types
- src/Web/StellaOps.Web/src/app/core/api/unknowns.models.ts - Unknowns queue types

Sprint: SPRINT_3500_0004_0002 (UI), SPRINT_3500_0004_0004 (Docs)
2025-12-20 22:22:09 +02:00
parent efe9bd8cfe
commit da315965ff
5 changed files with 1719 additions and 0 deletions
--- a/docs/operations/reachability-runbook.md
+++ b/docs/operations/reachability-runbook.md
@@ -0,0 +1,594 @@
+# Reachability Analysis Operations Runbook
+
+> **Version**: 1.0.0  
+> **Sprint**: 3500.0004.0004  
+> **Last Updated**: 2025-12-20
+
+This runbook covers operational procedures for Reachability Analysis, including call graph management, analysis troubleshooting, and explain queries.
+
+---
+
+## Table of Contents
+
+1. [Overview](#1-overview)
+2. [Call Graph Operations](#2-call-graph-operations)
+3. [Reachability Computation](#3-reachability-computation)
+4. [Explain Queries](#4-explain-queries)
+5. [Troubleshooting](#5-troubleshooting)
+6. [Monitoring & Alerting](#6-monitoring--alerting)
+7. [Escalation Procedures](#7-escalation-procedures)
+
+---
+
+## 1. Overview
+
+### What is Reachability Analysis?
+
+Reachability Analysis determines whether vulnerable code is actually reachable from application entrypoints. This reduces false positives by filtering out vulnerabilities in code that cannot be executed.
+
+### Reachability Statuses
+
+| Status | Confidence | Description |
+|--------|------------|-------------|
+| `UNREACHABLE` | High | No path from entrypoints to vulnerable code |
+| `POSSIBLY_REACHABLE` | Medium | Path exists but contains heuristic edges |
+| `REACHABLE_STATIC` | High | Static analysis proves path exists |
+| `REACHABLE_PROVEN` | Very High | Runtime evidence confirms execution |
+| `UNKNOWN` | Low | Insufficient data to determine |
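
As a rough illustration of how these statuses could fall out of a graph search, the sketch below runs a breadth-first search twice: once over static edges only, then over static plus heuristic edges. The edge tuple shape `(source, target, kind)`, the function names, and the choice to return `UNKNOWN` when no entrypoints are available are assumptions made for this example, not the Scanner's actual data model. Runtime evidence (`REACHABLE_PROVEN`) is left out.

```python
from collections import deque


def reachable(edges, entrypoints, target, allow_heuristic):
    """BFS from all entrypoints; True if target can be reached."""
    adjacency = {}
    for src, dst, kind in edges:
        # Heuristic edges are only followed when explicitly allowed.
        if kind == "static" or allow_heuristic:
            adjacency.setdefault(src, []).append(dst)
    seen = set(entrypoints)
    queue = deque(entrypoints)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def classify(edges, entrypoints, target):
    """Map the two BFS passes onto the status names from the table."""
    if not entrypoints:
        return "UNKNOWN"  # assumption: no entrypoints means insufficient data
    if reachable(edges, entrypoints, target, allow_heuristic=False):
        return "REACHABLE_STATIC"
    if reachable(edges, entrypoints, target, allow_heuristic=True):
        return "POSSIBLY_REACHABLE"
    return "UNREACHABLE"
```

A path that needs even one heuristic edge is reported as `POSSIBLY_REACHABLE`, which matches the "path exists but contains heuristic edges" wording in the table.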
+
+### Key Components
+
+| Component | Purpose | Location |
+|-----------|---------|----------|
+| Call Graph Extractor | Language-specific CG extraction | Scanner Worker plugins |
+| Call Graph Store | Persistent graph storage | `scanner.cg_node`, `scanner.cg_edge` |
+| Reachability Analyzer | BFS pathfinding algorithm | Scanner Core library |
+| Entrypoint Detector | Identifies application entrypoints | Language-specific plugins |
+
+### Prerequisites
+
+- Access to Scanner WebService API
+- `scanner.reachability` OAuth scope
+- CLI access with `stella` configured
+- Language-specific workers deployed (dotnet, java, etc.)
+
+---
+
+## 2. Call Graph Operations
+
+### 2.1 Call Graph Upload
+
+```bash
+# Upload via API
+curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/callgraphs" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -H "Content-Digest: sha256=$(sha256sum callgraph.json | cut -d' ' -f1)" \
+  -d @callgraph.json
+
+# Upload via CLI
+stella scan graph upload --scan-id $SCAN_ID --file callgraph.json
+
+# Upload streaming NDJSON (for large graphs)
+stella scan graph upload --scan-id $SCAN_ID \
+  --file callgraph.ndjson \
+  --format ndjson \
+  --streaming
+```
+
+### 2.2 Call Graph Inspection
+
+```bash
+# Get call graph summary
+stella scan graph summary --scan-id $SCAN_ID
+
+# Output:
+# Nodes: 12,345
+# Edges: 56,789
+# Entrypoints: 42
+# Languages: [dotnet, java]
+# Size: 15.2 MB
+
+# List entrypoints
+stella scan graph entrypoints --scan-id $SCAN_ID
+
+# Export full graph (for debugging)
+stella scan graph export --scan-id $SCAN_ID --output graph.json
+
+# Visualize subgraph (requires GraphViz)
+stella scan graph visualize --scan-id $SCAN_ID \
+  --node sha256:node123... \
+  --depth 3 \
+  --output subgraph.svg
+```
+
+### 2.3 Call Graph Validation
+
+```bash
+# Validate graph structure
+stella scan graph validate --scan-id $SCAN_ID
+
+# Checks performed:
+# - All edge targets exist as nodes
+# - Entrypoints reference valid nodes
+# - No orphan nodes
+# - No cycles in entrypoint definitions
+# - Schema compliance
+
+# Validate before upload
+stella scan graph validate --file callgraph.json --strict
+```
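
For intuition, three of the structural checks listed above (edge endpoints exist, entrypoints reference valid nodes, no orphan nodes) can be approximated offline. This is only a sketch: the top-level `nodes` and `edges` keys follow the `jq` queries used elsewhere in this runbook, while the `id`, `from`, `to`, and `entrypoints` field names are assumptions and may not match the real call graph schema.

```python
def validate_graph(graph):
    """Return a list of human-readable structural errors (empty if none)."""
    node_ids = {node["id"] for node in graph["nodes"]}
    errors = []
    connected = set()

    for edge in graph["edges"]:
        if edge["from"] not in node_ids:
            errors.append(f"edge source missing: {edge['from']}")
        if edge["to"] not in node_ids:
            errors.append(f"edge target missing: {edge['to']}")
        connected.update((edge["from"], edge["to"]))

    for entrypoint in graph.get("entrypoints", []):
        if entrypoint not in node_ids:
            errors.append(f"entrypoint missing: {entrypoint}")
        connected.add(entrypoint)

    # A node that is neither an edge endpoint nor an entrypoint is an orphan.
    for orphan in sorted(node_ids - connected):
        errors.append(f"orphan node: {orphan}")
    return errors
```

Running a check like this before upload can catch obvious extractor bugs early; it does not replace `stella scan graph validate`, which also covers schema compliance.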
+
+### 2.4 Call Graph Merging
+
+When multiple language workers produce graphs:
+
+```bash
+# View merge status
+stella scan graph merges --scan-id $SCAN_ID
+
+# Output:
+# Language   | Nodes  | Edges  | Status
+# dotnet     | 8,234  | 34,567 | merged
+# java       | 4,111  | 22,222 | merged
+# Total      | 12,345 | 56,789 | complete
+
+# Force re-merge (after fix)
+stella scan graph merge --scan-id $SCAN_ID --force
+```
+
+---
+
+## 3. Reachability Computation
+
+### 3.1 Triggering Computation
+
+```bash
+# Trigger via API
+curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/compute" \
+  -H "Authorization: Bearer $TOKEN"
+
+# Trigger via CLI
+stella reachability compute --scan-id $SCAN_ID
+
+# Trigger with options
+stella reachability compute --scan-id $SCAN_ID \
+  --max-depth 20 \
+  --indirect-resolution conservative \
+  --timeout 300s
+```
+
+### 3.2 Computation Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `max-depth` | 10 | Maximum path length to explore |
+| `indirect-resolution` | `conservative` | How to handle indirect calls: `conservative`, `aggressive`, `skip` |
+| `timeout` | 300s | Maximum computation time |
+| `parallel` | true | Parallel BFS from multiple entrypoints |
+| `include-runtime` | true | Merge runtime evidence if available |
+
+### 3.3 Job Monitoring
+
+```bash
+# Check job status
+stella reachability job-status --job-id reachability-job-001
+
+# Output:
+# Status: running
+# Progress: 67% (8,234 / 12,345 nodes visited)
+# Started: 2025-12-20T10:00:00Z
+# Estimated completion: 2025-12-20T10:02:30Z
+
+# Stream job logs
+stella reachability job-logs --job-id reachability-job-001 --follow
+
+# Cancel running job
+stella reachability job-cancel --job-id reachability-job-001
+```
+
+### 3.4 Computation Results
+
+```bash
+# Get summary
+stella reachability summary --scan-id $SCAN_ID
+
+# Output:
+# Total vulnerabilities: 45
+# Unreachable: 38 (84%)
+# Possibly reachable: 4 (9%)
+# Reachable (static): 2 (4%)
+# Reachable (proven): 1 (2%)
+# Unknown: 0 (0%)
+
+# Get detailed findings
+stella reachability findings --scan-id $SCAN_ID --format json
+
+# Filter by status
+stella reachability findings --scan-id $SCAN_ID --status REACHABLE_STATIC
+
+# Export for CI gate
+stella reachability findings --scan-id $SCAN_ID \
+  --status REACHABLE_STATIC,REACHABLE_PROVEN \
+  --format sarif \
+  --output findings.sarif
+```
+
+---
+
+## 4. Explain Queries
+
+### 4.1 Explain Single Finding
+
+```bash
+# Via API
+curl "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/explain?cve=CVE-2024-1234&purl=pkg:npm/lodash@4.17.20" \
+  -H "Authorization: Bearer $TOKEN"
+
+# Via CLI
+stella reachability explain --scan-id $SCAN_ID \
+  --cve CVE-2024-1234 \
+  --purl "pkg:npm/lodash@4.17.20"
+
+# Output:
+# Status: REACHABLE_STATIC
+# Confidence: 0.70
+# 
+# Shortest Path (depth=3):
+# [0] MyApp.Controllers.OrdersController::Get(Guid)
+#     Entrypoint: HTTP GET /api/orders/{id}
+# [1] MyApp.Services.OrderService::Process(Order)
+#     Edge: static (direct_call)
+# [2] Lodash.merge(Object, Object) [VULNERABLE]
+#     Edge: static (direct_call)
+# 
+# Why Reachable:
+# - Static call path exists from HTTP entrypoint
+# - All edges are statically proven
+# - Vulnerable function is directly invoked
+```
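
The "Shortest Path" section of the output can be understood as a BFS that remembers each node's predecessor. The sketch below is illustrative only: the adjacency-dict input, the function name, and the convention that `max_depth` counts edges are assumptions, and the node names are shortened from the sample output above.

```python
from collections import deque


def shortest_path(adjacency, entrypoints, target, max_depth=10):
    """Return the shortest entrypoint-to-target path, or None if not found."""
    parent = {entrypoint: None for entrypoint in entrypoints}
    queue = deque((entrypoint, 0) for entrypoint in entrypoints)
    while queue:
        node, depth = queue.popleft()
        if node == target:
            # Walk predecessor links back to the entrypoint.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append((nxt, depth + 1))
    return None


calls = {
    "OrdersController.Get": ["OrderService.Process"],
    "OrderService.Process": ["Lodash.merge"],
}
path = shortest_path(calls, ["OrdersController.Get"], "Lodash.merge")
```

Because BFS visits nodes in order of distance, the first time the target is dequeued the reconstructed path is a shortest one. Lowering `max_depth` trades completeness for speed, which is why a very small depth limit can turn a reachable finding into an unreachable one.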
+
+### 4.2 Explain with Alternatives
+
+```bash
+# Show all paths (not just shortest)
+stella reachability explain --scan-id $SCAN_ID \
+  --cve CVE-2024-1234 \
+  --purl "pkg:npm/lodash@4.17.20" \
+  --all-paths
+
+# Output includes:
+# Alternative paths found: 3
+# Path 1 (depth=3): ... [shown above]
+# Path 2 (depth=5): Controllers.UserController -> ... -> Lodash.merge
+# Path 3 (depth=7): Background.JobProcessor -> ... -> Lodash.merge
+```
+
+### 4.3 Why Unreachable
+
+```bash
+# Explain why vulnerability is unreachable
+stella reachability explain --scan-id $SCAN_ID \
+  --cve CVE-2024-5678 \
+  --purl "pkg:npm/unused-lib@1.0.0"
+
+# Output:
+# Status: UNREACHABLE
+# Confidence: 0.95
+# 
+# Why Unreachable:
+# - No path found from any entrypoint
+# - Vulnerable function: UnusedLib.dangerousMethod()
+# - Function visibility: private
+# - Callers found: 0
+# - Dead code analysis: likely dead code
+```
+
+### 4.4 Batch Explain
+
+```bash
+# Export all reachability explanations
+stella reachability explain-all --scan-id $SCAN_ID \
+  --output explanations.json
+
+# Explain only reachable findings
+stella reachability explain-all --scan-id $SCAN_ID \
+  --status REACHABLE_STATIC,REACHABLE_PROVEN \
+  --output reachable-explanations.json
+```
+
+---
+
+## 5. Troubleshooting
+
+### 5.1 Call Graph Too Large
+
+**Symptom**: Upload fails with "413 Payload Too Large".
+
+**Diagnosis**:
+
+```bash
+# Check graph size
+du -h callgraph.json
+wc -l callgraph.json
+
+# Count nodes/edges
+jq '.nodes | length' callgraph.json
+jq '.edges | length' callgraph.json
+```
+
+**Resolution**:
+
+```bash
+# Option 1: Use streaming upload
+stella scan graph upload --scan-id $SCAN_ID \
+  --file callgraph.json \
+  --streaming
+
+# Option 2: Convert to NDJSON
+stella scan graph convert --input callgraph.json \
+  --output callgraph.ndjson \
+  --format ndjson
+
+# Option 3: Partition by artifact
+stella scan graph partition --input callgraph.json \
+  --output-dir ./partitions/ \
+  --by artifact
+```
+
+### 5.2 Missing Entrypoints
+
+**Symptom**: "No entrypoints found" warning.
+
+**Diagnosis**:
+
+```bash
+# Check entrypoint detection
+stella scan graph entrypoints --scan-id $SCAN_ID --verbose
+
+# Check for framework detection
+stella scan graph detect-framework --scan-id $SCAN_ID
+```
+
+**Common causes**:
+
+1. **Framework not detected**: Add framework hints
+2. **Custom entrypoints**: Manually specify
+3. **Wrong language worker**: Check artifact analysis
+
+**Resolution**:
+
+```bash
+# Specify framework explicitly
+stella scan graph upload --scan-id $SCAN_ID \
+  --file callgraph.json \
+  --framework aspnetcore
+
+# Add custom entrypoints
+stella scan graph entrypoint add --scan-id $SCAN_ID \
+  --node sha256:node123... \
+  --kind http \
+  --route "/api/custom"
+```
+
+### 5.3 Reachability Computation Timeout
+
+**Symptom**: Job fails with "computation timeout".
+
+**Diagnosis**:
+
+```bash
+# Check computation stats
+stella reachability job-stats --job-id reachability-job-001
+
+# Output:
+# Nodes visited: 500,000
+# Edges traversed: 2,500,000
+# Time elapsed: 300s
+# Memory used: 4.2 GB
+```
+
+**Resolution**:
+
+```bash
+# Option 1: Increase timeout
+stella reachability compute --scan-id $SCAN_ID --timeout 600s
+
+# Option 2: Reduce depth
+stella reachability compute --scan-id $SCAN_ID --max-depth 5
+
+# Option 3: Skip indirect calls
+stella reachability compute --scan-id $SCAN_ID --indirect-resolution skip
+
+# Option 4: Partition analysis
+stella reachability compute --scan-id $SCAN_ID --partition-by artifact
+```
+
+### 5.4 Inconsistent Results
+
+**Symptom**: Different results between runs.
+
+**Diagnosis**:
+
+```bash
+# Check determinism settings
+stella scan manifest --scan-id $SCAN_ID | jq '.deterministic, .seed'
+
+# Compare graph hashes
+stella scan graph hash --scan-id $SCAN_ID
+```
+
+**Resolution**:
+
+```bash
+# Ensure deterministic mode
+stella reachability compute --scan-id $SCAN_ID \
+  --deterministic \
+  --seed "AQIDBA=="  # Fixed seed
+
+# Use same graph version
+stella reachability compute --scan-id $SCAN_ID \
+  --graph-digest sha256:cg123...
+```
+
+### 5.5 False Positives/Negatives
+
+**Symptom**: Reachability verdict seems incorrect.
+
+**Diagnosis**:
+
+```bash
+# Get detailed explanation
+stella reachability explain --scan-id $SCAN_ID \
+  --cve CVE-2024-1234 \
+  --purl "pkg:npm/lodash@4.17.20" \
+  --verbose
+
+# Check edge confidence
+stella scan graph edge --scan-id $SCAN_ID \
+  --from sha256:nodeA... \
+  --to sha256:nodeB...
+```
+
+**Common causes for false positives**:
+
+1. **Heuristic edges**: Indirect call resolution too aggressive
+2. **Reflection/dynamic calls**: May create false paths
+3. **Dead code not detected**: Code exists but never executes
+
+**Common causes for false negatives**:
+
+1. **Missing edges**: Call graph incomplete
+2. **Indirect calls skipped**: Resolution too conservative
+3. **Cross-language calls**: Language boundary not bridged
+
+**Resolution**:
+
+```bash
+# Adjust indirect call resolution
+stella reachability compute --scan-id $SCAN_ID \
+  --indirect-resolution conservative
+
+# Add runtime evidence
+stella scan evidence upload --scan-id $SCAN_ID \
+  --file runtime-trace.json
+
+# Report false positive/negative for ML training
+stella reachability feedback --scan-id $SCAN_ID \
+  --cve CVE-2024-1234 \
+  --verdict false-positive \
+  --reason "Dead code - feature flag disabled"
+```
+
+---
+
+## 6. Monitoring & Alerting
+
+### 6.1 Key Metrics
+
+| Metric | Description | Alert Threshold |
+|--------|-------------|-----------------|
+| `callgraph_upload_duration_seconds` | Time to upload call graph | > 60s |
+| `callgraph_size_bytes` | Size of uploaded graphs | > 200MB |
+| `reachability_computation_duration_seconds` | Time to compute reachability | > 300s |
+| `reachability_nodes_visited` | Nodes visited during BFS | > 1M |
+| `reachability_job_failures_total` | Failed computation jobs | > 0/hour |
+| `entrypoint_detection_rate` | % of scans with entrypoints | < 90% |
+
+### 6.2 Grafana Dashboard
+
+```
+Dashboard: Reachability Operations
+Panels:
+- Call graph upload throughput
+- Graph size distribution
+- Computation duration (p50, p95, p99)
+- Reachability verdict distribution
+- Job queue depth
+- Entrypoint detection rate
+```
+
+### 6.3 Alerting Rules
+
+```yaml
+groups:
+  - name: reachability
+    rules:
+      - alert: ReachabilityComputationSlow
+        expr: histogram_quantile(0.95, sum by (le) (rate(reachability_computation_duration_seconds_bucket[5m]))) > 300
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Reachability computation is slow"
+          
+      - alert: ReachabilityJobFailures
+        expr: increase(reachability_job_failures_total[1h]) > 5
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Multiple reachability job failures"
+          
+      - alert: LowEntrypointDetectionRate
+        expr: entrypoint_detection_rate < 0.8
+        for: 1h
+        labels:
+          severity: warning
+        annotations:
+          summary: "Entrypoint detection rate is low"
+```
+
+---
+
+## 7. Escalation Procedures
+
+### 7.1 Escalation Matrix
+
+| Severity | Condition | Response Time | Escalation Path |
+|----------|-----------|---------------|-----------------|
+| P1 | Reachability failing for all scans | 15 min | On-call → Team Lead |
+| P2 | Computation failures > 20% | 1 hour | On-call → Team Lead |
+| P3 | Computation latency > 600s p95 | 4 hours | On-call |
+| P4 | Entrypoint detection < 70% | 24 hours | Ticket |
+
+### 7.2 P1 Response Procedure
+
+1. **Acknowledge** alert
+2. **Triage**:
+   ```bash
+   # Check worker health
+   stella scanner workers status
+   
+   # Check graph store connectivity
+   stella health check --service graph-store
+   
+   # Check recent failures
+   stella reachability jobs --status failed --last 10
+   ```
+3. **Mitigate**:
+   ```bash
+   # Scale up workers if queue backlog
+   kubectl scale deployment scanner-worker --replicas=10
+   
+   # Clear stuck jobs
+   stella reachability jobs cancel --status stuck
+   ```
+4. **Communicate**: Update status page
+5. **Resolve**: Fix root cause
+6. **Postmortem**: Document within 48 hours
+
+---
+
+## Related Documentation
+
+- [Reachability API Reference](../api/score-proofs-reachability-api-reference.md)
+- [Scanner Architecture](../modules/scanner/architecture.md)
+- [Call Graph Schema](../schemas/callgraph-v1.md)
+- [Entrypoint Detection](../modules/scanner/operations/entrypoint-problem.md)
+
+---
+
+**Last Updated**: 2025-12-20  
+**Version**: 1.0.0  
+**Sprint**: 3500.0004.0004
--- a/docs/operations/unknowns-queue-runbook.md
+++ b/docs/operations/unknowns-queue-runbook.md
@@ -0,0 +1,590 @@
+# Unknowns Queue Management Runbook
+
+> **Version**: 1.0.0  
+> **Sprint**: 3500.0004.0004  
+> **Last Updated**: 2025-12-20
+
+This runbook covers operational procedures for managing the Unknowns queue, including triage, escalation, resolution, and queue health maintenance.
+
+---
+
+## Table of Contents
+
+1. [Overview](#1-overview)
+2. [Queue Operations](#2-queue-operations)
+3. [Triage Procedures](#3-triage-procedures)
+4. [Escalation Workflows](#4-escalation-workflows)
+5. [Resolution Procedures](#5-resolution-procedures)
+6. [Troubleshooting](#6-troubleshooting)
+7. [Monitoring & Alerting](#7-monitoring--alerting)
+
+---
+
+## 1. Overview
+
+### What are Unknowns?
+
+Unknowns are items that could not be fully classified during scanning due to:
+
+- Missing VEX statements
+- Ambiguous indirect calls in call graphs
+- Incomplete SBOM data
+- Missing advisory information
+- Conflicting evidence from multiple sources
+
+### Unknown Ranking
+
+Unknowns are ranked using a weighted scoring model with three factors and a containment deduction:
+
+```
+score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction
+```
+
+| Factor | Weight | Description |
+|--------|--------|-------------|
+| Blast Radius | 0.60 | Impact scope (dependents, network exposure) |
+| Evidence Scarcity | 0.30 | How much data is missing |
+| Exploit Pressure | 0.30 | EPSS score, KEV status |
+| Containment | -0.20 | Mitigation factors (seccomp, read-only FS) |
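
The formula can be checked by hand with a few lines of code. This is a minimal sketch that assumes each factor is normalized to the range 0 to 1 and that the containment deduction is passed in as a non-positive number; the function name and return shape are hypothetical. Whether the real service clamps the result is not stated here, so no clamping is applied.

```python
WEIGHTS = {"blast": 0.60, "scarcity": 0.30, "pressure": 0.30}


def unknown_score(blast, scarcity, pressure, containment_deduction=0.0):
    """Return (score, components) for the weighted formula above."""
    components = {
        "blast": WEIGHTS["blast"] * blast,
        "scarcity": WEIGHTS["scarcity"] * scarcity,
        "pressure": WEIGHTS["pressure"] * pressure,
        "containment": containment_deduction,  # e.g. -0.20
    }
    return round(sum(components.values()), 2), components


score, parts = unknown_score(blast=0.5, scarcity=0.7, pressure=0.5,
                             containment_deduction=-0.20)
```

With these inputs the components are 0.30, 0.21, 0.15, and -0.20, giving a score of 0.46. Note that the three positive weights sum to 1.20, so an uncontained item with all factors at 1.0 would score above 1.0 under this formula.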
+
+### Band Assignment
+
+| Band | Score Range | Priority | SLA |
+|------|-------------|----------|-----|
+| HOT | ≥ 0.70 | Critical | 24 hours |
+| WARM | 0.40 to < 0.70 | Normal | 7 days |
+| COLD | < 0.40 | Low | 30 days |
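
The band boundaries are easiest to reason about as two threshold comparisons. The helper below is a sketch with a hypothetical name; it treats 0.70 as belonging to HOT and everything from 0.40 up to, but not including, 0.70 as WARM.

```python
def assign_band(score):
    """Map a score to its band using the thresholds from the table."""
    if score >= 0.70:
        return "HOT"
    if score >= 0.40:
        return "WARM"
    return "COLD"
```

For example, a score of 0.62 lands in WARM, while a score of exactly 0.70 is HOT.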
+
+---
+
+## 2. Queue Operations
+
+### 2.1 View Queue Status
+
+```bash
+# Get queue summary
+stella unknowns summary
+
+# Output:
+# Total: 142 unknowns
+# HOT:  12 (8%)  - Requires immediate attention
+# WARM: 85 (60%) - Normal priority
+# COLD: 45 (32%) - Low priority
+# 
+# KEV items: 3
+# Average score: 0.52
+
+# Get queue summary via API
+curl "https://scanner.example.com/api/v1/unknowns/summary" \
+  -H "Authorization: Bearer $TOKEN"
+```
+
+### 2.2 List Unknowns
+
+```bash
+# List all HOT unknowns
+stella unknowns list --band HOT
+
+# List by score (highest first)
+stella unknowns list --sort score --order desc --limit 20
+
+# Filter by reason
+stella unknowns list --reason missing_vex
+
+# Filter by artifact
+stella unknowns list --artifact sha256:abc123...
+
+# Filter by KEV status
+stella unknowns list --kev true
+```
+
+### 2.3 View Unknown Details
+
+```bash
+# Get detailed view
+stella unknowns show unk-12345678-abcd-1234-5678-abcdef123456
+
+# Output:
+# ID: unk-12345678-...
+# Artifact: pkg:oci/myapp@sha256:abc123
+# Reasons: [missing_vex, ambiguous_indirect_call]
+# 
+# Blast Radius:
+#   Dependents: 15 services
+#   Network: internet-facing
+#   Privilege: user
+# 
+# Evidence Scarcity: 0.7 (high)
+# 
+# Exploit Pressure:
+#   EPSS: 0.45
+#   KEV: false
+# 
+# Containment:
+#   Seccomp: enforced (-0.10)
+#   Filesystem: read-only (-0.10)
+# 
+# Score: 0.62 (WARM band)
+# Score Breakdown:
+#   Blast component: +0.35
+#   Scarcity component: +0.21
+#   Pressure component: +0.26
+#   Containment deduction: -0.20
+
+# Show proof tree
+stella unknowns proof unk-12345678-...
+```
+
+### 2.4 Export Queue Data
+
+```bash
+# Export for analysis
+stella unknowns export --format json --output unknowns.json
+
+# Export HOT items for daily review
+stella unknowns export --band HOT --format csv --output hot-unknowns.csv
+
+# Export with full details
+stella unknowns export --verbose --include-proofs --output full-export.json
+```
+
+---
+
+## 3. Triage Procedures
+
+### 3.1 Daily Triage Workflow
+
+**Schedule**: Daily at 9:00 AM
+
+**Duration**: 30 minutes
+
+**Participants**: Security analyst, on-call engineer
+
+**Process**:
+
+```bash
+# 1. Get today's queue snapshot
+stella unknowns snapshot --output daily-$(date +%Y%m%d).json
+
+# 2. Review all HOT items
+stella unknowns list --band HOT --since 24h
+
+# 3. For each HOT unknown, determine action:
+#    - Escalate: Trigger immediate rescan
+#    - Investigate: Needs manual analysis
+#    - Defer: Move to WARM (with justification)
+#    - Resolve: Evidence found, can close
+
+# 4. Process each item
+stella unknowns triage unk-12345678-... --action escalate
+stella unknowns triage unk-87654321-... --action investigate --notes "Need VEX from vendor"
+stella unknowns triage unk-11111111-... --action defer --reason "False positive suspected"
+```
+
+### 3.2 Triage Decision Matrix
+
+| Reason Code | KEV | EPSS > 0.5 | Action |
+|-------------|-----|------------|--------|
+| `missing_vex` | Yes | Any | Escalate + Vendor outreach |
+| `missing_vex` | No | Yes | Escalate |
+| `missing_vex` | No | No | Request VEX |
+| `ambiguous_indirect_call` | Any | Any | Manual code review |
+| `incomplete_sbom` | Any | Any | Rescan with updated extractor |
+| `conflicting_evidence` | Any | Any | Manual analysis |
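
Read top to bottom, the matrix is a small decision function. The sketch below encodes it directly; the function name and signature are hypothetical, and reason codes that do not appear in the matrix return `None` rather than a guessed action.

```python
def triage_action(reason, kev=False, epss=0.0):
    """Return the matrix action for a reason code, or None if unlisted."""
    if reason == "missing_vex":
        if kev:
            return "Escalate + Vendor outreach"
        if epss > 0.5:
            return "Escalate"
        return "Request VEX"
    # These reasons map to one action regardless of KEV or EPSS.
    fixed_actions = {
        "ambiguous_indirect_call": "Manual code review",
        "incomplete_sbom": "Rescan with updated extractor",
        "conflicting_evidence": "Manual analysis",
    }
    return fixed_actions.get(reason)
```

Only `missing_vex` depends on KEV status and EPSS; KEV is checked first, so a KEV item always gets vendor outreach even when its EPSS score is low.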
+
+### 3.3 Triage Templates
+
+```bash
+# Quick escalate (HOT + KEV)
+stella unknowns triage unk-... --action escalate \
+  --priority P1 \
+  --notes "KEV item, requires immediate attention"
+
+# Request vendor VEX
+stella unknowns triage unk-... --action investigate \
+  --notes "Requested VEX from vendor via security@vendor.com" \
+  --due-date 7d
+
+# Mark for code review
+stella unknowns triage unk-... --action investigate \
+  --notes "Requires manual code review to resolve indirect call" \
+  --assign @code-review-team
+
+# Defer with justification
+stella unknowns triage unk-... --action defer \
+  --reason "Component not deployed to production" \
+  --evidence "deployment-manifest.yaml shows staging-only"
+```
+
+---
+
+## 4. Escalation Workflows
+
+### 4.1 Automatic Escalation
+
+Unknowns are automatically escalated when:
+
+- Score reaches or exceeds the HOT threshold (0.70)
+- KEV status added to related CVE
+- EPSS score increases significantly (> 0.2 delta)
+- Blast radius increases (new dependents detected)
+
+**Configure auto-escalation**:
+
+```yaml
+# policy.unknowns.escalation.yaml
+autoEscalation:
+  enabled: true
+  triggers:
+    - condition: score >= 0.70
+      action: escalate
+      notify: [security-team]
+    - condition: kev == true
+      action: escalate
+      priority: P1
+      notify: [security-team, management]
+    - condition: epss_delta > 0.2
+      action: escalate
+      notify: [security-team]
+```
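
To make the trigger conditions concrete, the sketch below compares two snapshots of the same unknown and reports which triggers would fire. It is an illustration under assumptions: the dictionary fields (`score`, `kev`, `epss`, `dependents`) and the trigger labels are hypothetical, and the policy engine's real evaluation logic is not shown in this runbook.

```python
def fired_triggers(previous, current):
    """Compare two snapshots of an unknown and list the triggers that fire."""
    fired = []
    if current["score"] >= 0.70:
        fired.append("score")
    if current["kev"]:
        fired.append("kev")
    if current["epss"] - previous["epss"] > 0.2:
        fired.append("epss_delta")
    # Blast radius growth is approximated here by the dependent count.
    if current["dependents"] > previous["dependents"]:
        fired.append("blast_radius")
    return fired


before = {"score": 0.50, "kev": False, "epss": 0.10, "dependents": 3}
after = {"score": 0.72, "kev": False, "epss": 0.35, "dependents": 3}
triggers = fired_triggers(before, after)
```

Here the score and EPSS delta triggers fire, while the KEV and blast radius triggers do not.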
+
+### 4.2 Manual Escalation
+
+```bash
+# Escalate via CLI
+stella unknowns escalate unk-12345678-...
+
+# Escalate with reason
+stella unknowns escalate unk-12345678-... \
+  --reason "Customer reported potential exploit"
+
+# Escalate to trigger rescan
+stella unknowns escalate unk-12345678-... --rescan
+
+# Output:
+# Escalated: unk-12345678-...
+# Rescan job: rescan-job-001
+# Status: queued
+# ETA: 5 minutes
+```
+
+### 4.3 Bulk Escalation
+
+```bash
+# Escalate all KEV items
+stella unknowns escalate --filter "kev=true" --reason "KEV bulk escalation"
+
+# Escalate high-score items
+stella unknowns escalate --filter "score>=0.8" --rescan
+
+# Escalate by artifact
+stella unknowns escalate --artifact sha256:abc123... --reason "Production incident"
+```
+
+### 4.4 Escalation SLA Tracking
+
+```bash
+# Check SLA status
+stella unknowns sla-status
+
+# Output:
+# HOT unknowns SLA (24h):
+#   In SLA: 10 (83%)
+#   Breached: 2 (17%)
+#   
+# Breached items:
+#   unk-111... (26h old) - missing_vex
+#   unk-222... (30h old) - conflicting_evidence
+
+# Get SLA breach notifications
+stella unknowns list --sla-breached
+```
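
The breach calculation itself is a simple age comparison against the band's SLA. The sketch below assumes the SLA clock starts when the unknown is created, which this runbook does not state explicitly; the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# SLA windows per band, taken from the Band Assignment table.
SLA = {
    "HOT": timedelta(hours=24),
    "WARM": timedelta(days=7),
    "COLD": timedelta(days=30),
}


def sla_breached(band, created_at, now):
    """True if the unknown has been open longer than its band's SLA."""
    return now - created_at > SLA[band]


now = datetime(2025, 12, 20, 12, 0, tzinfo=timezone.utc)
hot_26h = sla_breached("HOT", now - timedelta(hours=26), now)
```

A HOT item that is 26 hours old is in breach, as in the sample output above, whereas a WARM item of the same age is still well within its 7-day window.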
+
+---
+
+## 5. Resolution Procedures
+
+### 5.1 Resolution Types
+
+| Resolution | Description | Evidence Required |
+|------------|-------------|-------------------|
+| `not_affected` | Vulnerability doesn't apply | VEX statement or manual analysis |
+| `fixed` | Vulnerability patched | Version upgrade confirmation |
+| `mitigated` | Controls in place | Mitigation documentation |
+| `false_positive` | Incorrect classification | Analysis report |
+| `wont_fix` | Accepted risk | Risk acceptance form |
+
+### 5.2 Resolve Unknown
+
+```bash
+# Resolve as not affected
+stella unknowns resolve unk-12345678-... \
+  --resolution not_affected \
+  --justification "vulnerable_code_not_present" \
+  --notes "Manual code review confirmed function not used"
+
+# Resolve as fixed
+stella unknowns resolve unk-12345678-... \
+  --resolution fixed \
+  --justification "version_upgraded" \
+  --evidence "Upgraded lodash to 4.17.21, CVE patched"
+
+# Resolve as mitigated
+stella unknowns resolve unk-12345678-... \
+  --resolution mitigated \
+  --justification "inline_mitigations_exist" \
+  --evidence "WAF rule WAF-001 blocks exploit pattern"
+
+# Resolve as won't fix (risk accepted)
+stella unknowns resolve unk-12345678-... \
+  --resolution wont_fix \
+  --justification "risk_accepted" \
+  --evidence "Risk acceptance ticket RISK-123" \
+  --expires 90d  # Re-evaluate in 90 days
+```
+
+### 5.3 Bulk Resolution
+
+```bash
+# Resolve all items for a fixed package version
+stella unknowns resolve-batch \
+  --filter "purl=pkg:npm/lodash@4.17.20" \
+  --resolution fixed \
+  --justification "Upgraded to 4.17.21 fleet-wide" \
+  --evidence "Fleet upgrade ticket FLEET-456"
+
+# Resolve false positives from analysis
+stella unknowns resolve-batch \
+  --file false-positives.json \
+  --resolution false_positive
+```
+
+### 5.4 Resolution Audit Trail
+
+```bash
+# View resolution history
+stella unknowns history unk-12345678-...
+
+# Output:
+# 2025-12-15 10:00:00 - Created (score: 0.62)
+# 2025-12-16 09:30:00 - Triaged by analyst@example.com
+# 2025-12-17 14:00:00 - Escalated (KEV added)
+# 2025-12-18 11:00:00 - Resolved by security@example.com
+#   Resolution: not_affected
+#   Justification: vulnerable_code_not_present
+#   Notes: Manual code review confirmed function not used
+
+# Export audit trail
+stella unknowns audit-export --from 2025-01-01 --to 2025-12-31 --output audit.json
+```
+
+---
+
+## 6. Troubleshooting
+
+### 6.1 Score Seems Wrong
+
+**Symptom**: Unknown scored too high or too low.
+
+**Diagnosis**:
+
+```bash
+# View score breakdown
+stella unknowns show unk-... --score-details
+
+# View proof tree
+stella unknowns proof unk-... --verbose
+```
+
+**Common causes**:
+
+1. **Stale EPSS data**: EPSS feed not updated
+2. **Incorrect blast radius**: Dependency data outdated
+3. **Missing containment data**: Seccomp/filesystem status unknown
+
+**Resolution**:
+
+```bash
+# Trigger score recalculation
+stella unknowns recalculate unk-...
+
+# Force refresh of all input signals
+stella unknowns refresh unk-... --force
+```
+
+### 6.2 Duplicate Unknowns
+
+**Symptom**: Same issue appears multiple times.
+
+**Diagnosis**:
+
+```bash
+# Find potential duplicates
+stella unknowns duplicates --scan
+
+# Output shows items with same CVE+PURL but different artifacts
+```
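
The duplicate criterion described in that output (same CVE and PURL, different artifacts) can be reproduced on an exported queue. This sketch assumes each exported record has `id`, `cve`, `purl`, and `artifact` fields, which are hypothetical names and may differ from the real export format.

```python
from collections import defaultdict


def find_duplicates(unknowns):
    """Group unknowns by (cve, purl); keep groups spanning several artifacts."""
    groups = defaultdict(list)
    for unknown in unknowns:
        groups[(unknown["cve"], unknown["purl"])].append(unknown)
    return {
        key: [item["id"] for item in items]
        for key, items in groups.items()
        if len({item["artifact"] for item in items}) > 1
    }


queue = [
    {"id": "unk-111", "cve": "CVE-2024-1234", "purl": "pkg:npm/lodash@4.17.20", "artifact": "sha256:aaa"},
    {"id": "unk-222", "cve": "CVE-2024-1234", "purl": "pkg:npm/lodash@4.17.20", "artifact": "sha256:bbb"},
    {"id": "unk-333", "cve": "CVE-2024-5678", "purl": "pkg:npm/unused-lib@1.0.0", "artifact": "sha256:aaa"},
]
duplicates = find_duplicates(queue)
```

Each returned group is a candidate for a merge, with one ID chosen as the primary and the rest as secondaries.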
+
+**Resolution**:
+
+```bash
+# Merge duplicates
+stella unknowns merge \
+  --primary unk-111... \
+  --secondary unk-222... \
+  --reason "Same CVE across artifact versions"
+```
+
+### 6.3 Escalation Not Working
+
+**Symptom**: Escalation doesn't trigger rescan.
+
+**Diagnosis**:
+
+```bash
+# Check escalation status
+stella unknowns escalation-status unk-...
+
+# Check Scheduler connectivity
+stella health check --service scheduler
+
+# Check job queue
+stella scheduler queue status rescan
+```
+
+**Resolution**:
+
+```bash
+# Retry escalation
+stella unknowns escalate unk-... --force
+
+# Manual rescan trigger
+stella scan trigger --artifact sha256:abc123... --priority high
+```
+
+### 6.4 Resolution Rejected
+
+**Symptom**: Resolution attempt fails validation.
+
+**Diagnosis**:
+
+```bash
+# Check resolution requirements
+stella unknowns resolution-requirements unk-...
+
+# Output:
+# Resolution requirements for unk-12345678-...
+# - Justification: required
+# - Evidence: required (reason: KEV item)
+# - Approver: required (band: HOT)
+```
+
+**Resolution**:
+
+```bash
+# Provide required evidence
+stella unknowns resolve unk-... \
+  --resolution not_affected \
+  --justification "vulnerable_code_not_present" \
+  --evidence "Code review: CRV-123" \
+  --approver security-lead@example.com
+```
+
+---
+
+## 7. Monitoring & Alerting
+
+### 7.1 Key Metrics
+
+| Metric | Description | Alert Threshold |
+|--------|-------------|-----------------|
+| `unknowns_total` | Total unknowns in queue | > 500 |
+| `unknowns_hot_count` | HOT band count | > 20 |
+| `unknowns_sla_breached` | SLA breaches | > 0 |
+| `unknowns_resolution_rate` | Daily resolutions | < 5 |
+| `unknowns_escalation_failures` | Failed escalations | > 0 |
+| `unknowns_avg_age_hours` | Average unknown age | > 168 (1 week) |
+
+### 7.2 Grafana Dashboard
+
+```
+Dashboard: Unknowns Queue Health
+Panels:
+- Queue size by band (HOT/WARM/COLD)
+- SLA compliance rate
+- Unknowns by reason code
+- Resolution velocity
+- Escalation success rate
+- Queue age distribution
+- KEV item tracking
+```
+
+### 7.3 Alerting Rules
+
+```yaml
+groups:
+  - name: unknowns-queue
+    rules:
+      - alert: UnknownsHotBandHigh
+        expr: unknowns_hot_count > 20
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "HOT unknowns queue is high ({{ $value }} items)"
+          
+      - alert: UnknownsSLABreach
+        expr: unknowns_sla_breached > 0
+        for: 1m
+        labels:
+          severity: critical
+        annotations:
+          summary: "{{ $value }} unknowns have breached SLA"
+          
+      - alert: UnknownsQueueGrowing
+        expr: delta(unknowns_total[1h]) > 10
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Unknowns queue is growing rapidly"
+          
+      - alert: UnknownsKEVPending
+        expr: unknowns_kev_count > 0 and unknowns_kev_unresolved_age_hours > 24
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "KEV unknown pending for over 24 hours"
+```
+
+### 7.4 Daily Report
+
+```bash
+# Generate daily report
+stella unknowns report --format email --send-to security-team@example.com
+
+# Report includes:
+# - Queue summary (total, by band, by reason)
+# - SLA status (in compliance, breaches)
+# - Top 10 highest-scored items
+# - Newly added items (last 24h)
+# - Resolved items (last 24h)
+# - KEV item status
+# - Trends (7-day, 30-day)
+```
+
+---
+
+## Related Documentation
+
+- [Unknowns API Reference](../api/score-proofs-reachability-api-reference.md#5-unknowns-api)
+- [Triage Technical Reference](../product-advisories/14-Dec-2025%20-%20Triage%20and%20Unknowns%20Technical%20Reference.md)
+- [Score Proofs Runbook](./score-proofs-runbook.md)
+- [Policy Engine](../modules/policy/architecture.md)
+
+---
+
+**Last Updated**: 2025-12-20  
+**Version**: 1.0.0  
+**Sprint**: 3500.0004.0004