feat: Add operations runbooks and UI API models for Sprint 3500.0004.x

Operations documentation:
- docs/operations/reachability-runbook.md - Reachability troubleshooting guide
- docs/operations/unknowns-queue-runbook.md - Unknowns queue management guide

UI TypeScript models:
- src/Web/StellaOps.Web/src/app/core/api/proof.models.ts - Proof ledger types
- src/Web/StellaOps.Web/src/app/core/api/reachability.models.ts - Reachability types
- src/Web/StellaOps.Web/src/app/core/api/unknowns.models.ts - Unknowns queue types

Sprint: SPRINT_3500_0004_0002 (UI), SPRINT_3500_0004_0004 (Docs)
2025-12-20 22:22:09 +02:00
parent efe9bd8cfe
commit da315965ff
5 changed files with 1719 additions and 0 deletions
--- a/docs/operations/reachability-runbook.md
+++ b/docs/operations/reachability-runbook.md
@@ -0,0 +1,594 @@
+# Reachability Analysis Operations Runbook
+
+> **Version**: 1.0.0  
+> **Sprint**: 3500.0004.0004  
+> **Last Updated**: 2025-12-20
+
+This runbook covers operational procedures for Reachability Analysis, including call graph management, analysis troubleshooting, and explain queries.
+
+---
+
+## Table of Contents
+
+1. [Overview](#1-overview)
+2. [Call Graph Operations](#2-call-graph-operations)
+3. [Reachability Computation](#3-reachability-computation)
+4. [Explain Queries](#4-explain-queries)
+5. [Troubleshooting](#5-troubleshooting)
+6. [Monitoring & Alerting](#6-monitoring--alerting)
+7. [Escalation Procedures](#7-escalation-procedures)
+
+---
+
+## 1. Overview
+
+### What is Reachability Analysis?
+
+Reachability Analysis determines whether vulnerable code is actually reachable from application entrypoints. This reduces false positives by filtering out vulnerabilities in code that cannot be executed.
+
+### Reachability Statuses
+
+| Status | Confidence | Description |
+|--------|------------|-------------|
+| `UNREACHABLE` | High | No path from entrypoints to vulnerable code |
+| `POSSIBLY_REACHABLE` | Medium | Path exists but contains heuristic edges |
+| `REACHABLE_STATIC` | High | Static analysis proves path exists |
+| `REACHABLE_PROVEN` | Very High | Runtime evidence confirms execution |
+| `UNKNOWN` | Low | Insufficient data to determine |
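
The rules in the status table can be read as a small decision procedure. The sketch below is one possible reading, not the StellaOps implementation: the type and function names (`ReachabilityStatus`, `EdgeKind`, `PathEvidence`, `classifyPath`) are hypothetical, the two edge kinds mirror the table's "static" and "heuristic" wording, and the precedence order (runtime evidence first, then missing data, then path analysis) is an assumption.

```typescript
// Hypothetical model of the statuses in the table; not the actual StellaOps types.
type ReachabilityStatus =
  | "UNREACHABLE"
  | "POSSIBLY_REACHABLE"
  | "REACHABLE_STATIC"
  | "REACHABLE_PROVEN"
  | "UNKNOWN";

// Assumed edge kinds: "static" for statically proven calls, "heuristic" for inferred ones.
type EdgeKind = "static" | "heuristic";

interface PathEvidence {
  graphAvailable: boolean;      // false when no call graph data exists
  pathEdges: EdgeKind[] | null; // edges of the best path found, or null if no path exists
  observedAtRuntime: boolean;   // true when runtime evidence confirms execution
}

// Confidence labels copied from the table.
const CONFIDENCE: Record<ReachabilityStatus, string> = {
  UNREACHABLE: "High",
  POSSIBLY_REACHABLE: "Medium",
  REACHABLE_STATIC: "High",
  REACHABLE_PROVEN: "Very High",
  UNKNOWN: "Low",
};

function classifyPath(e: PathEvidence): ReachabilityStatus {
  if (e.observedAtRuntime) return "REACHABLE_PROVEN"; // runtime evidence wins (assumed precedence)
  if (!e.graphAvailable) return "UNKNOWN";            // insufficient data
  if (e.pathEdges === null) return "UNREACHABLE";     // no path from any entrypoint
  const allStatic = e.pathEdges.every((kind) => kind === "static");
  return allStatic ? "REACHABLE_STATIC" : "POSSIBLY_REACHABLE";
}
```

Under these assumptions a single heuristic edge anywhere on the path is enough to downgrade a finding from `REACHABLE_STATIC` to `POSSIBLY_REACHABLE`.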
+
+### Key Components
+
+| Component | Purpose | Location |
+|-----------|---------|----------|
+| Call Graph Extractor | Language-specific CG extraction | Scanner Worker plugins |
+| Call Graph Store | Persistent graph storage | `scanner.cg_node`, `scanner.cg_edge` |
+| Reachability Analyzer | BFS pathfinding algorithm | Scanner Core library |
+| Entrypoint Detector | Identifies application entrypoints | Language-specific plugins |
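
To make the "BFS pathfinding" entry concrete, here is a minimal sketch of a multi-source breadth-first search that returns the shortest call path from any entrypoint to a vulnerable node. It is an illustration only: the adjacency-map shape, the function name `shortestPath`, and the choice to count depth in edges are assumptions, and the real analyzer in the Scanner Core library may differ.

```typescript
// Adjacency list keyed by node id; the shape is an illustrative assumption.
type Adjacency = Map<string, string[]>;

// Multi-source BFS: returns the shortest path from any entrypoint to `target`,
// or null when no path exists within `maxDepth` edges.
function shortestPath(
  graph: Adjacency,
  entrypoints: string[],
  target: string,
  maxDepth: number
): string[] | null {
  const parent = new Map<string, string | null>();
  const depth = new Map<string, number>();
  const queue: string[] = [];

  // Seed the queue with every entrypoint at depth 0.
  for (const ep of entrypoints) {
    if (!parent.has(ep)) {
      parent.set(ep, null);
      depth.set(ep, 0);
      queue.push(ep);
    }
  }

  for (let head = 0; head < queue.length; head++) {
    const current = queue[head];
    const currentDepth = depth.get(current) ?? 0;
    if (current === target) {
      // Walk parent links back to the entrypoint, then reverse.
      const path: string[] = [];
      let cursor: string | null = current;
      while (cursor !== null) {
        path.push(cursor);
        cursor = parent.get(cursor) ?? null;
      }
      return path.reverse();
    }
    if (currentDepth >= maxDepth) continue; // do not expand past the depth limit
    for (const next of graph.get(current) ?? []) {
      if (!parent.has(next)) {
        parent.set(next, current);
        depth.set(next, currentDepth + 1);
        queue.push(next);
      }
    }
  }
  return null;
}
```

Because BFS visits nodes in order of distance, the first time the target is dequeued the reconstructed path is a shortest one, and a `null` result means no path was found within the depth limit.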
+
+### Prerequisites
+
+- Access to Scanner WebService API
+- `scanner.reachability` OAuth scope
+- CLI access with `stella` configured
+- Language-specific workers deployed (dotnet, java, etc.)
+
+---
+
+## 2. Call Graph Operations
+
+### 2.1 Call Graph Upload
+
+```bash
+# Upload via API
+curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/callgraphs" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -H "Content-Digest: sha256=$(sha256sum callgraph.json | cut -d' ' -f1)" \
+  -d @callgraph.json
+
+# Upload via CLI
+stella scan graph upload --scan-id $SCAN_ID --file callgraph.json
+
+# Upload streaming NDJSON (for large graphs)
+stella scan graph upload --scan-id $SCAN_ID \
+  --file callgraph.ndjson \
+  --format ndjson \
+  --streaming
+```
+
+### 2.2 Call Graph Inspection
+
+```bash
+# Get call graph summary
+stella scan graph summary --scan-id $SCAN_ID
+
+# Output:
+# Nodes: 12,345
+# Edges: 56,789
+# Entrypoints: 42
+# Languages: [dotnet, java]
+# Size: 15.2 MB
+
+# List entrypoints
+stella scan graph entrypoints --scan-id $SCAN_ID
+
+# Export full graph (for debugging)
+stella scan graph export --scan-id $SCAN_ID --output graph.json
+
+# Visualize subgraph (requires GraphViz)
+stella scan graph visualize --scan-id $SCAN_ID \
+  --node sha256:node123... \
+  --depth 3 \
+  --output subgraph.svg
+```
+
+### 2.3 Call Graph Validation
+
+```bash
+# Validate graph structure
+stella scan graph validate --scan-id $SCAN_ID
+
+# Checks performed:
+# - All edge targets exist as nodes
+# - Entrypoints reference valid nodes
+# - No orphan nodes
+# - No cycles in entrypoint definitions
+# - Schema compliance
+
+# Validate before upload
+stella scan graph validate --file callgraph.json --strict
+```
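
Three of the checks listed above (edge targets exist, entrypoints reference valid nodes, no orphan nodes) can be sketched as follows. The graph shape and the names `CallGraphDoc`, `CgEdge`, and `validateGraph` are hypothetical; the authoritative layout is in the Call Graph Schema document. The sketch also checks edge sources, and it treats an orphan as a non-entrypoint node with no incident edges; both are assumptions about what the CLI does.

```typescript
// Hypothetical graph shape for illustration only.
interface CgEdge {
  from: string;
  to: string;
}

interface CallGraphDoc {
  nodes: { id: string }[];
  edges: CgEdge[];
  entrypoints: string[];
}

// Returns a list of human-readable problems; an empty list means the graph passed.
function validateGraph(g: CallGraphDoc): string[] {
  const problems: string[] = [];
  const ids = new Set(g.nodes.map((n) => n.id));
  const connected = new Set<string>();

  for (const e of g.edges) {
    if (!ids.has(e.from)) problems.push(`edge source missing: ${e.from}`);
    if (!ids.has(e.to)) problems.push(`edge target missing: ${e.to}`);
    connected.add(e.from);
    connected.add(e.to);
  }

  for (const ep of g.entrypoints) {
    if (!ids.has(ep)) problems.push(`entrypoint is not a node: ${ep}`);
  }

  for (const n of g.nodes) {
    // Assumption: an orphan is a non-entrypoint node with no incident edges.
    if (!connected.has(n.id) && g.entrypoints.indexOf(n.id) === -1) {
      problems.push(`orphan node: ${n.id}`);
    }
  }
  return problems;
}
```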
+
+### 2.4 Call Graph Merging
+
+When multiple language workers produce graphs, the per-language graphs are merged into a single call graph for the scan:
+
+```bash
+# View merge status
+stella scan graph merges --scan-id $SCAN_ID
+
+# Output:
+# Language   | Nodes  | Edges  | Status
+# dotnet     | 8,234  | 34,567 | merged
+# java       | 4,111  | 22,222 | merged
+# Total      | 12,345 | 56,789 | complete
+
+# Force re-merge (after fix)
+stella scan graph merge --scan-id $SCAN_ID --force
+```
+
+---
+
+## 3. Reachability Computation
+
+### 3.1 Triggering Computation
+
+```bash
+# Trigger via API
+curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/compute" \
+  -H "Authorization: Bearer $TOKEN"
+
+# Trigger via CLI
+stella reachability compute --scan-id $SCAN_ID
+
+# Trigger with options
+stella reachability compute --scan-id $SCAN_ID \
+  --max-depth 20 \
+  --indirect-resolution conservative \
+  --timeout 300s
+```
+
+### 3.2 Computation Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `max-depth` | 10 | Maximum path length to explore |
+| `indirect-resolution` | `conservative` | How to handle indirect calls: `conservative`, `aggressive`, `skip` |
+| `timeout` | 300s | Maximum computation time |
+| `parallel` | true | Parallel BFS from multiple entrypoints |
+| `include-runtime` | true | Merge runtime evidence if available |
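
A client that wraps these options might apply the table's defaults before submitting a job. The sketch below assumes camelCase field names and a `resolveOptions` helper, neither of which comes from the StellaOps API; only the default values are taken from the table.

```typescript
type IndirectResolution = "conservative" | "aggressive" | "skip";

// Hypothetical client-side view of the computation options.
interface ComputeOptions {
  maxDepth: number;
  indirectResolution: IndirectResolution;
  timeoutSeconds: number;
  parallel: boolean;
  includeRuntime: boolean;
}

// Defaults copied from the options table.
const DEFAULT_COMPUTE_OPTIONS: ComputeOptions = {
  maxDepth: 10,
  indirectResolution: "conservative",
  timeoutSeconds: 300,
  parallel: true,
  includeRuntime: true,
};

// Merge caller overrides over the defaults and reject obviously invalid values.
function resolveOptions(overrides: Partial<ComputeOptions>): ComputeOptions {
  const merged: ComputeOptions = { ...DEFAULT_COMPUTE_OPTIONS, ...overrides };
  if (!Number.isInteger(merged.maxDepth) || merged.maxDepth < 1) {
    throw new Error("maxDepth must be a positive integer");
  }
  if (!(merged.timeoutSeconds > 0)) {
    throw new Error("timeoutSeconds must be greater than zero");
  }
  return merged;
}
```

For example, `resolveOptions({ maxDepth: 20 })` keeps the 300-second timeout and conservative resolution while raising only the depth limit.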
+
+### 3.3 Job Monitoring
+
+```bash
+# Check job status
+stella reachability job-status --job-id reachability-job-001
+
+# Output:
+# Status: running
+# Progress: 67% (8,234 / 12,345 nodes visited)
+# Started: 2025-12-20T10:00:00Z
+# Estimated completion: 2025-12-20T10:02:30Z
+
+# Stream job logs
+stella reachability job-logs --job-id reachability-job-001 --follow
+
+# Cancel running job
+stella reachability job-cancel --job-id reachability-job-001
+```
+
+### 3.4 Computation Results
+
+```bash
+# Get summary
+stella reachability summary --scan-id $SCAN_ID
+
+# Output:
+# Total vulnerabilities: 45
+# Unreachable: 38 (84%)
+# Possibly reachable: 4 (9%)
+# Reachable (static): 2 (4%)
+# Reachable (proven): 1 (2%)
+# Unknown: 0 (0%)
+
+# Get detailed findings
+stella reachability findings --scan-id $SCAN_ID --format json
+
+# Filter by status
+stella reachability findings --scan-id $SCAN_ID --status REACHABLE_STATIC
+
+# Export for CI gate
+stella reachability findings --scan-id $SCAN_ID \
+  --status REACHABLE_STATIC,REACHABLE_PROVEN \
+  --format sarif \
+  --output findings.sarif
+```
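
The summary percentages and the CI gate filter above can be reproduced from a findings list with a few lines of code. This is a sketch under assumptions: the `ReachabilityFinding` shape and the function names are hypothetical, and the real JSON output of `stella reachability findings` is not shown in this runbook.

```typescript
type FindingStatus =
  | "UNREACHABLE"
  | "POSSIBLY_REACHABLE"
  | "REACHABLE_STATIC"
  | "REACHABLE_PROVEN"
  | "UNKNOWN";

// Hypothetical finding record; the real output format may carry more fields.
interface ReachabilityFinding {
  cve: string;
  status: FindingStatus;
}

interface StatusCount {
  count: number;
  percent: number; // rounded to the nearest whole percent
}

const ALL_STATUSES: FindingStatus[] = [
  "UNREACHABLE",
  "POSSIBLY_REACHABLE",
  "REACHABLE_STATIC",
  "REACHABLE_PROVEN",
  "UNKNOWN",
];

// Count findings per status and express each count as a rounded percentage.
function summarize(findings: ReachabilityFinding[]): Record<FindingStatus, StatusCount> {
  const total = findings.length;
  const result = {} as Record<FindingStatus, StatusCount>;
  for (const s of ALL_STATUSES) {
    const count = findings.filter((f) => f.status === s).length;
    result[s] = { count, percent: total === 0 ? 0 : Math.round((count / total) * 100) };
  }
  return result;
}

// Gate: fail the build when any finding has one of the blocking statuses.
function gateFails(
  findings: ReachabilityFinding[],
  blocking: FindingStatus[] = ["REACHABLE_STATIC", "REACHABLE_PROVEN"]
): boolean {
  return findings.some((f) => blocking.indexOf(f.status) !== -1);
}
```

Because each percentage is rounded independently, the values need not sum to 100; the sample summary above adds up to 99% for that reason.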
+
+---
+
+## 4. Explain Queries
+
+### 4.1 Explain Single Finding
+
+```bash
+# Via API
+curl "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/explain?cve=CVE-2024-1234&purl=pkg:npm/lodash@4.17.20" \
+  -H "Authorization: Bearer $TOKEN"
+
+# Via CLI
+stella reachability explain --scan-id $SCAN_ID \
+  --cve CVE-2024-1234 \
+  --purl "pkg:npm/lodash@4.17.20"
+
+# Output:
+# Status: REACHABLE_STATIC
+# Confidence: 0.70
+# 
+# Shortest Path (depth=3):
+# [0] MyApp.Controllers.OrdersController::Get(Guid)
+#     Entrypoint: HTTP GET /api/orders/{id}
+# [1] MyApp.Services.OrderService::Process(Order)
+#     Edge: static (direct_call)
+# [2] Lodash.merge(Object, Object) [VULNERABLE]
+#     Edge: static (direct_call)
+# 
+# Why Reachable:
+# - Static call path exists from HTTP entrypoint
+# - All edges are statically proven
+# - Vulnerable function is directly invoked
+```
+
+### 4.2 Explain with Alternatives
+
+```bash
+# Show all paths (not just shortest)
+stella reachability explain --scan-id $SCAN_ID \
+  --cve CVE-2024-1234 \
+  --purl "pkg:npm/lodash@4.17.20" \
+  --all-paths
+
+# Output includes:
+# Alternative paths found: 3
+# Path 1 (depth=3): ... [shown above]
+# Path 2 (depth=5): Controllers.UserController -> ... -> Lodash.merge
+# Path 3 (depth=7): Background.JobProcessor -> ... -> Lodash.merge
+```
+
+### 4.3 Why Unreachable
+
+```bash
+# Explain why vulnerability is unreachable
+stella reachability explain --scan-id $SCAN_ID \
+  --cve CVE-2024-5678 \
+  --purl "pkg:npm/unused-lib@1.0.0"
+
+# Output:
+# Status: UNREACHABLE
+# Confidence: 0.95
+# 
+# Why Unreachable:
+# - No path found from any entrypoint
+# - Vulnerable function: UnusedLib.dangerousMethod()
+# - Function visibility: private
+# - Callers found: 0
+# - Dead code analysis: likely dead code
+```
+
+### 4.4 Batch Explain
+
+```bash
+# Export all reachability explanations
+stella reachability explain-all --scan-id $SCAN_ID \
+  --output explanations.json
+
+# Explain only reachable findings
+stella reachability explain-all --scan-id $SCAN_ID \
+  --status REACHABLE_STATIC,REACHABLE_PROVEN \
+  --output reachable-explanations.json
+```
+
+---
+
+## 5. Troubleshooting
+
+### 5.1 Call Graph Too Large
+
+**Symptom**: Upload fails with "413 Payload Too Large".
+
+**Diagnosis**:
+
+```bash
+# Check graph size
+du -h callgraph.json
+wc -l callgraph.json
+
+# Count nodes/edges
+jq '.nodes | length' callgraph.json
+jq '.edges | length' callgraph.json
+```
+
+**Resolution**:
+
+```bash
+# Option 1: Use streaming upload
+stella scan graph upload --scan-id $SCAN_ID \
+  --file callgraph.json \
+  --streaming
+
+# Option 2: Convert to NDJSON
+stella scan graph convert --input callgraph.json \
+  --output callgraph.ndjson \
+  --format ndjson
+
+# Option 3: Partition by artifact
+stella scan graph partition --input callgraph.json \
+  --output-dir ./partitions/ \
+  --by artifact
+```
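
NDJSON helps with large graphs because each line is an independent JSON value, so a receiver can process records one at a time instead of parsing a single large document. The sketch below shows the idea only. The top-level `nodes` and `edges` arrays match the `jq` queries above, but the per-line record layout (a `kind` discriminator plus a `data` payload) and the name `toNdjson` are assumptions, not the format that `stella scan graph convert` produces.

```typescript
// Only the two top-level arrays are known from the jq queries; element shapes are left open.
interface GraphDocument {
  nodes: unknown[];
  edges: unknown[];
}

// Emit one JSON object per line. The `kind`/`data` record layout is an assumed example.
function toNdjson(doc: GraphDocument): string {
  const lines: string[] = [];
  for (const n of doc.nodes) {
    lines.push(JSON.stringify({ kind: "node", data: n }));
  }
  for (const e of doc.edges) {
    lines.push(JSON.stringify({ kind: "edge", data: e }));
  }
  // NDJSON terminates every record with a newline.
  return lines.map((line) => line + "\n").join("");
}
```

For real uploads, prefer the CLI conversion shown in Option 2, since it is the source of truth for the record format.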
+
+### 5.2 Missing Entrypoints
+
+**Symptom**: "No entrypoints found" warning.
+
+**Diagnosis**:
+
+```bash
+# Check entrypoint detection
+stella scan graph entrypoints --scan-id $SCAN_ID --verbose
+
+# Check for framework detection
+stella scan graph detect-framework --scan-id $SCAN_ID
+```
+
+**Common causes**:
+
+1. **Framework not detected**: Add framework hints
+2. **Custom entrypoints**: Manually specify
+3. **Wrong language worker**: Check artifact analysis
+
+**Resolution**:
+
+```bash
+# Specify framework explicitly
+stella scan graph upload --scan-id $SCAN_ID \
+  --file callgraph.json \
+  --framework aspnetcore
+
+# Add custom entrypoints
+stella scan graph entrypoint add --scan-id $SCAN_ID \
+  --node sha256:node123... \
+  --kind http \
+  --route "/api/custom"
+```
+
+### 5.3 Reachability Computation Timeout
+
+**Symptom**: Job fails with "computation timeout".
+
+**Diagnosis**:
+
+```bash
+# Check computation stats
+stella reachability job-stats --job-id reachability-job-001
+
+# Output:
+# Nodes visited: 500,000
+# Edges traversed: 2,500,000
+# Time elapsed: 300s
+# Memory used: 4.2 GB
+```
+
+**Resolution**:
+
+```bash
+# Option 1: Increase timeout
+stella reachability compute --scan-id $SCAN_ID --timeout 600s
+
+# Option 2: Reduce depth
+stella reachability compute --scan-id $SCAN_ID --max-depth 5
+
+# Option 3: Skip indirect calls
+stella reachability compute --scan-id $SCAN_ID --indirect-resolution skip
+
+# Option 4: Partition analysis
+stella reachability compute --scan-id $SCAN_ID --partition-by artifact
+```
+
+### 5.4 Inconsistent Results
+
+**Symptom**: Different results between runs.
+
+**Diagnosis**:
+
+```bash
+# Check determinism settings
+stella scan manifest --scan-id $SCAN_ID | jq '.deterministic, .seed'
+
+# Compare graph hashes
+stella scan graph hash --scan-id $SCAN_ID
+```
+
+**Resolution**:
+
+```bash
+# Ensure deterministic mode
+stella reachability compute --scan-id $SCAN_ID \
+  --deterministic \
+  --seed "AQIDBA=="  # Fixed seed
+
+# Use same graph version
+stella reachability compute --scan-id $SCAN_ID \
+  --graph-digest sha256:cg123...
+```
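
One common reason for run-to-run differences is that nodes and edges are emitted in a different order each time, for example by parallel workers, so a digest over the raw bytes changes even though the graph is the same. The sketch below shows how a canonical serialization removes that dependency. It is a generic illustration: the graph shape and the names `CanonGraph` and `canonicalize` are hypothetical, and this runbook does not state how `stella scan graph hash` computes its value.

```typescript
interface CanonEdge {
  from: string;
  to: string;
}

interface CanonGraph {
  nodes: string[];
  edges: CanonEdge[];
}

// Compare by UTF-16 code units so the order does not depend on locale settings.
function compareIds(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Produce a serialization that is independent of the order in which
// nodes and edges were emitted. The input graph is not modified.
function canonicalize(g: CanonGraph): string {
  const nodes = g.nodes.slice().sort(compareIds);
  const edges = g.edges
    .map((e) => ({ from: e.from, to: e.to }))
    .sort((a, b) => (a.from === b.from ? compareIds(a.to, b.to) : compareIds(a.from, b.from)));
  return JSON.stringify({ nodes, edges });
}
```

Hashing the canonical string instead of the raw upload gives two runs over the same graph the same digest regardless of emission order.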
+
+### 5.5 False Positives/Negatives
+
+**Symptom**: Reachability verdict seems incorrect.
+
+**Diagnosis**:
+
+```bash
+# Get detailed explanation
+stella reachability explain --scan-id $SCAN_ID \
+  --cve CVE-2024-1234 \
+  --purl "pkg:npm/lodash@4.17.20" \
+  --verbose
+
+# Check edge confidence
+stella scan graph edge --scan-id $SCAN_ID \
+  --from sha256:nodeA... \
+  --to sha256:nodeB...
+```
+
+**Common causes for false positives**:
+
+1. **Heuristic edges**: Indirect call resolution too aggressive
+2. **Reflection/dynamic calls**: May create false paths
+3. **Dead code not detected**: Code exists but never executes
+
+**Common causes for false negatives**:
+
+1. **Missing edges**: Call graph incomplete
+2. **Indirect calls skipped**: Resolution too conservative
+3. **Cross-language calls**: Language boundary not bridged
+
+**Resolution**:
+
+```bash
+# Tighten indirect call resolution (for false positives caused by aggressive resolution)
+stella reachability compute --scan-id $SCAN_ID \
+  --indirect-resolution conservative
+
+# Add runtime evidence
+stella scan evidence upload --scan-id $SCAN_ID \
+  --file runtime-trace.json
+
+# Report false positive/negative for ML training
+stella reachability feedback --scan-id $SCAN_ID \
+  --cve CVE-2024-1234 \
+  --verdict false-positive \
+  --reason "Dead code - feature flag disabled"
+```
+
+---
+
+## 6. Monitoring & Alerting
+
+### 6.1 Key Metrics
+
+| Metric | Description | Alert Threshold |
+|--------|-------------|-----------------|
+| `callgraph_upload_duration_seconds` | Time to upload call graph | > 60s |
+| `callgraph_size_bytes` | Size of uploaded graphs | > 200MB |
+| `reachability_computation_duration_seconds` | Time to compute reachability | > 300s |
+| `reachability_nodes_visited` | Nodes visited during BFS | > 1M |
+| `reachability_job_failures_total` | Failed computation jobs | > 5/hour |
+| `entrypoint_detection_rate` | % of scans with entrypoints | < 80% |
+
+### 6.2 Grafana Dashboard
+
+```
+Dashboard: Reachability Operations
+Panels:
+- Call graph upload throughput
+- Graph size distribution
+- Computation duration (p50, p95, p99)
+- Reachability verdict distribution
+- Job queue depth
+- Entrypoint detection rate
+```
+
+### 6.3 Alerting Rules
+
+```yaml
+groups:
+  - name: reachability
+    rules:
+      - alert: ReachabilityComputationSlow
+        expr: histogram_quantile(0.95, sum by (le) (rate(reachability_computation_duration_seconds_bucket[5m]))) > 300
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Reachability computation is slow"
+          
+      - alert: ReachabilityJobFailures
+        expr: increase(reachability_job_failures_total[1h]) > 5
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Multiple reachability job failures"
+          
+      - alert: LowEntrypointDetectionRate
+        expr: entrypoint_detection_rate < 0.8
+        for: 1h
+        labels:
+          severity: warning
+        annotations:
+          summary: "Entrypoint detection rate is low"
+```
+
+---
+
+## 7. Escalation Procedures
+
+### 7.1 Escalation Matrix
+
+| Severity | Condition | Response Time | Escalation Path |
+|----------|-----------|---------------|-----------------|
+| P1 | Reachability failing for all scans | 15 min | On-call → Team Lead |
+| P2 | Computation failures > 20% | 1 hour | On-call → Team Lead |
+| P3 | p95 computation latency > 600s | 4 hours | On-call |
+| P4 | Entrypoint detection < 70% | 24 hours | Ticket |
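
The matrix can be encoded as a simple classifier, for instance to label alerts automatically. This is a sketch only: the `HealthSnapshot` fields and the `classifySeverity` name are hypothetical, "failing for all scans" is modeled as a failure rate of 1, and evaluating rows from P1 down so that the highest matching severity wins is an assumption.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Hypothetical health inputs; rates are fractions between 0 and 1.
interface HealthSnapshot {
  failureRate: number;             // share of reachability computations that failed
  p95LatencySeconds: number;       // p95 computation latency
  entrypointDetectionRate: number; // share of scans with detected entrypoints
}

// Returns the highest matching severity from the matrix, or null when nothing matches.
function classifySeverity(s: HealthSnapshot): Severity | null {
  if (s.failureRate >= 1) return "P1";              // reachability failing for all scans
  if (s.failureRate > 0.2) return "P2";             // computation failures > 20%
  if (s.p95LatencySeconds > 600) return "P3";       // p95 latency > 600s
  if (s.entrypointDetectionRate < 0.7) return "P4"; // entrypoint detection < 70%
  return null;
}
```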
+
+### 7.2 P1 Response Procedure
+
+1. **Acknowledge** alert
+2. **Triage**:
+   ```bash
+   # Check worker health
+   stella scanner workers status
+   
+   # Check graph store connectivity
+   stella health check --service graph-store
+   
+   # Check recent failures
+   stella reachability jobs --status failed --last 10
+   ```
+3. **Mitigate**:
+   ```bash
+   # Scale up workers if queue backlog
+   kubectl scale deployment scanner-worker --replicas=10
+   
+   # Clear stuck jobs
+   stella reachability jobs cancel --status stuck
+   ```
+4. **Communicate**: Update status page
+5. **Resolve**: Fix root cause
+6. **Postmortem**: Document within 48 hours
+
+---
+
+## Related Documentation
+
+- [Reachability API Reference](../api/score-proofs-reachability-api-reference.md)
+- [Scanner Architecture](../modules/scanner/architecture.md)
+- [Call Graph Schema](../schemas/callgraph-v1.md)
+- [Entrypoint Detection](../modules/scanner/operations/entrypoint-problem.md)
+