# Reachability Analysis Operations Runbook

> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for Reachability Analysis, including call graph management, analysis troubleshooting, and explain queries.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Call Graph Operations](#2-call-graph-operations)
3. [Reachability Computation](#3-reachability-computation)
4. [Explain Queries](#4-explain-queries)
5. [Troubleshooting](#5-troubleshooting)
6. [Monitoring & Alerting](#6-monitoring--alerting)
7. [Escalation Procedures](#7-escalation-procedures)

---

## 1. Overview

### What is Reachability Analysis?

Reachability Analysis determines whether vulnerable code is actually reachable from application entrypoints. This reduces false positives by filtering out vulnerabilities in code that cannot be executed.

### Reachability Statuses

| Status | Confidence | Description |
|--------|------------|-------------|
| `UNREACHABLE` | High | No path from entrypoints to vulnerable code |
| `POSSIBLY_REACHABLE` | Medium | Path exists but contains heuristic edges |
| `REACHABLE_STATIC` | High | Static analysis proves path exists |
| `REACHABLE_PROVEN` | Very High | Runtime evidence confirms execution |
| `UNKNOWN` | Low | Insufficient data to determine |
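
The status table can be read as a small decision procedure. The sketch below is one hypothetical way to express it in Python. The function name, the `static` / `heuristic` edge labels, and the input shape are assumptions made for illustration, not the analyzer's actual implementation.

```python
from typing import Optional, Sequence


def classify(path_edge_kinds: Optional[Sequence[str]],
             runtime_confirmed: bool = False,
             graph_complete: bool = True) -> str:
    """Map analysis evidence to one of the status names in the table above.

    path_edge_kinds: edge kinds along the best path found, or None if no
    path exists. The "static" / "heuristic" labels are assumed names.
    """
    if runtime_confirmed:
        return "REACHABLE_PROVEN"       # runtime evidence outranks static analysis
    if not graph_complete:
        return "UNKNOWN"                # insufficient data to decide either way
    if path_edge_kinds is None:
        return "UNREACHABLE"            # no path from any entrypoint
    if any(kind == "heuristic" for kind in path_edge_kinds):
        return "POSSIBLY_REACHABLE"     # path depends on at least one guessed edge
    return "REACHABLE_STATIC"           # every edge is statically proven
```

For example, `classify(["static", "heuristic"])` yields `POSSIBLY_REACHABLE`, while `classify(None)` yields `UNREACHABLE`.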

### Key Components

| Component | Purpose | Location |
|-----------|---------|----------|
| Call Graph Extractor | Language-specific call graph (CG) extraction | Scanner Worker plugins |
| Call Graph Store | Persistent graph storage | `scanner.cg_node`, `scanner.cg_edge` |
| Reachability Analyzer | BFS pathfinding algorithm | Scanner Core library |
| Entrypoint Detector | Identifies application entrypoints | Language-specific plugins |

### Prerequisites

- Access to Scanner WebService API
- `scanner.reachability` OAuth scope
- CLI access with `stella` configured
- Language-specific workers deployed (dotnet, java, etc.)

---

## 2. Call Graph Operations

### 2.1 Call Graph Upload

```bash
# Upload via API
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/callgraphs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Content-Digest: sha256=$(sha256sum callgraph.json | cut -d' ' -f1)" \
  -d @callgraph.json

# Upload via CLI
stella scan graph upload --scan-id $SCAN_ID --file callgraph.json

# Upload streaming NDJSON (for large graphs)
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.ndjson \
  --format ndjson \
  --streaming
```
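
When the upload is scripted rather than run from a shell, the digest passed in the `Content-Digest` header can be computed with the standard library. This is a minimal sketch that mirrors the `sha256sum ... | cut` pipeline in the curl example; the helper names are hypothetical.

```python
import hashlib


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Chunked reads keep memory flat even for very large call graphs.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_digest_header(path: str) -> str:
    # Mirrors the `sha256=<hex>` form used in the curl example above.
    return f"sha256={sha256_hex(path)}"
```

Reading in chunks avoids loading a multi-hundred-megabyte graph into memory just to hash it.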

### 2.2 Call Graph Inspection

```bash
# Get call graph summary
stella scan graph summary --scan-id $SCAN_ID

# Output:
# Nodes: 12,345
# Edges: 56,789
# Entrypoints: 42
# Languages: [dotnet, java]
# Size: 15.2 MB

# List entrypoints
stella scan graph entrypoints --scan-id $SCAN_ID

# Export full graph (for debugging)
stella scan graph export --scan-id $SCAN_ID --output graph.json

# Visualize subgraph (requires GraphViz)
stella scan graph visualize --scan-id $SCAN_ID \
  --node sha256:node123... \
  --depth 3 \
  --output subgraph.svg
```

### 2.3 Call Graph Validation

```bash
# Validate graph structure
stella scan graph validate --scan-id $SCAN_ID

# Checks performed:
# - All edge targets exist as nodes
# - Entrypoints reference valid nodes
# - No orphan nodes
# - No cycles in entrypoint definitions
# - Schema compliance

# Validate before upload
stella scan graph validate --file callgraph.json --strict
```
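
To make the structural checks concrete, here is a rough Python sketch of three of them (edge endpoints, entrypoint references, orphan nodes). The graph shape is an assumption: top-level `nodes` and `edges` arrays as implied by the `jq` queries in this runbook, plus hypothetical `id`, `from`, `to`, and `entrypoints` fields. Cycle detection and schema compliance are left out.

```python
from typing import Dict, List


def validate_graph(graph: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    node_ids = {n["id"] for n in graph.get("nodes", [])}
    edges = graph.get("edges", [])
    entrypoints = graph.get("entrypoints", [])
    problems: List[str] = []

    # Every edge endpoint must refer to a declared node.
    for e in edges:
        for end in ("from", "to"):
            if e[end] not in node_ids:
                problems.append(f"edge {e['from']} -> {e['to']}: unknown node {e[end]}")

    # Entrypoints must reference declared nodes.
    for ep in entrypoints:
        if ep not in node_ids:
            problems.append(f"entrypoint {ep}: unknown node")

    # Orphans: nodes with no incident edge that are not entrypoints.
    connected = {e["from"] for e in edges} | {e["to"] for e in edges}
    for orphan in sorted(node_ids - connected - set(entrypoints)):
        problems.append(f"orphan node {orphan}")

    return problems
```

Running a check like this locally before upload can catch a truncated or partially written export early.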

### 2.4 Call Graph Merging

When multiple language workers produce graphs:

```bash
# View merge status
stella scan graph merges --scan-id $SCAN_ID

# Output:
# Language | Nodes  | Edges  | Status
# dotnet   | 8,234  | 34,567 | merged
# java     | 4,111  | 22,222 | merged
# Total    | 12,345 | 56,789 | complete

# Force re-merge (after fix)
stella scan graph merge --scan-id $SCAN_ID --force
```

---

## 3. Reachability Computation

### 3.1 Triggering Computation

```bash
# Trigger via API
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/compute" \
  -H "Authorization: Bearer $TOKEN"

# Trigger via CLI
stella reachability compute --scan-id $SCAN_ID

# Trigger with options
stella reachability compute --scan-id $SCAN_ID \
  --max-depth 20 \
  --indirect-resolution conservative \
  --timeout 300s
```

### 3.2 Computation Options

| Option | Default | Description |
|--------|---------|-------------|
| `max-depth` | 10 | Maximum path length to explore |
| `indirect-resolution` | `conservative` | How to handle indirect calls: `conservative`, `aggressive`, `skip` |
| `timeout` | 300s | Maximum computation time |
| `parallel` | true | Parallel BFS from multiple entrypoints |
| `include-runtime` | true | Merge runtime evidence if available |
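
To illustrate how `max-depth` and indirect-call handling shape the search, the following is a simplified, single-threaded multi-source BFS in Python. It is a sketch under assumptions (an adjacency-list graph and `static` / `indirect` edge labels chosen for this example) and does not reproduce the Scanner Core implementation.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]  # (target node, edge kind: "static" or "indirect")


def shortest_path(graph: Dict[str, List[Edge]],
                  entrypoints: Iterable[str],
                  target: str,
                  max_depth: int = 10,
                  follow_indirect: bool = True) -> Optional[List[str]]:
    """Breadth-first search from all entrypoints; returns the shortest path or None."""
    parent: Dict[str, Optional[str]] = {ep: None for ep in entrypoints}
    queue = deque((ep, 0) for ep in parent)
    while queue:
        node, depth = queue.popleft()
        if node == target:
            path = []
            while node is not None:      # walk parent links back to the entrypoint
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth >= max_depth:
            continue                     # do not expand beyond the depth limit
        for nxt, kind in graph.get(node, []):
            if kind == "indirect" and not follow_indirect:
                continue                 # roughly what a `skip` mode would do
            if nxt not in parent:
                parent[nxt] = node
                queue.append((nxt, depth + 1))
    return None
```

Because BFS visits nodes in order of distance, the first time the target is dequeued the recovered path is a shortest one. Lowering `max_depth` trades completeness for speed: a vulnerable function that sits deeper than the limit is simply not found.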

### 3.3 Job Monitoring

```bash
# Check job status
stella reachability job-status --job-id reachability-job-001

# Output:
# Status: running
# Progress: 67% (8,234 / 12,345 nodes visited)
# Started: 2025-12-20T10:00:00Z
# Estimated completion: 2025-12-20T10:02:30Z

# Stream job logs
stella reachability job-logs --job-id reachability-job-001 --follow

# Cancel running job
stella reachability job-cancel --job-id reachability-job-001
```

### 3.4 Computation Results

```bash
# Get summary
stella reachability summary --scan-id $SCAN_ID

# Output:
# Total vulnerabilities: 45
# Unreachable: 38 (84%)
# Possibly reachable: 4 (9%)
# Reachable (static): 2 (4%)
# Reachable (proven): 1 (2%)
# Unknown: 0 (0%)

# Get detailed findings
stella reachability findings --scan-id $SCAN_ID --format json

# Filter by status
stella reachability findings --scan-id $SCAN_ID --status REACHABLE_STATIC

# Export for CI gate
stella reachability findings --scan-id $SCAN_ID \
  --status REACHABLE_STATIC,REACHABLE_PROVEN \
  --format sarif \
  --output findings.sarif
```
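
A CI gate over the exported SARIF file can be as small as the sketch below. It relies only on the standard SARIF 2.1.0 layout (`runs[].results[]` with an optional `ruleId`). That the export contains only the filtered statuses, and that `ruleId` carries the CVE identifier, are assumptions about this tool's output rather than confirmed behavior.

```python
import json
from typing import List


def blocking_results(sarif_text: str) -> List[str]:
    """Collect the ruleId of every result in a SARIF log."""
    log = json.loads(sarif_text)
    rule_ids = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            rule_ids.append(result.get("ruleId", "<no ruleId>"))
    return rule_ids


def gate(sarif_text: str) -> int:
    """Return a process exit code: 0 if the log has no results, 1 otherwise."""
    found = blocking_results(sarif_text)
    for rule_id in found:
        print(f"reachable finding: {rule_id}")
    return 1 if found else 0
```

In a pipeline step this would typically be wired up as `sys.exit(gate(open("findings.sarif").read()))`, so that any reachable finding fails the build.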

---

## 4. Explain Queries

### 4.1 Explain Single Finding

```bash
# Via API
curl "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/explain?cve=CVE-2024-1234&purl=pkg:npm/lodash@4.17.20" \
  -H "Authorization: Bearer $TOKEN"

# Via CLI
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20"

# Output:
# Status: REACHABLE_STATIC
# Confidence: 0.70
#
# Shortest Path (depth=3):
#   [0] MyApp.Controllers.OrdersController::Get(Guid)
#       Entrypoint: HTTP GET /api/orders/{id}
#   [1] MyApp.Services.OrderService::Process(Order)
#       Edge: static (direct_call)
#   [2] Lodash.merge(Object, Object) [VULNERABLE]
#       Edge: static (direct_call)
#
# Why Reachable:
#   - Static call path exists from HTTP entrypoint
#   - All edges are statically proven
#   - Vulnerable function is directly invoked
```

### 4.2 Explain with Alternatives

```bash
# Show all paths (not just shortest)
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20" \
  --all-paths

# Output includes:
# Alternative paths found: 3
# Path 1 (depth=3): ... [shown above]
# Path 2 (depth=5): Controllers.UserController -> ... -> Lodash.merge
# Path 3 (depth=7): Background.JobProcessor -> ... -> Lodash.merge
```

### 4.3 Why Unreachable

```bash
# Explain why vulnerability is unreachable
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-5678 \
  --purl "pkg:npm/unused-lib@1.0.0"

# Output:
# Status: UNREACHABLE
# Confidence: 0.95
#
# Why Unreachable:
#   - No path found from any entrypoint
#   - Vulnerable function: UnusedLib.dangerousMethod()
#   - Function visibility: private
#   - Callers found: 0
#   - Dead code analysis: likely dead code
```

### 4.4 Batch Explain

```bash
# Export all reachability explanations
stella reachability explain-all --scan-id $SCAN_ID \
  --output explanations.json

# Explain only reachable findings
stella reachability explain-all --scan-id $SCAN_ID \
  --status REACHABLE_STATIC,REACHABLE_PROVEN \
  --output reachable-explanations.json
```

---

## 5. Troubleshooting

### 5.1 Call Graph Too Large

**Symptom**: Upload fails with "413 Payload Too Large".

**Diagnosis**:

```bash
# Check graph size
du -h callgraph.json
wc -l callgraph.json

# Count nodes/edges
jq '.nodes | length' callgraph.json
jq '.edges | length' callgraph.json
```

**Resolution**:

```bash
# Option 1: Use streaming upload
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.json \
  --streaming

# Option 2: Convert to NDJSON
stella scan graph convert --input callgraph.json \
  --output callgraph.ndjson \
  --format ndjson

# Option 3: Partition by artifact
stella scan graph partition --input callgraph.json \
  --output-dir ./partitions/ \
  --by artifact
```
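
For intuition about why NDJSON suits large graphs: each node or edge becomes one self-contained line, so a receiver can process records as they arrive instead of parsing a single large document. The Python sketch below shows the idea only. The `record` / `data` envelope is hypothetical and is not the format that `stella scan graph convert` emits.

```python
import json
from typing import Dict, Iterator


def to_ndjson_lines(graph: Dict) -> Iterator[str]:
    """Yield one compact JSON document per node and per edge."""
    for node in graph.get("nodes", []):
        # The envelope keys are illustrative; the real record shape may differ.
        yield json.dumps({"record": "node", "data": node}, separators=(",", ":"))
    for edge in graph.get("edges", []):
        yield json.dumps({"record": "edge", "data": edge}, separators=(",", ":"))
```

Note that this sketch still expects the whole input graph in memory; a real converter for very large files would need an incremental JSON parser on the input side as well.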

### 5.2 Missing Entrypoints

**Symptom**: "No entrypoints found" warning.

**Diagnosis**:

```bash
# Check entrypoint detection
stella scan graph entrypoints --scan-id $SCAN_ID --verbose

# Check for framework detection
stella scan graph detect-framework --scan-id $SCAN_ID
```

**Common causes**:

1. **Framework not detected**: Add framework hints
2. **Custom entrypoints**: Manually specify
3. **Wrong language worker**: Check artifact analysis

**Resolution**:

```bash
# Specify framework explicitly
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.json \
  --framework aspnetcore

# Add custom entrypoints
stella scan graph entrypoint add --scan-id $SCAN_ID \
  --node sha256:node123... \
  --kind http \
  --route "/api/custom"
```

### 5.3 Reachability Computation Timeout

**Symptom**: Job fails with "computation timeout".

**Diagnosis**:

```bash
# Check computation stats
stella reachability job-stats --job-id reachability-job-001

# Output:
# Nodes visited: 500,000
# Edges traversed: 2,500,000
# Time elapsed: 300s
# Memory used: 4.2 GB
```

**Resolution**:

```bash
# Option 1: Increase timeout
stella reachability compute --scan-id $SCAN_ID --timeout 600s

# Option 2: Reduce depth
stella reachability compute --scan-id $SCAN_ID --max-depth 5

# Option 3: Skip indirect calls
stella reachability compute --scan-id $SCAN_ID --indirect-resolution skip

# Option 4: Partition analysis
stella reachability compute --scan-id $SCAN_ID --partition-by artifact
```

### 5.4 Inconsistent Results

**Symptom**: Different results between runs.

**Diagnosis**:

```bash
# Check determinism settings
stella scan manifest --scan-id $SCAN_ID | jq '.deterministic, .seed'

# Compare graph hashes
stella scan graph hash --scan-id $SCAN_ID
```

**Resolution**:

```bash
# Ensure deterministic mode
stella reachability compute --scan-id $SCAN_ID \
  --deterministic \
  --seed "AQIDBA=="  # Fixed seed

# Use same graph version
stella reachability compute --scan-id $SCAN_ID \
  --graph-digest sha256:cg123...
```
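
Comparing graph hashes is only meaningful if the hash ignores incidental ordering. As an illustration, the Python sketch below canonicalizes a graph (sorted nodes and edges, compact JSON) before hashing. The field names and the canonical form are assumptions for this example; the digest actually produced by `stella scan graph hash` may be computed differently.

```python
import hashlib
import json
from typing import Dict


def graph_digest(graph: Dict) -> str:
    """Hash a graph so that node and edge ordering do not affect the result."""
    canonical = {
        "nodes": sorted(n["id"] for n in graph.get("nodes", [])),
        "edges": sorted((e["from"], e["to"]) for e in graph.get("edges", [])),
    }
    # sort_keys and fixed separators make the serialized bytes reproducible.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

If two runs report different digests under a scheme like this, the graphs genuinely differ and the divergence comes from extraction or merging rather than from the reachability computation.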

### 5.5 False Positives/Negatives

**Symptom**: Reachability verdict seems incorrect.

**Diagnosis**:

```bash
# Get detailed explanation
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20" \
  --verbose

# Check edge confidence
stella scan graph edge --scan-id $SCAN_ID \
  --from sha256:nodeA... \
  --to sha256:nodeB...
```

**Common causes for false positives**:

1. **Heuristic edges**: Indirect call resolution too aggressive
2. **Reflection/dynamic calls**: May create false paths
3. **Dead code not detected**: Code exists but never executes

**Common causes for false negatives**:

1. **Missing edges**: Call graph incomplete
2. **Indirect calls skipped**: Resolution too conservative
3. **Cross-language calls**: Language boundary not bridged

**Resolution**:

```bash
# Adjust indirect call resolution
stella reachability compute --scan-id $SCAN_ID \
  --indirect-resolution conservative

# Add runtime evidence
stella scan evidence upload --scan-id $SCAN_ID \
  --file runtime-trace.json

# Report false positive/negative for ML training
stella reachability feedback --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --verdict false-positive \
  --reason "Dead code - feature flag disabled"
```

---

## 6. Monitoring & Alerting

### 6.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `callgraph_upload_duration_seconds` | Time to upload call graph | > 60s |
| `callgraph_size_bytes` | Size of uploaded graphs | > 200MB |
| `reachability_computation_duration_seconds` | Time to compute reachability | > 300s |
| `reachability_nodes_visited` | Nodes visited during BFS | > 1M |
| `reachability_job_failures_total` | Failed computation jobs | > 0/hour |
| `entrypoint_detection_rate` | % of scans with entrypoints | < 90% |

### 6.2 Grafana Dashboard

```
Dashboard: Reachability Operations
Panels:
- Call graph upload throughput
- Graph size distribution
- Computation duration (p50, p95, p99)
- Reachability verdict distribution
- Job queue depth
- Entrypoint detection rate
```

### 6.3 Alerting Rules

```yaml
groups:
  - name: reachability
    rules:
      - alert: ReachabilityComputationSlow
        # Assumes the metric is exported as a Prometheus histogram:
        # histogram_quantile needs the per-bucket rate, aggregated by `le`.
        expr: histogram_quantile(0.95, sum by (le) (rate(reachability_computation_duration_seconds_bucket[5m]))) > 300
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Reachability computation is slow"

      - alert: ReachabilityJobFailures
        expr: increase(reachability_job_failures_total[1h]) > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Multiple reachability job failures"

      - alert: LowEntrypointDetectionRate
        expr: entrypoint_detection_rate < 0.8
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Entrypoint detection rate is low"
```

---

## 7. Escalation Procedures

### 7.1 Escalation Matrix

| Severity | Condition | Response Time | Escalation Path |
|----------|-----------|---------------|-----------------|
| P1 | Reachability failing for all scans | 15 min | On-call → Team Lead |
| P2 | Computation failures > 20% | 1 hour | On-call → Team Lead |
| P3 | Computation latency > 600s p95 | 4 hours | On-call |
| P4 | Entrypoint detection < 70% | 24 hours | Ticket |
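
The escalation matrix can be encoded as a first-match rule list, for example in an alert router or a triage script. The Python sketch below is hypothetical: the function and parameter names are invented here, and rates are assumed to be fractions between 0 and 1.

```python
from typing import Optional


def severity(all_scans_failing: bool,
             failure_rate: float,
             p95_latency_s: float,
             entrypoint_detection_rate: float) -> Optional[str]:
    """Return the highest matching severity from the matrix above, or None."""
    if all_scans_failing:
        return "P1"                      # total outage takes priority
    if failure_rate > 0.20:
        return "P2"                      # more than 20% of computations failing
    if p95_latency_s > 600:
        return "P3"                      # p95 latency above 600 seconds
    if entrypoint_detection_rate < 0.70:
        return "P4"                      # detection below 70% of scans
    return None
```

Checking the conditions from most to least severe ensures that an incident matching several rows is handled at the highest applicable level.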

### 7.2 P1 Response Procedure

1. **Acknowledge** alert
2. **Triage**:
   ```bash
   # Check worker health
   stella scanner workers status

   # Check graph store connectivity
   stella health check --service graph-store

   # Check recent failures
   stella reachability jobs --status failed --last 10
   ```
3. **Mitigate**:
   ```bash
   # Scale up workers if queue backlog
   kubectl scale deployment scanner-worker --replicas=10

   # Clear stuck jobs
   stella reachability jobs cancel --status stuck
   ```
4. **Communicate**: Update status page
5. **Resolve**: Fix root cause
6. **Postmortem**: Document within 48 hours

---

## Related Documentation

- [Reachability API Reference](../api/score-proofs-reachability-api-reference.md)
- [Scanner Architecture](../modules/scanner/architecture.md)
- [Call Graph Schema](../schemas/callgraph-v1.md)
- [Entrypoint Detection](../modules/scanner/operations/entrypoint-problem.md)