# Reachability Analysis Operations Runbook

> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for Reachability Analysis, including call graph management, analysis troubleshooting, and explain queries.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Call Graph Operations](#2-call-graph-operations)
3. [Reachability Computation](#3-reachability-computation)
4. [Explain Queries](#4-explain-queries)
5. [Troubleshooting](#5-troubleshooting)
6. [Monitoring & Alerting](#6-monitoring--alerting)
7. [Escalation Procedures](#7-escalation-procedures)

---

## 1. Overview

### What is Reachability Analysis?

Reachability Analysis determines whether vulnerable code is actually reachable from application entrypoints. This reduces false positives by filtering out vulnerabilities in code that cannot be executed.

### Reachability Statuses

| Status | Confidence | Description |
|--------|------------|-------------|
| `UNREACHABLE` | High | No path from entrypoints to vulnerable code |
| `POSSIBLY_REACHABLE` | Medium | Path exists but contains heuristic edges |
| `REACHABLE_STATIC` | High | Static analysis proves path exists |
| `REACHABLE_PROVEN` | Very High | Runtime evidence confirms execution |
| `UNKNOWN` | Low | Insufficient data to determine |

### Key Components

| Component | Purpose | Location |
|-----------|---------|----------|
| Call Graph Extractor | Language-specific CG extraction | Scanner Worker plugins |
| Call Graph Store | Persistent graph storage | `scanner.cg_node`, `scanner.cg_edge` |
| Reachability Analyzer | BFS pathfinding algorithm | Scanner Core library |
| Entrypoint Detector | Identifies application entrypoints | Language-specific plugins |

### Prerequisites

- Access to Scanner WebService API
- `scanner.reachability` OAuth scope
- CLI access with `stella` configured
- Language-specific workers deployed (dotnet, java, etc.)

---

## 2. Call Graph Operations

### 2.1 Call Graph Upload

```bash
# Upload via API
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/callgraphs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Content-Digest: sha256=$(sha256sum callgraph.json | cut -d' ' -f1)" \
  -d @callgraph.json

# Upload via CLI
stella scan graph upload --scan-id $SCAN_ID --file callgraph.json

# Upload streaming NDJSON (for large graphs)
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.ndjson \
  --format ndjson \
  --streaming
```

### 2.2 Call Graph Inspection

```bash
# Get call graph summary
stella scan graph summary --scan-id $SCAN_ID

# Output:
# Nodes: 12,345
# Edges: 56,789
# Entrypoints: 42
# Languages: [dotnet, java]
# Size: 15.2 MB

# List entrypoints
stella scan graph entrypoints --scan-id $SCAN_ID

# Export full graph (for debugging)
stella scan graph export --scan-id $SCAN_ID --output graph.json

# Visualize subgraph (requires GraphViz)
stella scan graph visualize --scan-id $SCAN_ID \
  --node sha256:node123... \
  --depth 3 \
  --output subgraph.svg
```

### 2.3 Call Graph Validation

```bash
# Validate graph structure
stella scan graph validate --scan-id $SCAN_ID

# Checks performed:
# - All edge targets exist as nodes
# - Entrypoints reference valid nodes
# - No orphan nodes
# - No cycles in entrypoint definitions
# - Schema compliance

# Validate before upload
stella scan graph validate --file callgraph.json --strict
```
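
Two of the checks listed above (edge targets exist as nodes, no orphan nodes) can be approximated locally before upload. The following is a minimal sketch, not the validator's actual implementation. It assumes the graph has been flattened into two hypothetical text files (one node ID per line, and one `from to` pair per line), and it treats an orphan as a node that appears in no edge. The function names are illustrative.

```bash
# find_dangling_targets NODES_FILE EDGES_FILE
# Prints edge targets that are not declared as nodes.
find_dangling_targets() {
  awk 'NR == FNR { nodes[$1]; next }   # first file: remember every node ID
       !($2 in nodes) { print $2 }     # second file: target is not a known node
  ' "$1" "$2" | sort -u
}

# find_orphan_nodes NODES_FILE EDGES_FILE
# Prints nodes that are neither the source nor the target of any edge.
find_orphan_nodes() {
  awk 'NR == FNR { used[$1]; used[$2]; next }   # first file (edges): mark endpoints
       !($1 in used) { print $1 }               # second file (nodes): never referenced
  ' "$2" "$1" | sort -u
}
```

Both helpers rely on the `NR == FNR` idiom, so they expect the first file passed to `awk` to be non-empty.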

### 2.4 Call Graph Merging

When multiple language workers produce call graphs for the same scan, check that each per-language graph has been merged:

```bash
# View merge status
stella scan graph merges --scan-id $SCAN_ID

# Output:
# Language   | Nodes  | Edges  | Status
# dotnet     | 8,234  | 34,567 | merged
# java       | 4,111  | 22,222 | merged
# Total      | 12,345 | 56,789 | complete

# Force re-merge (after fix)
stella scan graph merge --scan-id $SCAN_ID --force
```
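
As a quick sanity check on the merge status table, the per-language rows can be summed and compared against the `Total` row. This sketch assumes the plain-text table layout shown above (without the leading `# `) arrives on stdin, and that merging does not deduplicate nodes or edges across languages, which may not hold for every scan. `check_merge_totals` is a hypothetical helper.

```bash
# check_merge_totals: reads the merge status table on stdin.
# Prints "consistent" (exit 0) or "mismatch" (exit 1).
check_merge_totals() {
  awk -F'|' '
    NR == 1 { next }                                    # skip the header row
    { gsub(/,/, ""); name = $1; gsub(/ /, "", name) }   # drop thousands separators
    name == "Total" { total_nodes = $2 + 0; total_edges = $3 + 0; next }
    { nodes += $2; edges += $3 }                        # accumulate per-language rows
    END {
      ok = (nodes == total_nodes && edges == total_edges)
      print (ok ? "consistent" : "mismatch")
      exit !ok
    }'
}
```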

---

## 3. Reachability Computation

### 3.1 Triggering Computation

```bash
# Trigger via API
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/compute" \
  -H "Authorization: Bearer $TOKEN"

# Trigger via CLI
stella reachability compute --scan-id $SCAN_ID

# Trigger with options
stella reachability compute --scan-id $SCAN_ID \
  --max-depth 20 \
  --indirect-resolution conservative \
  --timeout 300s
```

### 3.2 Computation Options

| Option | Default | Description |
|--------|---------|-------------|
| `max-depth` | 10 | Maximum path length to explore |
| `indirect-resolution` | `conservative` | How to handle indirect calls: `conservative`, `aggressive`, `skip` |
| `timeout` | 300s | Maximum computation time |
| `parallel` | true | Parallel BFS from multiple entrypoints |
| `include-runtime` | true | Merge runtime evidence if available |
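
To illustrate how `max-depth` bounds the search, here is a toy depth-limited BFS. It is a sketch under assumptions, not the Scanner Core algorithm: the graph is a hypothetical whitespace-separated edge list (`caller callee` per line), only a single entrypoint is searched, and depth is counted in edges.

```bash
# bfs_reachable EDGE_FILE ENTRYPOINT TARGET MAX_DEPTH
# Prints "REACHABLE depth=N" for the shortest path found, or "UNREACHABLE".
bfs_reachable() {
  awk -v start="$2" -v target="$3" -v max="$4" '
    { adj[$1] = adj[$1] " " $2 }            # build adjacency lists
    END {
      depth[start] = 0; queue[1] = start; head = 1; tail = 1
      while (head <= tail) {
        node = queue[head++]
        if (node == target) { print "REACHABLE depth=" depth[node]; exit }
        if (depth[node] >= max) continue    # do not expand past the depth limit
        n = split(adj[node], next_nodes, " ")
        for (i = 1; i <= n; i++) {
          m = next_nodes[i]
          if (!(m in depth)) { depth[m] = depth[node] + 1; queue[++tail] = m }
        }
      }
      print "UNREACHABLE"
    }' "$1"
}
```

With a low depth limit, a target that sits behind a long call chain is reported as not found even though a path exists, which is the trade-off to keep in mind when reducing `max-depth` to save time.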

### 3.3 Job Monitoring

```bash
# Check job status
stella reachability job-status --job-id reachability-job-001

# Output:
# Status: running
# Progress: 67% (8,234 / 12,345 nodes visited)
# Started: 2025-12-20T10:00:00Z
# Estimated completion: 2025-12-20T10:02:30Z

# Stream job logs
stella reachability job-logs --job-id reachability-job-001 --follow

# Cancel running job
stella reachability job-cancel --job-id reachability-job-001
```
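
For scripted pipelines, polling can be wrapped in a small loop. This sketch assumes the status command prints a `Status: <value>` line as in the output above and treats any value other than `running` as terminal. The terminal status names are not specified here, and `wait_until_done` is a hypothetical helper.

```bash
# wait_until_done INTERVAL_SECONDS MAX_ATTEMPTS COMMAND [ARGS...]
# Prints the first non-running status (exit 0), or "timeout" (exit 1).
wait_until_done() {
  local interval="$1" attempts="$2" i=1 status
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Extract the value of the first "Status:" line from the command output.
    status=$("$@" | awk -F': *' '/^Status:/ { print $2; exit }')
    if [ -n "$status" ] && [ "$status" != "running" ]; then
      echo "$status"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timeout"
  return 1
}

# Example, using the job ID from the commands above:
# wait_until_done 5 60 stella reachability job-status --job-id reachability-job-001
```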

### 3.4 Computation Results

```bash
# Get summary
stella reachability summary --scan-id $SCAN_ID

# Output:
# Total vulnerabilities: 45
# Unreachable: 38 (84%)
# Possibly reachable: 4 (9%)
# Reachable (static): 2 (4%)
# Reachable (proven): 1 (2%)
# Unknown: 0 (0%)

# Get detailed findings
stella reachability findings --scan-id $SCAN_ID --format json

# Filter by status
stella reachability findings --scan-id $SCAN_ID --status REACHABLE_STATIC

# Export for CI gate
stella reachability findings --scan-id $SCAN_ID \
  --status REACHABLE_STATIC,REACHABLE_PROVEN \
  --format sarif \
  --output findings.sarif
```
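
A CI gate can also be derived from the summary counts. The sketch below assumes the summary arrives on stdin in the text layout shown above (without the leading `# `) and fails the build when any finding is `Reachable (static)` or `Reachable (proven)`. `reachability_gate` is a hypothetical helper. Parsing the JSON or SARIF export is likely more robust than parsing human-readable text.

```bash
# reachability_gate: reads summary text on stdin.
# Prints "reachable=N" and exits 1 when N > 0, otherwise exits 0.
reachability_gate() {
  awk -F': *' '
    /^Reachable \((static|proven)\)/ {
      split($2, parts, " ")        # "2 (4%)" -> parts[1] = "2"
      reachable += parts[1]
    }
    END { printf "reachable=%d\n", reachable; exit (reachable > 0) }
  '
}

# Example:
# stella reachability summary --scan-id "$SCAN_ID" | reachability_gate
```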

---

## 4. Explain Queries

### 4.1 Explain Single Finding

```bash
# Via API
curl "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/explain?cve=CVE-2024-1234&purl=pkg:npm/lodash@4.17.20" \
  -H "Authorization: Bearer $TOKEN"

# Via CLI
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20"

# Output:
# Status: REACHABLE_STATIC
# Confidence: 0.70
#
# Shortest Path (depth=3):
# [0] MyApp.Controllers.OrdersController::Get(Guid)
#     Entrypoint: HTTP GET /api/orders/{id}
# [1] MyApp.Services.OrderService::Process(Order)
#     Edge: static (direct_call)
# [2] Lodash.merge(Object, Object) [VULNERABLE]
#     Edge: static (direct_call)
#
# Why Reachable:
# - Static call path exists from HTTP entrypoint
# - All edges are statically proven
# - Vulnerable function is directly invoked
```

### 4.2 Explain with Alternatives

```bash
# Show all paths (not just shortest)
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20" \
  --all-paths

# Output includes:
# Alternative paths found: 3
# Path 1 (depth=3): ... [shown above]
# Path 2 (depth=5): Controllers.UserController -> ... -> Lodash.merge
# Path 3 (depth=7): Background.JobProcessor -> ... -> Lodash.merge
```

### 4.3 Why Unreachable

```bash
# Explain why vulnerability is unreachable
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-5678 \
  --purl "pkg:npm/unused-lib@1.0.0"

# Output:
# Status: UNREACHABLE
# Confidence: 0.95
#
# Why Unreachable:
# - No path found from any entrypoint
# - Vulnerable function: UnusedLib.dangerousMethod()
# - Function visibility: private
# - Callers found: 0
# - Dead code analysis: likely dead code
```

### 4.4 Batch Explain

```bash
# Export all reachability explanations
stella reachability explain-all --scan-id $SCAN_ID \
  --output explanations.json

# Explain only reachable findings
stella reachability explain-all --scan-id $SCAN_ID \
  --status REACHABLE_STATIC,REACHABLE_PROVEN \
  --output reachable-explanations.json
```

---

## 5. Troubleshooting

### 5.1 Call Graph Too Large

**Symptom**: Upload fails with "413 Payload Too Large".

**Diagnosis**:

```bash
# Check graph size
du -h callgraph.json
wc -l callgraph.json

# Count nodes/edges
jq '.nodes | length' callgraph.json
jq '.edges | length' callgraph.json
```

**Resolution**:

```bash
# Option 1: Use streaming upload
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.json \
  --streaming

# Option 2: Convert to NDJSON, then upload the result with --format ndjson --streaming (see 2.1)
stella scan graph convert --input callgraph.json \
  --output callgraph.ndjson \
  --format ndjson

# Option 3: Partition by artifact
stella scan graph partition --input callgraph.json \
  --output-dir ./partitions/ \
  --by artifact
```
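
The choice between a direct and a streaming upload can be scripted on file size. This is a minimal sketch: the size limit is an assumption that has to come from your deployment (this runbook does not state the server's request size limit), and `pick_upload_mode` is a hypothetical helper.

```bash
# pick_upload_mode FILE LIMIT_BYTES
# Prints "streaming" when FILE is larger than LIMIT_BYTES, otherwise "direct".
pick_upload_mode() {
  local file="$1" limit_bytes="$2" size
  [ -f "$file" ] || return 2
  size=$(( $(wc -c < "$file") ))   # arithmetic expansion strips padding from wc output
  if [ "$size" -gt "$limit_bytes" ]; then
    echo streaming
  else
    echo direct
  fi
}

# Example with an assumed 100 MB limit:
# mode=$(pick_upload_mode callgraph.json $((100 * 1024 * 1024)))
```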

### 5.2 Missing Entrypoints

**Symptom**: "No entrypoints found" warning.

**Diagnosis**:

```bash
# Check entrypoint detection
stella scan graph entrypoints --scan-id $SCAN_ID --verbose

# Check for framework detection
stella scan graph detect-framework --scan-id $SCAN_ID
```

**Common causes**:

1. **Framework not detected**: Add framework hints
2. **Custom entrypoints**: Manually specify
3. **Wrong language worker**: Check artifact analysis

**Resolution**:

```bash
# Specify framework explicitly
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.json \
  --framework aspnetcore

# Add custom entrypoints
stella scan graph entrypoint add --scan-id $SCAN_ID \
  --node sha256:node123... \
  --kind http \
  --route "/api/custom"
```

### 5.3 Reachability Computation Timeout

**Symptom**: Job fails with "computation timeout".

**Diagnosis**:

```bash
# Check computation stats
stella reachability job-stats --job-id reachability-job-001

# Output:
# Nodes visited: 500,000
# Edges traversed: 2,500,000
# Time elapsed: 300s
# Memory used: 4.2 GB
```

**Resolution**:

```bash
# Option 1: Increase timeout
stella reachability compute --scan-id $SCAN_ID --timeout 600s

# Option 2: Reduce depth
stella reachability compute --scan-id $SCAN_ID --max-depth 5

# Option 3: Skip indirect calls
stella reachability compute --scan-id $SCAN_ID --indirect-resolution skip

# Option 4: Partition analysis
stella reachability compute --scan-id $SCAN_ID --partition-by artifact
```

### 5.4 Inconsistent Results

**Symptom**: Different results between runs.

**Diagnosis**:

```bash
# Check determinism settings
stella scan manifest --scan-id $SCAN_ID | jq '.deterministic, .seed'

# Compare graph hashes
stella scan graph hash --scan-id $SCAN_ID
```

**Resolution**:

```bash
# Ensure deterministic mode
stella reachability compute --scan-id $SCAN_ID \
  --deterministic \
  --seed "AQIDBA=="  # Fixed seed

# Use same graph version
stella reachability compute --scan-id $SCAN_ID \
  --graph-digest sha256:cg123...
```

### 5.5 False Positives/Negatives

**Symptom**: Reachability verdict seems incorrect.

**Diagnosis**:

```bash
# Get detailed explanation
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20" \
  --verbose

# Check edge confidence
stella scan graph edge --scan-id $SCAN_ID \
  --from sha256:nodeA... \
  --to sha256:nodeB...
```

**Common causes for false positives**:

1. **Heuristic edges**: Indirect call resolution too aggressive
2. **Reflection/dynamic calls**: May create false paths
3. **Dead code not detected**: Code exists but never executes

**Common causes for false negatives**:

1. **Missing edges**: Call graph incomplete
2. **Indirect calls skipped**: Resolution too conservative
3. **Cross-language calls**: Language boundary not bridged

**Resolution**:

```bash
# Adjust indirect call resolution (modes: conservative, aggressive, skip; see 3.2)
stella reachability compute --scan-id $SCAN_ID \
  --indirect-resolution conservative

# Add runtime evidence
stella scan evidence upload --scan-id $SCAN_ID \
  --file runtime-trace.json

# Report false positive/negative for ML training
stella reachability feedback --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --verdict false-positive \
  --reason "Dead code - feature flag disabled"
```

---

## 6. Monitoring & Alerting

### 6.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `callgraph_upload_duration_seconds` | Time to upload call graph | > 60s |
| `callgraph_size_bytes` | Size of uploaded graphs | > 200MB |
| `reachability_computation_duration_seconds` | Time to compute reachability | > 300s |
| `reachability_nodes_visited` | Nodes visited during BFS | > 1M |
| `reachability_job_failures_total` | Failed computation jobs | > 0/hour |
| `entrypoint_detection_rate` | % of scans with entrypoints | < 90% |

### 6.2 Grafana Dashboard

```text
Dashboard: Reachability Operations
Panels:
- Call graph upload throughput
- Graph size distribution
- Computation duration (p50, p95, p99)
- Reachability verdict distribution
- Job queue depth
- Entrypoint detection rate
```

### 6.3 Alerting Rules

```yaml
groups:
  - name: reachability
    rules:
      - alert: ReachabilityComputationSlow
        expr: histogram_quantile(0.95, sum by (le) (rate(reachability_computation_duration_seconds_bucket[5m]))) > 300
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Reachability computation is slow"

      - alert: ReachabilityJobFailures
        expr: increase(reachability_job_failures_total[1h]) > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Multiple reachability job failures"

      - alert: LowEntrypointDetectionRate
        expr: entrypoint_detection_rate < 0.8
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Entrypoint detection rate is low"
```

---

## 7. Escalation Procedures

### 7.1 Escalation Matrix

| Severity | Condition | Response Time | Escalation Path |
|----------|-----------|---------------|-----------------|
| P1 | Reachability failing for all scans | 15 min | On-call → Team Lead |
| P2 | Computation failures > 20% | 1 hour | On-call → Team Lead |
| P3 | Computation latency > 600s p95 | 4 hours | On-call |
| P4 | Entrypoint detection < 70% | 24 hours | Ticket |
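
The matrix can be encoded as a small triage helper for alert routing scripts. This sketch assumes integer inputs (failure percentage, p95 latency in seconds, entrypoint detection percentage) and interprets "failing for all scans" as a 100% failure rate. `classify_severity` is a hypothetical name, and the highest matching severity wins.

```bash
# classify_severity FAIL_PCT P95_SECONDS ENTRYPOINT_PCT
# Prints P1..P4 according to the escalation matrix, or "none".
classify_severity() {
  local fail_pct="$1" p95_seconds="$2" entrypoint_pct="$3"
  if [ "$fail_pct" -ge 100 ]; then
    echo P1      # reachability failing for all scans
  elif [ "$fail_pct" -gt 20 ]; then
    echo P2      # computation failures > 20%
  elif [ "$p95_seconds" -gt 600 ]; then
    echo P3      # p95 computation latency > 600s
  elif [ "$entrypoint_pct" -lt 70 ]; then
    echo P4      # entrypoint detection < 70%
  else
    echo none
  fi
}
```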

### 7.2 P1 Response Procedure

1. **Acknowledge** alert
2. **Triage**:
   ```bash
   # Check worker health
   stella scanner workers status

   # Check graph store connectivity
   stella health check --service graph-store

   # Check recent failures
   stella reachability jobs --status failed --last 10
   ```
3. **Mitigate**:
   ```bash
   # Scale up workers if queue backlog
   kubectl scale deployment scanner-worker --replicas=10

   # Clear stuck jobs
   stella reachability jobs cancel --status stuck
   ```
4. **Communicate**: Update status page
5. **Resolve**: Fix root cause
6. **Postmortem**: Document within 48 hours

---

## Related Documentation

- [Reachability API Reference](../api/score-proofs-reachability-api-reference.md)
- [Scanner Architecture](../modules/scanner/architecture.md)
- [Call Graph Schema](../schemas/callgraph-v1.md)
- [Entrypoint Detection](../modules/scanner/operations/entrypoint-problem.md)
