Reachability Analysis Operations Runbook

Version: 1.0.0
Sprint: 3500.0004.0004
Last Updated: 2025-12-20

This runbook covers operational procedures for Reachability Analysis, including call graph management, analysis troubleshooting, and explain queries.


Table of Contents

  1. Overview
  2. Call Graph Operations
  3. Reachability Computation
  4. Explain Queries
  5. Troubleshooting
  6. Monitoring & Alerting
  7. Escalation Procedures

1. Overview

What is Reachability Analysis?

Reachability Analysis determines whether vulnerable code is actually reachable from application entrypoints. This reduces false positives by filtering out vulnerabilities in code that cannot be executed.

Reachability Statuses

| Status | Confidence | Description |
|--------|------------|-------------|
| UNREACHABLE | High | No path from entrypoints to vulnerable code |
| POSSIBLY_REACHABLE | Medium | Path exists but contains heuristic edges |
| REACHABLE_STATIC | High | Static analysis proves path exists |
| REACHABLE_PROVEN | Very High | Runtime evidence confirms execution |
| UNKNOWN | Low | Insufficient data to determine |
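
The status descriptions can be read as a small decision rule. The sketch below is one possible reading, not the Scanner's actual logic: the edge kind labels `static` and `heuristic`, the function name `classify`, and the precedence given to runtime evidence are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional, Sequence


class Status(Enum):
    UNREACHABLE = "UNREACHABLE"
    POSSIBLY_REACHABLE = "POSSIBLY_REACHABLE"
    REACHABLE_STATIC = "REACHABLE_STATIC"
    REACHABLE_PROVEN = "REACHABLE_PROVEN"
    UNKNOWN = "UNKNOWN"


def classify(path_edge_kinds: Optional[Sequence[str]],
             graph_available: bool = True,
             runtime_confirmed: bool = False) -> Status:
    """Map one finding to a status (hypothetical rule, inferred from the table).

    path_edge_kinds holds the kind of every edge on the best path found,
    or None when no path from any entrypoint exists.
    """
    if runtime_confirmed:
        # Assumption: runtime evidence outranks any static result.
        return Status.REACHABLE_PROVEN
    if not graph_available:
        # Without a call graph there is not enough data to decide.
        return Status.UNKNOWN
    if path_edge_kinds is None:
        return Status.UNREACHABLE
    if any(kind == "heuristic" for kind in path_edge_kinds):
        return Status.POSSIBLY_REACHABLE
    return Status.REACHABLE_STATIC
```

A path with a single heuristic edge is downgraded to POSSIBLY_REACHABLE even if every other edge is static, which matches the "contains heuristic edges" wording in the table.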

Key Components

| Component | Purpose | Location |
|-----------|---------|----------|
| Call Graph Extractor | Language-specific CG extraction | Scanner Worker plugins |
| Call Graph Store | Persistent graph storage | scanner.cg_node, scanner.cg_edge |
| Reachability Analyzer | BFS pathfinding algorithm | Scanner Core library |
| Entrypoint Detector | Identifies application entrypoints | Language-specific plugins |
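
To make the analyzer's role concrete, here is a minimal sketch of multi-source BFS over an adjacency list. It is not the Scanner Core implementation: the function name `shortest_path`, the dictionary graph shape, and the depth semantics (depth counted in edges) are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional


def shortest_path(edges: Dict[str, List[str]],
                  entrypoints: Iterable[str],
                  target: str,
                  max_depth: int = 10) -> Optional[List[str]]:
    """Return the shortest entrypoint-to-target path, or None if no path
    exists within max_depth edges."""
    parent: Dict[str, Optional[str]] = {}
    queue = deque()
    # Seed the queue with every entrypoint so all sources are explored together.
    for entry in entrypoints:
        if entry not in parent:
            parent[entry] = None
            queue.append((entry, 0))
    while queue:
        node, depth = queue.popleft()
        if node == target:
            # Walk the parent links back to the entrypoint.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if depth == max_depth:
            continue  # Do not expand beyond the depth limit.
        for callee in edges.get(node, []):
            if callee not in parent:
                parent[callee] = node
                queue.append((callee, depth + 1))
    return None
```

Because BFS visits nodes in order of distance, the first time the target is dequeued the recovered path is a shortest one. A result of `None` means only that nothing was found within the depth limit.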

Prerequisites

  • Access to Scanner WebService API
  • scanner.reachability OAuth scope
  • CLI access with stella configured
  • Language-specific workers deployed (dotnet, java, etc.)

2. Call Graph Operations

2.1 Call Graph Upload

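# Upload via API
# --data-binary sends the file byte-for-byte; plain -d strips newlines,
# so the body would no longer match the Content-Digest computed from the file.
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/callgraphs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Content-Digest: sha256=$(sha256sum callgraph.json | cut -d' ' -f1)" \
  --data-binary @callgraph.json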

# Upload via CLI
stella scan graph upload --scan-id $SCAN_ID --file callgraph.json

# Upload streaming NDJSON (for large graphs)
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.ndjson \
  --format ndjson \
  --streaming
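
When the upload is scripted rather than run through curl, the digest header can be built the same way. The sketch below reproduces the `sha256=<hex>` form used in the curl example; the helper name `content_digest_header` is hypothetical, and the header format is taken from this runbook, not from any published API reference.

```python
import hashlib


def content_digest_header(body: bytes) -> str:
    """Build the Content-Digest value in the `sha256=<hex>` form shown above.

    The digest must be computed over the exact bytes that are sent.
    """
    return "sha256=" + hashlib.sha256(body).hexdigest()
```

The value only matches on the server side if the request body is byte-identical to the input of this function, so the file should be read and sent in binary mode.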

2.2 Call Graph Inspection

# Get call graph summary
stella scan graph summary --scan-id $SCAN_ID

# Output:
# Nodes: 12,345
# Edges: 56,789
# Entrypoints: 42
# Languages: [dotnet, java]
# Size: 15.2 MB

# List entrypoints
stella scan graph entrypoints --scan-id $SCAN_ID

# Export full graph (for debugging)
stella scan graph export --scan-id $SCAN_ID --output graph.json

# Visualize subgraph (requires GraphViz)
stella scan graph visualize --scan-id $SCAN_ID \
  --node sha256:node123... \
  --depth 3 \
  --output subgraph.svg

2.3 Call Graph Validation

# Validate graph structure
stella scan graph validate --scan-id $SCAN_ID

# Checks performed:
# - All edge targets exist as nodes
# - Entrypoints reference valid nodes
# - No orphan nodes
# - No cycles in entrypoint definitions
# - Schema compliance

# Validate before upload
stella scan graph validate --file callgraph.json --strict
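
For a quick local check before involving the CLI, three of the structural checks can be approximated in a few lines. This is a sketch only: the `.nodes` and `.edges` arrays match the jq queries used in section 5.1, but the field names `id`, `from`, `to`, the top-level `entrypoints` list, and the definition of an orphan (a node with no edges that is not an entrypoint) are assumptions about the schema.

```python
from typing import Dict, List


def validate_graph(graph: Dict) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    node_ids = {node["id"] for node in graph.get("nodes", [])}
    problems: List[str] = []
    connected = set()

    # Check 1: every edge target must exist as a node.
    for edge in graph.get("edges", []):
        src, dst = edge["from"], edge["to"]
        connected.update((src, dst))
        if dst not in node_ids:
            problems.append(f"edge target missing: {src} -> {dst}")

    # Check 2: entrypoints must reference known nodes.
    entrypoints = set(graph.get("entrypoints", []))
    for entry in sorted(entrypoints - node_ids):
        problems.append(f"entrypoint references unknown node: {entry}")

    # Check 3: nodes with no edges that are not entrypoints are orphans.
    for orphan in sorted(node_ids - connected - entrypoints):
        problems.append(f"orphan node: {orphan}")
    return problems
```

Schema compliance and the entrypoint cycle check are left to `stella scan graph validate`, which remains the authoritative validator.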

2.4 Call Graph Merging

When multiple language workers produce graphs:

# View merge status
stella scan graph merges --scan-id $SCAN_ID

# Output:
# Language   | Nodes  | Edges  | Status
# dotnet     | 8,234  | 34,567 | merged
# java       | 4,111  | 22,222 | merged
# Total      | 12,345 | 56,789 | complete

# Force re-merge (after fix)
stella scan graph merge --scan-id $SCAN_ID --force
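
Conceptually, a merge is a union of the per-language graphs. The sketch below shows one way such a union could work, keyed on node ID and on the (from, to) pair for edges. The real merge rules are not described in this runbook, so the field names and the first-writer-wins policy are assumptions.

```python
from typing import Dict, Iterable, Tuple


def merge_graphs(graphs: Iterable[Dict]) -> Dict:
    """Union several call graphs, dropping duplicate nodes and edges."""
    nodes: Dict[str, Dict] = {}
    edges: Dict[Tuple[str, str], Dict] = {}
    for graph in graphs:
        for node in graph.get("nodes", []):
            # Assumption: the first definition of a node ID wins.
            nodes.setdefault(node["id"], node)
        for edge in graph.get("edges", []):
            edges.setdefault((edge["from"], edge["to"]), edge)
    # Sorted output keeps the merged graph independent of worker order.
    return {
        "nodes": [nodes[key] for key in sorted(nodes)],
        "edges": [edges[key] for key in sorted(edges)],
    }
```

Sorting the output means the same inputs give the same merged graph regardless of which worker finished first.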

3. Reachability Computation

3.1 Triggering Computation

# Trigger via API
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/compute" \
  -H "Authorization: Bearer $TOKEN"

# Trigger via CLI
stella reachability compute --scan-id $SCAN_ID

# Trigger with options
stella reachability compute --scan-id $SCAN_ID \
  --max-depth 20 \
  --indirect-resolution conservative \
  --timeout 300s

3.2 Computation Options

| Option | Default | Description |
|--------|---------|-------------|
| max-depth | 10 | Maximum path length to explore |
| indirect-resolution | conservative | How to handle indirect calls: conservative, aggressive, skip |
| timeout | 300s | Maximum computation time |
| parallel | true | Parallel BFS from multiple entrypoints |
| include-runtime | true | Merge runtime evidence if available |

3.3 Job Monitoring

# Check job status
stella reachability job-status --job-id reachability-job-001

# Output:
# Status: running
# Progress: 67% (8,234 / 12,345 nodes visited)
# Started: 2025-12-20T10:00:00Z
# Estimated completion: 2025-12-20T10:02:30Z

# Stream job logs
stella reachability job-logs --job-id reachability-job-001 --follow

# Cancel running job
stella reachability job-cancel --job-id reachability-job-001

3.4 Computation Results

# Get summary
stella reachability summary --scan-id $SCAN_ID

# Output:
# Total vulnerabilities: 45
# Unreachable: 38 (84%)
# Possibly reachable: 4 (9%)
# Reachable (static): 2 (4%)
# Reachable (proven): 1 (2%)
# Unknown: 0 (0%)

# Get detailed findings
stella reachability findings --scan-id $SCAN_ID --format json

# Filter by status
stella reachability findings --scan-id $SCAN_ID --status REACHABLE_STATIC

# Export for CI gate
stella reachability findings --scan-id $SCAN_ID \
  --status REACHABLE_STATIC,REACHABLE_PROVEN \
  --format sarif \
  --output findings.sarif
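
A CI gate on top of the JSON findings can be as small as the sketch below. It assumes the JSON output is a flat list of objects with `cve`, `purl`, and `status` fields, which this runbook does not confirm. The names `BLOCKING`, `summarize`, and `gate` are hypothetical.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, List

# Statuses that should fail the pipeline (same pair used for the SARIF export).
BLOCKING = {"REACHABLE_STATIC", "REACHABLE_PROVEN"}


def summarize(findings: Iterable[Dict]) -> Counter:
    """Count findings per reachability status."""
    return Counter(finding["status"] for finding in findings)


def gate(findings: List[Dict]) -> int:
    """Return a process exit code: 1 if any finding has a blocking status."""
    blocking = [f for f in findings if f["status"] in BLOCKING]
    for finding in blocking:
        print(f"BLOCKING {finding['cve']} {finding['purl']} {finding['status']}",
              file=sys.stderr)
    return 1 if blocking else 0
```

In a pipeline, a wrapper script could load the file produced with `--format json`, call `gate`, and pass the result to `sys.exit` so that the build step fails on reachable findings only.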

4. Explain Queries

4.1 Explain Single Finding

# Via API
curl "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/reachability/explain?cve=CVE-2024-1234&purl=pkg:npm/lodash@4.17.20" \
  -H "Authorization: Bearer $TOKEN"

# Via CLI
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20"

# Output:
# Status: REACHABLE_STATIC
# Confidence: 0.70
# 
# Shortest Path (depth=3):
# [0] MyApp.Controllers.OrdersController::Get(Guid)
#     Entrypoint: HTTP GET /api/orders/{id}
# [1] MyApp.Services.OrderService::Process(Order)
#     Edge: static (direct_call)
# [2] Lodash.merge(Object, Object) [VULNERABLE]
#     Edge: static (direct_call)
# 
# Why Reachable:
# - Static call path exists from HTTP entrypoint
# - All edges are statically proven
# - Vulnerable function is directly invoked

4.2 Explain with Alternatives

# Show all paths (not just shortest)
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20" \
  --all-paths

# Output includes:
# Alternative paths found: 3
# Path 1 (depth=3): ... [shown above]
# Path 2 (depth=5): Controllers.UserController -> ... -> Lodash.merge
# Path 3 (depth=7): Background.JobProcessor -> ... -> Lodash.merge
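
Enumerating alternative paths is a different problem from finding the shortest one. A minimal sketch, assuming an adjacency-list graph and a depth limit counted in edges, is a depth-first search that only follows simple paths. The function name `all_paths` is hypothetical and this is not necessarily how `--all-paths` is implemented.

```python
from typing import Dict, Iterable, List


def all_paths(edges: Dict[str, List[str]],
              entrypoints: Iterable[str],
              target: str,
              max_depth: int = 10) -> List[List[str]]:
    """Return every simple path from an entrypoint to target, shortest first."""
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            found.append(list(path))
            return
        if len(path) - 1 >= max_depth:
            return  # Path already uses max_depth edges.
        for callee in edges.get(node, []):
            if callee not in path:  # Skip cycles: no node appears twice.
                path.append(callee)
                walk(callee, path)
                path.pop()

    for entry in entrypoints:
        walk(entry, [entry])
    return sorted(found, key=len)
```

The number of simple paths can grow exponentially with graph size, which is one reason to keep the depth limit low when requesting all paths on large graphs.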

4.3 Why Unreachable

# Explain why vulnerability is unreachable
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-5678 \
  --purl "pkg:npm/unused-lib@1.0.0"

# Output:
# Status: UNREACHABLE
# Confidence: 0.95
# 
# Why Unreachable:
# - No path found from any entrypoint
# - Vulnerable function: UnusedLib.dangerousMethod()
# - Function visibility: private
# - Callers found: 0
# - Dead code analysis: likely dead code

4.4 Batch Explain

# Export all reachability explanations
stella reachability explain-all --scan-id $SCAN_ID \
  --output explanations.json

# Explain only reachable findings
stella reachability explain-all --scan-id $SCAN_ID \
  --status REACHABLE_STATIC,REACHABLE_PROVEN \
  --output reachable-explanations.json

5. Troubleshooting

5.1 Call Graph Too Large

Symptom: Upload fails with "413 Payload Too Large".

Diagnosis:

# Check graph size
du -h callgraph.json
wc -l callgraph.json

# Count nodes/edges
jq '.nodes | length' callgraph.json
jq '.edges | length' callgraph.json

Resolution:

# Option 1: Use streaming upload
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.json \
  --streaming

# Option 2: Convert to NDJSON
stella scan graph convert --input callgraph.json \
  --output callgraph.ndjson \
  --format ndjson

# Option 3: Partition by artifact
stella scan graph partition --input callgraph.json \
  --output-dir ./partitions/ \
  --by artifact
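
NDJSON helps with large graphs because each record is a complete JSON document on its own line, so the receiver never has to hold the whole graph in memory. The sketch below illustrates the idea only. The record shape (a `kind` field added to each node and edge) is hypothetical, and the format the Scanner actually accepts is whatever `stella scan graph convert` produces.

```python
import json
from typing import Dict, Iterator


def to_ndjson_lines(graph: Dict) -> Iterator[str]:
    """Yield one JSON document per node and per edge (hypothetical record shape)."""
    for node in graph.get("nodes", []):
        yield json.dumps({"kind": "node", **node}, sort_keys=True)
    for edge in graph.get("edges", []):
        yield json.dumps({"kind": "edge", **edge}, sort_keys=True)
```

Writing the lines with `"\n".join(...)` gives a file that can be processed record by record. `json.dumps` never emits raw newlines inside a record, so the one-record-per-line property holds.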

5.2 Missing Entrypoints

Symptom: "No entrypoints found" warning.

Diagnosis:

# Check entrypoint detection
stella scan graph entrypoints --scan-id $SCAN_ID --verbose

# Check for framework detection
stella scan graph detect-framework --scan-id $SCAN_ID

Common causes:

  1. Framework not detected: Add framework hints
  2. Custom entrypoints: Manually specify
  3. Wrong language worker: Check artifact analysis

Resolution:

# Specify framework explicitly
stella scan graph upload --scan-id $SCAN_ID \
  --file callgraph.json \
  --framework aspnetcore

# Add custom entrypoints
stella scan graph entrypoint add --scan-id $SCAN_ID \
  --node sha256:node123... \
  --kind http \
  --route "/api/custom"

5.3 Reachability Computation Timeout

Symptom: Job fails with "computation timeout".

Diagnosis:

# Check computation stats
stella reachability job-stats --job-id reachability-job-001

# Output:
# Nodes visited: 500,000
# Edges traversed: 2,500,000
# Time elapsed: 300s
# Memory used: 4.2 GB

Resolution:

# Option 1: Increase timeout
stella reachability compute --scan-id $SCAN_ID --timeout 600s

# Option 2: Reduce depth
stella reachability compute --scan-id $SCAN_ID --max-depth 5

# Option 3: Skip indirect calls
stella reachability compute --scan-id $SCAN_ID --indirect-resolution skip

# Option 4: Partition analysis
stella reachability compute --scan-id $SCAN_ID --partition-by artifact

5.4 Inconsistent Results

Symptom: Different results between runs.

Diagnosis:

# Check determinism settings
stella scan manifest --scan-id $SCAN_ID | jq '.deterministic, .seed'

# Compare graph hashes
stella scan graph hash --scan-id $SCAN_ID

Resolution:

# Ensure deterministic mode
stella reachability compute --scan-id $SCAN_ID \
  --deterministic \
  --seed "AQIDBA=="  # Fixed seed

# Use same graph version
stella reachability compute --scan-id $SCAN_ID \
  --graph-digest sha256:cg123...
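
Graph hashes are only comparable if the graph is serialized the same way every time. The sketch below shows the general technique (sort, serialize canonically, hash). It is an illustration, not the algorithm behind `stella scan graph hash`, and the field names `id`, `from`, and `to` are assumptions.

```python
import hashlib
import json
from typing import Dict


def graph_digest(graph: Dict) -> str:
    """Hash a graph so that node and edge ordering does not affect the result."""
    canonical = {
        "nodes": sorted(graph.get("nodes", []), key=lambda n: n["id"]),
        "edges": sorted(graph.get("edges", []), key=lambda e: (e["from"], e["to"])),
    }
    # Sorted keys and fixed separators remove formatting differences.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

If two runs report different digests for what should be the same input, the graphs themselves differ and the computation settings are not the first thing to investigate.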

5.5 False Positives/Negatives

Symptom: Reachability verdict seems incorrect.

Diagnosis:

# Get detailed explanation
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --purl "pkg:npm/lodash@4.17.20" \
  --verbose

# Check edge confidence
stella scan graph edge --scan-id $SCAN_ID \
  --from sha256:nodeA... \
  --to sha256:nodeB...

Common causes for false positives:

  1. Heuristic edges: Indirect call resolution too aggressive
  2. Reflection/dynamic calls: May create false paths
  3. Dead code not detected: Code exists but never executes

Common causes for false negatives:

  1. Missing edges: Call graph incomplete
  2. Indirect calls skipped: Resolution too conservative
  3. Cross-language calls: Language boundary not bridged

Resolution:

# Adjust indirect call resolution (conservative, aggressive, or skip; see 3.2)
stella reachability compute --scan-id $SCAN_ID \
  --indirect-resolution conservative

# Add runtime evidence
stella scan evidence upload --scan-id $SCAN_ID \
  --file runtime-trace.json

# Report false positive/negative for ML training
stella reachability feedback --scan-id $SCAN_ID \
  --cve CVE-2024-1234 \
  --verdict false-positive \
  --reason "Dead code - feature flag disabled"

6. Monitoring & Alerting

6.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| callgraph_upload_duration_seconds | Time to upload call graph | > 60s |
| callgraph_size_bytes | Size of uploaded graphs | > 200MB |
| reachability_computation_duration_seconds | Time to compute reachability | > 300s |
| reachability_nodes_visited | Nodes visited during BFS | > 1M |
| reachability_job_failures_total | Failed computation jobs | > 0/hour |
| entrypoint_detection_rate | % of scans with entrypoints | < 90% |
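
For ad hoc checks outside the alerting stack, the thresholds above can be encoded as data. The sketch below makes several assumptions: 200MB is read as 200,000,000 bytes, the failure count is taken as the increase over the last hour, and the detection rate is expressed as a ratio between 0 and 1. The names `THRESHOLDS` and `breaches` are hypothetical.

```python
import operator
from typing import Dict, List

# (comparison, limit) per metric, transcribed from the table above.
THRESHOLDS = {
    "callgraph_upload_duration_seconds": (">", 60),
    "callgraph_size_bytes": (">", 200_000_000),  # Assumes decimal megabytes.
    "reachability_computation_duration_seconds": (">", 300),
    "reachability_nodes_visited": (">", 1_000_000),
    "reachability_job_failures_total": (">", 0),  # Increase over the last hour.
    "entrypoint_detection_rate": ("<", 0.90),  # Ratio, not percent.
}

_OPS = {">": operator.gt, "<": operator.lt}


def breaches(samples: Dict[str, float]) -> List[str]:
    """Return the names of sampled metrics that cross their alert threshold."""
    hits = []
    for name, value in samples.items():
        if name in THRESHOLDS:
            comparison, limit = THRESHOLDS[name]
            if _OPS[comparison](value, limit):
                hits.append(name)
    return sorted(hits)
```

Metrics that are missing from the sample are ignored, not treated as healthy or unhealthy.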

6.2 Grafana Dashboard

Dashboard: Reachability Operations
Panels:
- Call graph upload throughput
- Graph size distribution
- Computation duration (p50, p95, p99)
- Reachability verdict distribution
- Job queue depth
- Entrypoint detection rate

6.3 Alerting Rules

groups:
  - name: reachability
    rules:
      - alert: ReachabilityComputationSlow
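        # Assumes the metric is exported as a classic Prometheus histogram (_bucket series)
        expr: histogram_quantile(0.95, sum by (le) (rate(reachability_computation_duration_seconds_bucket[5m]))) > 300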
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Reachability computation is slow"
          
      - alert: ReachabilityJobFailures
        expr: increase(reachability_job_failures_total[1h]) > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Multiple reachability job failures"
          
      - alert: LowEntrypointDetectionRate
        expr: entrypoint_detection_rate < 0.8
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Entrypoint detection rate is low"

7. Escalation Procedures

7.1 Escalation Matrix

| Severity | Condition | Response Time | Escalation Path |
|----------|-----------|---------------|-----------------|
| P1 | Reachability failing for all scans | 15 min | On-call → Team Lead |
| P2 | Computation failures > 20% | 1 hour | On-call → Team Lead |
| P3 | Computation latency > 600s p95 | 4 hours | On-call |
| P4 | Entrypoint detection < 70% | 24 hours | Ticket |
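
The matrix can be expressed as a first-match rule, which is useful when triage is automated. The sketch below is an illustration under assumptions: the function name `severity` and its inputs are hypothetical, and ratios are expressed between 0 and 1.

```python
from typing import Optional


def severity(all_scans_failing: bool,
             failure_ratio: float,
             p95_latency_seconds: float,
             entrypoint_detection_rate: float) -> Optional[str]:
    """Return the highest matching severity from the matrix, or None."""
    if all_scans_failing:
        return "P1"
    if failure_ratio > 0.20:
        return "P2"
    if p95_latency_seconds > 600:
        return "P3"
    if entrypoint_detection_rate < 0.70:
        return "P4"
    return None
```

Conditions are evaluated from most to least severe, so an incident that matches several rows is reported at the highest applicable level.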

7.2 P1 Response Procedure

  1. Acknowledge alert
  2. Triage:
    # Check worker health
    stella scanner workers status
    
    # Check graph store connectivity
    stella health check --service graph-store
    
    # Check recent failures
    stella reachability jobs --status failed --last 10
    
  3. Mitigate:
    # Scale up workers if queue backlog
    kubectl scale deployment scanner-worker --replicas=10
    
    # Clear stuck jobs
    stella reachability jobs cancel --status stuck
    
  4. Communicate: Update status page
  5. Resolve: Fix root cause
  6. Postmortem: Document within 48 hours

