# Score Proofs & Reachability Troubleshooting Guide

**Sprint:** SPRINT_3500_0004_0004
**Audience:** Operations, Support, Security Engineers

---

## Quick Diagnostic Commands

```bash
# Check system health
stella status

# Verify scan completed successfully
stella scan status --scan-id $SCAN_ID

# Check reachability computation status
stella reachability job-status --job-id $JOB_ID

# Verify proof integrity
stella proof verify --scan-id $SCAN_ID --verbose
```
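
If you run these checks often, for example from a runbook or a scheduled job, a small wrapper can report every failing check in one pass. This is a sketch, not part of the CLI. `run_check` and `stella_quick_check` are hypothetical helper names, and the wrapper assumes `stella` exits with a non-zero status when a check fails, which this guide does not state.

```bash
# Hypothetical helper: run one check quietly and print OK/FAIL with a label.
run_check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK    %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# Hypothetical wrapper around the quick diagnostic commands above.
# Assumes stella returns a non-zero exit code when a check fails.
stella_quick_check() {
  if [ -z "${SCAN_ID:-}" ] || [ -z "${JOB_ID:-}" ]; then
    echo "set SCAN_ID and JOB_ID before running" >&2
    return 64
  fi
  local failed=0
  run_check "system health"    stella status || failed=$((failed + 1))
  run_check "scan status"      stella scan status --scan-id "$SCAN_ID" || failed=$((failed + 1))
  run_check "reachability job" stella reachability job-status --job-id "$JOB_ID" || failed=$((failed + 1))
  run_check "proof integrity"  stella proof verify --scan-id "$SCAN_ID" || failed=$((failed + 1))
  printf '%d check(s) failed\n' "$failed"
  # Exit status is the number of failed checks (0 when everything passed).
  return "$failed"
}
```

Run it with `SCAN_ID` and `JOB_ID` set; it returns the number of failed checks, or 64 if either ID is missing.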

---

## Score Proofs Issues

### 1. Replay Produces Different Results

**Symptoms:**
- `stella score replay` output differs from the original scan
- Verification fails with "hash mismatch"

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Missing inputs | `stella proof inspect --check-inputs` shows gaps | Export with `--include-inputs` |
| Algorithm version mismatch | Check `environment.scannerVersion` in manifest | Use matching scanner version |
| Non-deterministic config | Review `configuration` section | Enable `--deterministic` mode |
| Feed drift | Compare `advisoryFeeds.asOf` timestamps | Use frozen feeds |

**Resolution Steps:**

```bash
# Step 1: Inspect the proof
stella proof inspect --scan-id $SCAN_ID

# Step 2: Check for missing inputs
stella proof inspect --scan-id $SCAN_ID --check-inputs

# Step 3: If inputs are missing, re-export with inputs included
stella proof export --scan-id $SCAN_ID --include-inputs --output proof-full.zip

# Step 4: Retry replay
stella score replay --scan-id $SCAN_ID --bundle proof-full.zip
```

---

### 2. Signature Verification Failed

**Symptoms:**
- "Invalid signature" or "Signature verification failed"
- `stella proof verify` returns error

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Key rotation | Check `stella trust list` for key dates | Import new trust anchor |
| Corrupted bundle | Verify file integrity | Re-download bundle |
| Wrong trust root | Check issuer in attestation | Configure the correct trust anchor |
| Tampered content | Hash mismatch in bundle | Investigate tampering |

**Resolution Steps:**

```bash
# Step 1: Verbose verification
stella proof verify --scan-id $SCAN_ID --verbose

# Step 2: Check trust anchors
stella trust list

# Step 3: If the key was rotated, import the new trust anchor
stella trust import --file new-public-key.pem

# Step 4: Retry verification
stella proof verify --scan-id $SCAN_ID
```

---

### 3. Proof Chain Broken

**Symptoms:**
- "Chain integrity violation"
- "prev_hash mismatch"

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Database corruption | Check Postgres logs | Restore from backup |
| Manual modification | Audit access logs | Investigate, restore |
| Storage failure | Check disk health | Repair/restore |

**Resolution Steps:**

```bash
# Step 1: Check chain status
stella proof status --scan-id $SCAN_ID

# Step 2: Find break point
stella proof list --since "30 days" --verify-chain

# Step 3: If the break points to a database issue,
# check the Postgres logs and restore from backup if needed
```
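
A `prev_hash mismatch` means a record no longer references the hash of the record before it. The toy checker below illustrates the idea using a hypothetical format: one record per line as `prev_hash<TAB>payload`, where `prev_hash` is the SHA-256 of the full previous line. Stella's actual proof storage and hashing rules are not described in this guide, so this illustrates chain checking in general and is not a tool for real proofs.

```bash
# Toy illustration of prev_hash chain checking (hypothetical record format):
#   each line is "<prev_hash><TAB><payload>", and prev_hash must equal the
#   SHA-256 of the full previous line. Prints the first broken line number.
find_chain_break() {
  local file=$1 line prev_line="" prev_hash stored n=0
  # "|| [ -n ... ]" also processes a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if [ "$n" -gt 1 ]; then
      # Hash the previous record exactly as stored (without its newline).
      prev_hash=$(printf '%s' "$prev_line" | sha256sum | cut -d' ' -f1)
      # The stored prev_hash is the first tab-separated field.
      stored=$(printf '%s\n' "$line" | cut -f1)
      if [ "$stored" != "$prev_hash" ]; then
        echo "chain broken at line $n"
        return 1
      fi
    fi
    prev_line=$line
  done < "$file"
  echo "chain intact ($n records)"
}
```

In a chain like this, altering record *k* is reported at record *k+1*, because that is where the stored hash stops matching. The reported break point is therefore where to start investigating, not necessarily the modified record.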

---

### 4. Proof Export Fails

**Symptoms:**
- "Failed to export proof bundle"
- Timeout during export

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Large inputs | Check SBOM/graph size | Export without `--include-inputs` |
| Storage full | Check disk space | Clear space or use different path |
| Network timeout | Check network connectivity | Increase timeout |

**Resolution Steps:**

```bash
# Step 1: Export without inputs (smaller)
stella proof export --scan-id $SCAN_ID --output proof.zip

# Step 2: If it still fails, check free disk space
# Windows (PowerShell): Get-Volume | Format-Table
# Linux: df -h

# Step 3: Try alternative location
stella proof export --scan-id $SCAN_ID --output /tmp/proof.zip
```
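
Before retrying, a preflight check on the target directory avoids a second failure for lack of space. This sketch targets Linux/macOS and uses POSIX `df -Pk` so that available space is always the fourth field. `has_free_space` is a hypothetical helper, and the threshold you need depends on bundle size, which grows with `--include-inputs`.

```bash
# Hypothetical preflight: succeed only if DIR has at least MIN_MB megabytes free.
# POSIX output (df -P) keeps each filesystem on one line; field 4 is "Available" in KiB.
has_free_space() {
  local dir=$1 min_mb=$2 avail_kb
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((min_mb * 1024)) ]
}

# Usage (500 MB is an example threshold):
#   has_free_space /tmp 500 && stella proof export --scan-id "$SCAN_ID" --output /tmp/proof.zip
```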

---

## Reachability Issues

### 1. Too Many UNKNOWN Findings

**Symptoms:**
- Most vulnerabilities show `UNKNOWN` reachability status
- Coverage percentage is low

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| No call graph | `stella scan graph summary` returns empty | Upload call graph |
| Incomplete graph | Low node count | Regenerate the graph with broader coverage |
| Symbol mismatch | Symbols not resolved | Check symbol resolution |

**Resolution Steps:**

```bash
# Step 1: Check if call graph exists
stella scan graph summary --scan-id $SCAN_ID

# Step 2: If missing, generate and upload
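# .NET example (`--generate-call-graph` is not a standard `dotnet build` switch;
# substitute the call-graph generation step your build actually uses):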
dotnet build --generate-call-graph
stella scan graph upload --scan-id $SCAN_ID --file callgraph.json

# Step 3: Verify entrypoints detected
stella scan graph entrypoints --scan-id $SCAN_ID

# Step 4: Recompute reachability
stella reachability compute --scan-id $SCAN_ID --force
```
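
Steps 1, 2 and 4 can be scripted so that the upload happens only when no graph exists. This sketch takes the table's diagnosis literally and treats empty output from `stella scan graph summary` as "no call graph"; check your CLI's actual output before relying on it. `ensure_call_graph` is a hypothetical helper, and the graph file is passed in because how you generate it depends on your toolchain.

```bash
# Hypothetical helper: upload a call graph only if the scan has none, then recompute.
# Assumes `stella scan graph summary` prints nothing when no graph exists.
# Usage: ensure_call_graph "$SCAN_ID" callgraph.json
ensure_call_graph() {
  local scan_id=$1 graph_file=$2 summary
  summary=$(stella scan graph summary --scan-id "$scan_id") || return
  if [ -z "$summary" ]; then
    [ -f "$graph_file" ] || { echo "missing $graph_file" >&2; return 1; }
    stella scan graph upload --scan-id "$scan_id" --file "$graph_file" || return
  fi
  stella reachability compute --scan-id "$scan_id" --force
}
```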

---

### 2. False UNREACHABLE Findings

**Symptoms:**
- Known-reachable code marked UNREACHABLE
- Security team reports false negatives

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Missing edges | Graph incomplete | Add missing calls |
| Reflection not detected | Edge type missing | Add reflection hints |
| Entrypoint not detected | Check entrypoints list | Add manual entrypoint |

**Resolution Steps:**

```bash
# Step 1: Explain the specific finding
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-XXXX \
  --purl "pkg:type/name@version" \
  --verbose

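# Step 2: Check whether the expected entrypoint is known
# (replace "suspected-entry" with the method or route you expect to be an entrypoint)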
stella scan graph entrypoints --scan-id $SCAN_ID | grep -i "suspected-entry"

# Step 3: Add missing entrypoint if needed
stella scan graph upload --scan-id $SCAN_ID \
  --file additional-entrypoints.json \
  --merge

# Step 4: Recompute
stella reachability compute --scan-id $SCAN_ID --force
```

---

### 3. Computation Timeout

**Symptoms:**
- "Computation exceeded timeout"
- Job progress stalls at a fixed percentage

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Large graph | Check node/edge count | Increase timeout |
| Deep paths | Max depth too high | Reduce max depth |
| Cycles | Graph has loops | Enable cycle detection |

**Resolution Steps:**

```bash
# Step 1: Check graph size
stella scan graph summary --scan-id $SCAN_ID

# Step 2: Increase timeout
stella reachability compute --scan-id $SCAN_ID --timeout 900s

# Step 3: Or reduce depth
stella reachability compute --scan-id $SCAN_ID --max-depth 10

# Step 4: Or partition analysis
stella reachability compute --scan-id $SCAN_ID --partition-by artifact
```
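
When scripting Steps 2 to 4, one option is to escalate from the least to the most invasive change: a lower `--max-depth` can leave longer call paths unexplored, so it is tried only after a longer timeout. The sketch uses only the flags shown above with the same example values, and assumes a timeout surfaces as a non-zero exit code. `compute_with_fallback` is a hypothetical name.

```bash
# Hypothetical escalation ladder for reachability timeouts (Steps 2-4 above).
# Stops at the first attempt that exits successfully.
compute_with_fallback() {
  local scan_id=$1
  stella reachability compute --scan-id "$scan_id" --timeout 900s && return 0
  echo "longer timeout not enough; retrying with --max-depth 10 (may miss longer paths)" >&2
  stella reachability compute --scan-id "$scan_id" --max-depth 10 && return 0
  echo "still failing; retrying with --partition-by artifact" >&2
  stella reachability compute --scan-id "$scan_id" --partition-by artifact
}
```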

---

### 4. Inconsistent Results Between Runs

**Symptoms:**
- Same scan produces different reachability results
- Status changes between POSSIBLY_REACHABLE and UNKNOWN

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Non-deterministic mode | Check config | Enable deterministic mode |
| Concurrent modifications | Check job logs | Serialize jobs |
| Caching issues | Results change after clearing the cache | Disable or clear cache |

**Resolution Steps:**

```bash
# Step 1: Enable deterministic mode
stella reachability compute --scan-id $SCAN_ID --deterministic --seed "fixed-seed"

# Step 2: Clear cache if needed
stella cache clear --scope reachability

# Step 3: Re-run computation
stella reachability compute --scan-id $SCAN_ID --force
```
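
To confirm that a fix actually stabilized results, run the same command twice and compare the outputs. The helper is generic and hypothetical; it assumes results are printed to stdout and contain no volatile fields (timestamps, job IDs) that would differ on every run. If `compute` only starts a background job in your setup, compare the job's results instead.

```bash
# Hypothetical helper: run a command twice and compare its stdout byte for byte.
# Returns 0 if identical, 1 if the outputs differ, 2 if either run fails.
same_output_twice() {
  local first second
  first=$("$@") || return 2
  second=$("$@") || return 2
  if [ "$first" = "$second" ]; then
    echo "deterministic"
  else
    echo "outputs differ"
    return 1
  fi
}

# Usage (assumes these flags can be combined in one call):
#   same_output_twice stella reachability compute --scan-id "$SCAN_ID" --deterministic --seed "fixed-seed" --force
```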

---

## Unknowns Issues

### 1. Unknowns Not Appearing

**Symptoms:**
- Expected unknowns not in registry
- Count seems too low

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Auto-suppress enabled | Check workspace settings | Disable auto-suppress |
| Filter active | Check list filters | Clear filters |
| Different workspace | Verify workspace ID | Use correct workspace |

**Resolution Steps:**

```bash
# Step 1: List without filters
stella unknowns list --workspace-id $WS_ID --status all

# Step 2: Check workspace settings
stella config get unknowns.auto-suppress

# Step 3: Disable auto-suppress if needed
stella config set unknowns.auto-suppress false
```
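
In automation, you may prefer to change the setting only when it is actually on. This sketch assumes `stella config get` prints the bare value (`true` or `false`), which you should confirm against your CLI version. `disable_auto_suppress` is a hypothetical helper.

```bash
# Hypothetical guard: disable auto-suppress only if it is currently enabled.
# Assumes `stella config get` prints just the value, e.g. "true".
disable_auto_suppress() {
  local current
  current=$(stella config get unknowns.auto-suppress) || return
  if [ "$current" = "true" ]; then
    stella config set unknowns.auto-suppress false || return
    echo "auto-suppress disabled"
  else
    echo "auto-suppress already '$current'; no change"
  fi
}
```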

---

### 2. Resolution Not Persisting

**Symptoms:**
- Resolved unknowns reappear
- Status resets to pending

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Scope too narrow | Check resolution scope | Use broader scope |
| New occurrence | Different scan/artifact | Resolve at workspace level |
| Database issue | Check error logs | Contact support |

**Resolution Steps:**

```bash
# Step 1: Check current scope
stella unknowns show --id $UNKNOWN_ID

# Step 2: Re-resolve with broader scope
stella unknowns resolve --id $UNKNOWN_ID \
  --resolution mapped \
  --scope workspace \
  --comment "Resolving at workspace level"
```
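
If the same unknown recurs under several IDs, for example one per scan, the Step 2 command can be applied in a loop. The sketch reads IDs one per line from a file you prepare yourself (for example from `stella unknowns list` output). `resolve_batch` is a hypothetical helper; the IDs are read on a separate file descriptor so that `stella` cannot consume the list from stdin.

```bash
# Hypothetical batch re-resolution at workspace scope.
# Reads unknown IDs one per line from the file given as $1; blank lines are skipped.
# Returns non-zero if any resolve call failed.
resolve_batch() {
  local id failed=0
  while IFS= read -r id <&3 || [ -n "$id" ]; do
    [ -n "$id" ] || continue
    stella unknowns resolve --id "$id" \
      --resolution mapped \
      --scope workspace \
      --comment "Resolving at workspace level" || failed=$((failed + 1))
  done 3< "$1"
  [ "$failed" -eq 0 ]
}
```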

---

### 3. Priority Score Incorrect

**Symptoms:**
- Low priority for critical component
- Scoring doesn't reflect risk

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Missing context | Auto-score lacks business context (e.g., the component handles authentication) | Manually escalate |
| Outdated metadata | Component info stale | Refresh metadata |

**Resolution Steps:**

```bash
# Step 1: Escalate with correct severity
stella unknowns escalate --id $UNKNOWN_ID \
  --reason "Handles authentication - critical despite low auto-score" \
  --severity critical

# Step 2: Request scoring review
# Add comment explaining the discrepancy
```

---

## Air-Gap / Offline Issues

### 1. Offline Kit Import Fails

**Symptoms:**
- "Invalid offline kit"
- "Trust anchor missing"

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Corrupted transfer | Verify checksums | Re-transfer |
| Missing components | Check kit contents | Re-generate kit |
| Version mismatch | Check scanner version | Use matching versions |

**Resolution Steps:**

```bash
# Step 1: Verify kit integrity
sha256sum offline-kit.tar.gz
# Compare the output with the hash recorded in manifest.sha256

# Step 2: Check kit contents
tar -tzf offline-kit.tar.gz | head -20

# Step 3: If incomplete, regenerate on connected system
stella airgap prepare --feeds nvd,ghsa --output offline-kit/
```
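
If `manifest.sha256` uses the standard GNU `sha256sum` format (`<hash>  <filename>` per line), which is an assumption about the kit layout, the manual comparison in Step 1 can be replaced with `sha256sum -c`. It exits non-zero on any mismatch or missing file. On macOS, `shasum -a 256 -c` is the closest equivalent.

```bash
# Verify every file listed in a sha256sum-format manifest.
# Assumes filenames in the manifest are relative to the manifest's own directory.
verify_kit() {
  local manifest=$1
  # Subshell so the cd does not leak; --quiet prints only failures.
  ( cd "$(dirname "$manifest")" && sha256sum -c --quiet "$(basename "$manifest")" )
}

# Usage: verify_kit /media/transfer/manifest.sha256 || echo "re-transfer the kit"
```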

---

### 2. Time Anchor Issues

**Symptoms:**
- "Time anchor expired"
- "Cannot verify timestamp"

**Possible Causes:**

| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Old kit | Check time anchor date | Refresh kit |
| Clock drift | Check system clock | Sync system time |
| Expired anchor | Anchor has TTL | Generate new anchor |

**Resolution Steps:**

```bash
# Step 1: Check time anchor
cat offline-kit/time-anchor/timestamp.json

# Step 2: If expired, generate new (on connected system)
stella airgap prepare-time-anchor --output offline-kit/time-anchor/

# Step 3: Transfer the new anchor to the air-gapped system and use it
```
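
The schema of `timestamp.json` is not documented in this guide, so the check below takes the anchor's ISO-8601 timestamp and a TTL in days as arguments instead of parsing the file. Which field you extract and which TTL you use are assumptions for your deployment. `anchor_expired` is a hypothetical helper, and it relies on GNU `date -d` (Linux, not stock macOS).

```bash
# Hypothetical expiry check: anchor_expired TIMESTAMP TTL_DAYS
# Returns 0 if TIMESTAMP + TTL_DAYS is in the past (expired), 1 if still valid,
# and 2 if the timestamp cannot be parsed. Requires GNU date.
anchor_expired() {
  local issued_epoch now_epoch
  issued_epoch=$(date -u -d "$1" +%s) || return 2
  now_epoch=$(date -u +%s)
  [ $((issued_epoch + $2 * 86400)) -lt "$now_epoch" ]
}

# Example: anchor_expired "2025-12-01T00:00:00Z" 30 && echo "refresh the time anchor"
```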

---

## Error Code Reference

| Error Code | Category | Meaning | Typical Resolution |
|------------|----------|---------|-------------------|
| E1001 | Proof | Manifest hash mismatch | Re-export with inputs |
| E1002 | Proof | Signature invalid | Check trust anchors |
| E1003 | Proof | Chain broken | Restore from backup |
| E2001 | Reachability | No call graph | Upload call graph |
| E2002 | Reachability | Computation timeout | Increase timeout |
| E2003 | Reachability | Symbol not resolved | Check symbol DB |
| E3001 | Unknowns | Resolution conflict | Use broader scope |
| E3002 | Unknowns | Invalid category | Check category value |
| E4001 | Airgap | Invalid kit | Re-generate kit |
| E4002 | Airgap | Time anchor expired | Refresh anchor |
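
For scripts that react to CLI failures, the table can be mirrored as a lookup function. This is a hypothetical sketch: it assumes the code appears verbatim (for example `E2002`) in the error output, which this guide does not specify, and it has to be kept in sync with the table by hand.

```bash
# Map a Stella error code to the typical resolution from the table above.
# Returns 1 for codes the table does not list.
resolution_for() {
  case "$1" in
    E1001) echo "Re-export with inputs" ;;
    E1002) echo "Check trust anchors" ;;
    E1003) echo "Restore from backup" ;;
    E2001) echo "Upload call graph" ;;
    E2002) echo "Increase timeout" ;;
    E2003) echo "Check symbol DB" ;;
    E3001) echo "Use broader scope" ;;
    E3002) echo "Check category value" ;;
    E4001) echo "Re-generate kit" ;;
    E4002) echo "Refresh anchor" ;;
    *) echo "Unlisted code; collect diagnostics"; return 1 ;;
  esac
}

# Example: pull the first code out of an error message, then look it up.
#   code=$(printf '%s\n' "$message" | grep -oE 'E[0-9]{4}' | head -n 1)
#   resolution_for "$code"
```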

---

## Getting Help

### Collecting Diagnostics

```bash
# Generate diagnostic bundle
stella diagnostic collect --output diagnostics.zip

# Include specific scan
stella diagnostic collect --scan-id $SCAN_ID --output diagnostics.zip
```

### Log Locations

| Component | Log Path |
|-----------|----------|
| Scanner | `/var/log/stella/scanner.log` |
| Reachability | `/var/log/stella/reachability.log` |
| Proofs | `/var/log/stella/proofs.log` |
| CLI | `~/.stella/logs/cli.log` |
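
Before collecting a full diagnostic bundle, a quick look at recent errors in each log often points at the failing stage. This sketch assumes the default paths above and that failures contain the word `error` (case-insensitive); the log line format is not documented here. `recent_errors` is a hypothetical helper that also accepts explicit paths.

```bash
# Print up to the last 20 lines containing "error" from each readable Stella log.
# Defaults to the paths in the table above; pass paths as arguments to override.
recent_errors() {
  local log
  if [ "$#" -eq 0 ]; then
    set -- /var/log/stella/scanner.log /var/log/stella/reachability.log \
      /var/log/stella/proofs.log "$HOME/.stella/logs/cli.log"
  fi
  for log in "$@"; do
    [ -r "$log" ] || continue   # skip logs that are missing or unreadable
    echo "== $log"
    grep -i 'error' "$log" | tail -n 20
  done
}
```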

### Support Channels

- Documentation: `docs/` directory
- Issues: Internal issue tracker
- Emergency: On-call security team

---

## Related Documentation

- [Score Proofs Runbook](../operations/score-proofs-runbook.md)
- [Reachability Runbook](../operations/reachability-runbook.md)
- [Unknowns Queue Runbook](../operations/unknowns-queue-runbook.md)
- [Air-Gap Runbook](../airgap/score-proofs-reachability-airgap-runbook.md)

---

**Last Updated**: 2025-12-20
**Version**: 1.0.0
**Sprint**: 3500.0004.0004