# Score Proofs & Reachability Troubleshooting Guide
**Sprint:** SPRINT_3500_0004_0004

**Audience:** Operations, Support, Security Engineers

---
## Quick Diagnostic Commands
```bash
# Check system health
stella status
# Verify scan completed successfully
stella scan status --scan-id $SCAN_ID
# Check reachability computation status
stella reachability job-status --job-id $JOB_ID
# Verify proof integrity
stella proof verify --scan-id $SCAN_ID --verbose
```
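During triage it can help to run all four checks in one pass and see which ones fail. The following is a minimal sketch rather than part of the StellaOps CLI: the `run_diagnostics` function name is hypothetical, and it assumes each `stella` command exits non-zero when its check fails, which this guide does not confirm.

```bash
# Sketch: run the quick diagnostic commands in sequence and summarize failures.
# Assumption: each `stella` subcommand exits non-zero when its check fails.

run_diagnostics() {
  local scan_id="$1" job_id="$2" failed=0

  stella status >/dev/null 2>&1 \
    || { echo "FAIL: system health"; failed=$((failed + 1)); }
  stella scan status --scan-id "$scan_id" >/dev/null 2>&1 \
    || { echo "FAIL: scan status"; failed=$((failed + 1)); }
  stella reachability job-status --job-id "$job_id" >/dev/null 2>&1 \
    || { echo "FAIL: reachability job"; failed=$((failed + 1)); }
  stella proof verify --scan-id "$scan_id" --verbose >/dev/null 2>&1 \
    || { echo "FAIL: proof integrity"; failed=$((failed + 1)); }

  echo "checks failed: $failed"
  # Return success only when every check passed.
  [ "$failed" -eq 0 ]
}

# Usage: run_diagnostics "$SCAN_ID" "$JOB_ID"
```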
---
## Score Proofs Issues
### 1. Replay Produces Different Results
**Symptoms:**
- `stella score replay` output differs from original
- Verification fails with "hash mismatch"
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Missing inputs | `stella proof inspect --check-inputs` shows gaps | Export with `--include-inputs` |
| Algorithm version mismatch | Check `environment.scannerVersion` in manifest | Use matching scanner version |
| Non-deterministic config | Review `configuration` section | Enable `--deterministic` mode |
| Feed drift | Compare `advisoryFeeds.asOf` timestamps | Use frozen feeds |
**Resolution Steps:**
```bash
# Step 1: Inspect the proof
stella proof inspect --scan-id $SCAN_ID
# Step 2: Check for missing inputs
stella proof inspect --scan-id $SCAN_ID --check-inputs
# Step 3: If inputs missing, re-export with data
stella proof export --scan-id $SCAN_ID --include-inputs --output proof-full.zip
# Step 4: Retry replay
stella score replay --scan-id $SCAN_ID --bundle proof-full.zip
```
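The resolution steps can be chained so that the re-export only happens when inputs are actually missing. This is a sketch under stated assumptions: the `replay_with_inputs` name is hypothetical, `--check-inputs` is assumed to exit non-zero when gaps are found, and `stella score replay` is assumed to work without `--bundle` when the stored inputs are complete. None of these behaviors is confirmed by this guide.

```bash
# Sketch: replay a score, re-exporting the proof with inputs only when needed.
# Assumptions: `--check-inputs` exits non-zero on missing inputs, and replay
# without `--bundle` is valid when nothing is missing.

replay_with_inputs() {
  local scan_id="$1" bundle="${2:-proof-full.zip}"

  if stella proof inspect --scan-id "$scan_id" --check-inputs; then
    # Inputs are complete: replay directly.
    stella score replay --scan-id "$scan_id"
  else
    echo "inputs missing; re-exporting with --include-inputs"
    stella proof export --scan-id "$scan_id" --include-inputs --output "$bundle" \
      && stella score replay --scan-id "$scan_id" --bundle "$bundle"
  fi
}

# Usage: replay_with_inputs "$SCAN_ID" proof-full.zip
```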
---
### 2. Signature Verification Failed
**Symptoms:**
- "Invalid signature" or "Signature verification failed"
- `stella proof verify` returns error
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Key rotation | Check `stella trust list` for key dates | Import new trust anchor |
| Corrupted bundle | Verify file integrity | Re-download bundle |
| Wrong trust root | Check issuer in attestation | Configure correct trust |
| Tampered content | Hash mismatch in bundle | Investigate tampering |
**Resolution Steps:**
```bash
# Step 1: Verbose verification
stella proof verify --scan-id $SCAN_ID --verbose
# Step 2: Check trust anchors
stella trust list
# Step 3: If key rotated, import new anchor
stella trust import --file new-public-key.pem
# Step 4: Retry verification
stella proof verify --scan-id $SCAN_ID
```
---
### 3. Proof Chain Broken
**Symptoms:**
- "Chain integrity violation"
- "prev_hash mismatch"
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Database corruption | Check Postgres logs | Restore from backup |
| Manual modification | Audit access logs | Investigate, restore |
| Storage failure | Check disk health | Repair/restore |
**Resolution Steps:**
```bash
# Step 1: Check chain status
stella proof status --scan-id $SCAN_ID
# Step 2: Find break point
stella proof list --since "30 days" --verify-chain
# Step 3: If the break points to a database issue,
# check the Postgres logs and restore from backup if needed
```
---
### 4. Proof Export Fails
**Symptoms:**
- "Failed to export proof bundle"
- Timeout during export
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Large inputs | Check SBOM/graph size | Use `--exclude-inputs` |
| Storage full | Check disk space | Clear space or use different path |
| Network timeout | Check network connectivity | Increase timeout |
**Resolution Steps:**
```bash
# Step 1: Export without inputs (smaller; omit --include-inputs)
stella proof export --scan-id $SCAN_ID --output proof.zip
# Step 2: If still fails, check disk
# Windows: Get-Volume | Format-Table
# Linux: df -h
# Step 3: Try alternative location
stella proof export --scan-id $SCAN_ID --output /tmp/proof.zip
```
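The disk check in Step 2 can be automated on Linux before retrying the export. This is a small sketch: the `has_free_space` name and the 1 GiB default threshold are illustrative assumptions, not values from StellaOps.

```bash
# Sketch: confirm a directory has enough free space before exporting to it.
# Assumption: the 1 GiB default (1048576 KiB) is arbitrary; size it to your bundles.

has_free_space() {
  local dir="$1" min_kib="${2:-1048576}" avail_kib
  # `df -Pk` prints a header plus one data row; column 4 is available KiB.
  avail_kib="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ -n "$avail_kib" ] && [ "$avail_kib" -ge "$min_kib" ]
}

# Usage:
# has_free_space /tmp && stella proof export --scan-id "$SCAN_ID" --output /tmp/proof.zip
```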
---
## Reachability Issues
### 1. Too Many UNKNOWN Findings
**Symptoms:**
- Most vulnerabilities show `UNKNOWN` reachability status
- Coverage percentage is low
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| No call graph | `stella scan graph summary` returns empty | Upload call graph |
| Incomplete graph | Low node count | Regenerate with more options |
| Symbol mismatch | Symbols not resolved | Check symbol resolution |
**Resolution Steps:**
```bash
# Step 1: Check if call graph exists
stella scan graph summary --scan-id $SCAN_ID
# Step 2: If missing, generate and upload
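# .NET example (the call-graph flag is illustrative; use whatever your build tooling provides to emit callgraph.json):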
dotnet build --generate-call-graph
stella scan graph upload --scan-id $SCAN_ID --file callgraph.json
# Step 3: Verify entrypoints detected
stella scan graph entrypoints --scan-id $SCAN_ID
# Step 4: Recompute reachability
stella reachability compute --scan-id $SCAN_ID --force
```
---
### 2. False UNREACHABLE Findings
**Symptoms:**
- Known-reachable code marked UNREACHABLE
- Security team reports false negatives
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Missing edges | Graph incomplete | Add missing calls |
| Reflection not detected | Edge type missing | Add reflection hints |
| Entrypoint not detected | Check entrypoints list | Add manual entrypoint |
**Resolution Steps:**
```bash
# Step 1: Explain the specific finding
stella reachability explain --scan-id $SCAN_ID \
  --cve CVE-2024-XXXX \
  --purl "pkg:type/name@version" \
  --verbose
# Step 2: Check if entrypoint is known
stella scan graph entrypoints --scan-id $SCAN_ID | grep -i "suspected-entry"
# Step 3: Add missing entrypoint if needed
stella scan graph upload --scan-id $SCAN_ID \
  --file additional-entrypoints.json \
  --merge
# Step 4: Recompute
stella reachability compute --scan-id $SCAN_ID --force
```
---
### 3. Computation Timeout
**Symptoms:**
- "Computation exceeded timeout"
- Job stuck at a fixed progress percentage
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Large graph | Check node/edge count | Increase timeout |
| Deep paths | Max depth too high | Reduce max depth |
| Cycles | Graph has loops | Enable cycle detection |
**Resolution Steps:**
```bash
# Step 1: Check graph size
stella scan graph summary --scan-id $SCAN_ID
# Step 2: Increase timeout
stella reachability compute --scan-id $SCAN_ID --timeout 900s
# Step 3: Or reduce depth
stella reachability compute --scan-id $SCAN_ID --max-depth 10
# Step 4: Or partition analysis
stella reachability compute --scan-id $SCAN_ID --partition-by artifact
```
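Rather than guessing a single timeout, the computation can be retried with progressively larger values. The sketch below assumes that `--timeout` accepts a seconds value with an `s` suffix (as in Step 2) and that the command exits non-zero when it times out; the `compute_with_escalation` name is hypothetical.

```bash
# Sketch: retry the reachability computation with increasing timeouts.
# Assumption: the command exits non-zero on timeout.

compute_with_escalation() {
  local scan_id="$1" t
  shift
  for t in "$@"; do
    echo "trying timeout ${t}s"
    if stella reachability compute --scan-id "$scan_id" --timeout "${t}s"; then
      echo "succeeded at ${t}s"
      return 0
    fi
  done
  echo "all timeouts exhausted" >&2
  return 1
}

# Usage: compute_with_escalation "$SCAN_ID" 300 900 1800
```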
---
### 4. Inconsistent Results Between Runs
**Symptoms:**
- Same scan produces different reachability results
- Status changes between POSSIBLY_REACHABLE and UNKNOWN
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Non-deterministic mode | Check config | Enable deterministic mode |
| Concurrent modifications | Check job logs | Serialize jobs |
| Caching issues | Compare results before and after `stella cache clear --scope reachability` | Disable or clear cache |
**Resolution Steps:**
```bash
# Step 1: Enable deterministic mode
stella reachability compute --scan-id $SCAN_ID --deterministic --seed "fixed-seed"
# Step 2: Clear cache if needed
stella cache clear --scope reachability
# Step 3: Re-run computation
stella reachability compute --scan-id $SCAN_ID --force
```
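One way to confirm that results are stable is to run the same command twice and compare digests of its output. The helper below is a generic sketch with a hypothetical name (`outputs_match`); applying it to `stella reachability compute` assumes that the command prints its results to stdout, which this guide does not state.

```bash
# Sketch: run a command twice and compare SHA-256 digests of its stdout.

outputs_match() {
  local first second
  first="$("$@" | sha256sum | cut -d' ' -f1)"
  second="$("$@" | sha256sum | cut -d' ' -f1)"
  [ "$first" = "$second" ]
}

# Usage (assumes results are written to stdout):
# outputs_match stella reachability compute --scan-id "$SCAN_ID" \
#   --deterministic --seed "fixed-seed" --force
```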
---
## Unknowns Issues
### 1. Unknowns Not Appearing
**Symptoms:**
- Expected unknowns not in registry
- Count seems too low
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Auto-suppress enabled | Check workspace settings | Disable auto-suppress |
| Filter active | Check list filters | Clear filters |
| Different workspace | Verify workspace ID | Use correct workspace |
**Resolution Steps:**
```bash
# Step 1: List without filters
stella unknowns list --workspace-id $WS_ID --status all
# Step 2: Check workspace settings
stella config get unknowns.auto-suppress
# Step 3: Disable auto-suppress if needed
stella config set unknowns.auto-suppress false
```
---
### 2. Resolution Not Persisting
**Symptoms:**
- Resolved unknowns reappear
- Status resets to pending
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Scope too narrow | Check resolution scope | Use broader scope |
| New occurrence | Different scan/artifact | Resolve at workspace level |
| Database issue | Check error logs | Contact support |
**Resolution Steps:**
```bash
# Step 1: Check current scope
stella unknowns show --id $UNKNOWN_ID
# Step 2: Re-resolve with broader scope
stella unknowns resolve --id $UNKNOWN_ID \
  --resolution mapped \
  --scope workspace \
  --comment "Resolving at workspace level"
```
---
### 3. Priority Score Incorrect
**Symptoms:**
- Low priority for critical component
- Scoring doesn't reflect risk
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Missing context | Automatic scoring limited | Manually escalate |
| Outdated metadata | Component info stale | Refresh metadata |
**Resolution Steps:**
```bash
# Step 1: Escalate with correct severity
stella unknowns escalate --id $UNKNOWN_ID \
  --reason "Handles authentication - critical despite low auto-score" \
  --severity critical
# Step 2: Request scoring review
# Add comment explaining the discrepancy
```
---
## Air-Gap / Offline Issues
### 1. Offline Kit Import Fails
**Symptoms:**
- "Invalid offline kit"
- "Trust anchor missing"
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Corrupted transfer | Verify checksums | Re-transfer |
| Missing components | Check kit contents | Re-generate kit |
| Version mismatch | Check scanner version | Use matching versions |
**Resolution Steps:**
```bash
# Step 1: Verify kit integrity
sha256sum offline-kit.tar.gz
# Compare with manifest.sha256
# Step 2: Check kit contents
tar -tzf offline-kit.tar.gz | head -20
# Step 3: If incomplete, regenerate on connected system
stella airgap prepare --feeds nvd,ghsa --output offline-kit/
```
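The manual comparison in Step 1 can be scripted. This sketch assumes `manifest.sha256` uses the standard `sha256sum` output layout (`<hex digest>  <filename>`); the guide does not specify the manifest format, and the `verify_kit` name is hypothetical. Filenames containing spaces are not handled.

```bash
# Sketch: verify a kit archive against a checksum manifest.
# Assumption: the manifest has one "<digest>  <filename>" entry per line.

verify_kit() {
  local kit="$1" manifest="$2" expected actual
  # Find the digest recorded for this filename (binary-mode entries use "*name").
  expected="$(awk -v f="$(basename "$kit")" \
    '$2 == f || $2 == ("*" f) { print $1 }' "$manifest")"
  actual="$(sha256sum "$kit" | cut -d' ' -f1)"
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
# verify_kit offline-kit.tar.gz manifest.sha256 || echo "checksum mismatch: re-transfer the kit"
```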
---
### 2. Time Anchor Issues
**Symptoms:**
- "Time anchor expired"
- "Cannot verify timestamp"
**Possible Causes:**
| Cause | Diagnosis | Solution |
|-------|-----------|----------|
| Old kit | Check time anchor date | Refresh kit |
| Clock drift | Check system clock | Sync system time |
| Expired anchor | Anchor has TTL | Generate new anchor |
**Resolution Steps:**
```bash
# Step 1: Check time anchor
cat offline-kit/time-anchor/timestamp.json
# Step 2: If expired, generate new (on connected system)
stella airgap prepare-time-anchor --output offline-kit/time-anchor/
# Step 3: Transfer and use new anchor
```
---
## Error Code Reference
| Error Code | Category | Meaning | Typical Resolution |
|------------|----------|---------|-------------------|
| E1001 | Proof | Manifest hash mismatch | Re-export with inputs |
| E1002 | Proof | Signature invalid | Check trust anchors |
| E1003 | Proof | Chain broken | Restore from backup |
| E2001 | Reachability | No call graph | Upload call graph |
| E2002 | Reachability | Computation timeout | Increase timeout |
| E2003 | Reachability | Symbol not resolved | Check symbol DB |
| E3001 | Unknowns | Resolution conflict | Use broader scope |
| E3002 | Unknowns | Invalid category | Check category value |
| E4001 | Airgap | Invalid kit | Re-generate kit |
| E4002 | Airgap | Time anchor expired | Refresh anchor |
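
For scripted triage, the table can be expressed as a simple lookup. This is an illustrative sketch: the `resolution_for` function name is hypothetical, and the resolution strings are copied from the table.

```bash
# Sketch: look up the typical resolution for a documented error code.

resolution_for() {
  case "$1" in
    E1001) echo "Re-export with inputs" ;;
    E1002) echo "Check trust anchors" ;;
    E1003) echo "Restore from backup" ;;
    E2001) echo "Upload call graph" ;;
    E2002) echo "Increase timeout" ;;
    E2003) echo "Check symbol DB" ;;
    E3001) echo "Use broader scope" ;;
    E3002) echo "Check category value" ;;
    E4001) echo "Re-generate kit" ;;
    E4002) echo "Refresh anchor" ;;
    *) echo "Unknown error code: $1" >&2; return 1 ;;
  esac
}

# Usage: resolution_for E2002
```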
---
## Getting Help
### Collecting Diagnostics
```bash
# Generate diagnostic bundle
stella diagnostic collect --output diagnostics.zip
# Include specific scan
stella diagnostic collect --scan-id $SCAN_ID --output diagnostics.zip
```
### Log Locations
| Component | Log Path |
|-----------|----------|
| Scanner | `/var/log/stella/scanner.log` |
| Reachability | `/var/log/stella/reachability.log` |
| Proofs | `/var/log/stella/proofs.log` |
| CLI | `~/.stella/logs/cli.log` |
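
Before opening a ticket, it can be useful to see which error codes appear most often in these logs. The sketch below assumes log lines contain codes in the literal form `E1001`, `E2002`, and so on; the guide does not specify the log format, and the `count_error_codes` name is hypothetical.

```bash
# Sketch: count occurrences of E-codes across the given log files.
# Assumption: codes appear in the logs as whole words such as "E2002".

count_error_codes() {
  # -h omits filename prefixes, -o prints only the match, -w matches whole
  # words, and -s suppresses error messages for missing or unreadable files.
  grep -hoswE 'E[1-4][0-9]{3}' "$@" | sort | uniq -c | sort -rn
}

# Usage:
# count_error_codes /var/log/stella/scanner.log /var/log/stella/reachability.log \
#   /var/log/stella/proofs.log ~/.stella/logs/cli.log
```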
### Support Channels
- Documentation: `docs/` directory
- Issues: Internal issue tracker
- Emergency: On-call security team
---
## Related Documentation
- [Score Proofs Runbook](../operations/score-proofs-runbook.md)
- [Reachability Runbook](../operations/reachability-runbook.md)
- [Unknowns Queue Runbook](../operations/unknowns-queue-runbook.md)
- [Air-Gap Runbook](../airgap/score-proofs-reachability-airgap-runbook.md)
---
**Last Updated**: 2025-12-20
**Version**: 1.0.0
**Sprint**: 3500.0004.0004