Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules (Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment, EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release, Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation, Authority, Core, Cryptography, Database, Docker, Integration, Notify, Observability, Security, ServiceGraph, Sources, Verification). Each check now emits structured remediation metadata (severity, category, runbook links, and fix suggestions) consumed by the Doctor dashboard remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions
--- a/docs/doctor/articles/scanner/reachability.md
+++ b/docs/doctor/articles/scanner/reachability.md
@@ -0,0 +1,113 @@
+---
+checkId: check.scanner.reachability
+plugin: stellaops.doctor.scanner
+severity: warn
+tags: [scanner, reachability, analysis, performance]
+---
+# Reachability Computation Health
+
+## What It Checks
+Queries the Scanner service at `/api/v1/reachability/stats` and evaluates reachability analysis performance and accuracy:
+
+- **Computation failures**: fail if failure rate exceeds 10% of total computations.
+- **Average computation time**: warn at 5,000ms, fail at 30,000ms.
+- **Vulnerability filtering effectiveness**: reported as evidence (ratio of unreachable to total vulnerabilities).
+
+Evidence collected: `total_computations`, `computation_failures`, `failure_rate`, `avg_computation_time_ms`, `p95_computation_time_ms`, `reachable_vulns`, `unreachable_vulns`, `filter_rate`.
+
+The check requires `Scanner:Url` or `Services:Scanner:Url` to be configured.
+
+## Why It Matters
+Reachability analysis is what separates actionable vulnerability findings from noise. It determines which vulnerabilities are actually reachable in the call graph, filtering out false positives that would otherwise block releases or waste triage time. Slow computations delay security feedback loops, and failures mean vulnerabilities are reported without reachability context, inflating finding counts and eroding operator trust.
+
+## Common Causes
+- Invalid or incomplete call graph data from the SBOM/slice pipeline
+- Missing slice cache entries forcing full recomputation
+- Timeout on large codebases with deep dependency trees
+- Memory exhaustion during graph traversal on complex projects
+- Complex call graphs with high fan-out or cyclical references
+- Insufficient CPU/memory allocated to scanner workers
+
+## How to Fix
+
+### Docker Compose
+```bash
+# Check scanner logs for reachability errors
+docker compose -f docker-compose.stella-ops.yml logs scanner | grep -iE "reachability|computation"
+
+# Warm the slice cache to speed up subsequent computations
+stella scanner cache warm
+
+# Increase scanner resources by raising the limits in the compose file (see the YAML that follows)
+```
+
+```yaml
+services:
+  scanner:
+    deploy:
+      resources:
+        limits:
+          memory: 4G
+          cpus: "4.0"
+    environment:
+      Scanner__Reachability__TimeoutMs: "60000"
+      Scanner__Reachability__MaxGraphDepth: "100"
+```
+
+### Bare Metal / systemd
+```bash
+# View reachability computation errors
+sudo journalctl -u stellaops-scanner --since "1 hour ago" | grep -i reachability
+
+# Retry failed computations
+stella scanner reachability retry --failed
+
+# Warm the slice cache
+stella scanner cache warm
+```
+
+Edit `/etc/stellaops/scanner/appsettings.json`:
+
+```json
+{
+  "Reachability": {
+    "TimeoutMs": 60000,
+    "MaxGraphDepth": 100,
+    "MaxConcurrentComputations": 4
+  }
+}
+```
+
+### Kubernetes / Helm
+```bash
+# Check scanner pod resource usage
+kubectl top pods -l app=stellaops-scanner
+
+# Scale scanner workers for parallel computation
+kubectl scale deployment stellaops-scanner --replicas=4
+```
+
+Set in Helm `values.yaml`:
+
+```yaml
+scanner:
+  replicas: 4
+  resources:
+    limits:
+      memory: 4Gi
+      cpu: "4"
+  reachability:
+    timeoutMs: 60000
+    maxGraphDepth: 100
+```
+
+## Verification
+```bash
+stella doctor run --check check.scanner.reachability
+```
+
+## Related Checks
+- `check.scanner.slice.cache` -- cache misses are a primary cause of slow computations
+- `check.scanner.witness.graph` -- reachability depends on witness graph integrity
+- `check.scanner.sbom` -- SBOM quality directly affects reachability accuracy
+- `check.scanner.resources` -- resource constraints cause computation timeouts