Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
master
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions

@@ -0,0 +1,113 @@
---
checkId: check.scanner.reachability
plugin: stellaops.doctor.scanner
severity: warn
tags: [scanner, reachability, analysis, performance]
---
# Reachability Computation Health
## What It Checks
Queries the Scanner service at `/api/v1/reachability/stats` and evaluates reachability analysis performance and accuracy:
- **Computation failures**: fail if failure rate exceeds 10% of total computations.
- **Average computation time**: warn at 5,000 ms, fail at 30,000 ms.
- **Vulnerability filtering effectiveness**: reported as evidence (ratio of unreachable to total vulnerabilities).
Evidence collected: `total_computations`, `computation_failures`, `failure_rate`, `avg_computation_time_ms`, `p95_computation_time_ms`, `reachable_vulns`, `unreachable_vulns`, `filter_rate`.
The check requires `Scanner:Url` or `Services:Scanner:Url` to be configured.
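The thresholds above can be sketched as a small classifier. This is an illustration only, not the plugin's implementation: the `ReachabilityStats` type and the function names are hypothetical, the field names mirror the evidence keys listed above, and it assumes "warn at" and "fail at" are inclusive bounds and that "total vulnerabilities" means reachable plus unreachable.

```python
from dataclasses import dataclass

# Thresholds taken from the bullets above.
MAX_FAILURE_RATE = 0.10   # fail when the failure rate exceeds 10%
WARN_AVG_MS = 5_000       # warn at this average computation time
FAIL_AVG_MS = 30_000      # fail at this average computation time


@dataclass
class ReachabilityStats:
    """Hypothetical container; field names mirror the evidence keys."""
    total_computations: int
    computation_failures: int
    avg_computation_time_ms: float
    reachable_vulns: int
    unreachable_vulns: int


def failure_rate(stats: ReachabilityStats) -> float:
    """Share of computations that failed; 0.0 when nothing was computed."""
    if stats.total_computations == 0:
        return 0.0
    return stats.computation_failures / stats.total_computations


def filter_rate(stats: ReachabilityStats) -> float:
    """Ratio of unreachable to total (reachable + unreachable) vulnerabilities."""
    total = stats.reachable_vulns + stats.unreachable_vulns
    return stats.unreachable_vulns / total if total else 0.0


def evaluate(stats: ReachabilityStats) -> str:
    """Return 'fail', 'warn', or 'pass' using the documented thresholds."""
    if failure_rate(stats) > MAX_FAILURE_RATE:
        return "fail"
    # Assumption: the time thresholds are inclusive.
    if stats.avg_computation_time_ms >= FAIL_AVG_MS:
        return "fail"
    if stats.avg_computation_time_ms >= WARN_AVG_MS:
        return "warn"
    return "pass"
```

Under these assumptions, 11 failures out of 100 computations yields `fail`, while exactly 10 out of 100 does not, because the rule is "exceeds 10%". The filter rate is evidence only and does not influence the result.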
## Why It Matters
Reachability analysis is what separates actionable vulnerability findings from noise. It determines which vulnerabilities are actually reachable in the call graph, filtering out false positives that would otherwise block releases or waste triage time. Slow computations delay security feedback loops, and failures mean vulnerabilities are reported without reachability context, inflating finding counts and eroding operator trust.
## Common Causes
- Invalid or incomplete call graph data from the SBOM/slice pipeline
- Missing slice cache entries forcing full recomputation
- Timeout on large codebases with deep dependency trees
- Memory exhaustion during graph traversal on complex projects
- Complex call graphs with high fan-out or cyclical references
- Insufficient CPU/memory allocated to scanner workers
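To illustrate why cyclical references and deep graphs matter, here is a minimal sketch of a depth-bounded call-graph traversal. It is not the Scanner's algorithm: the function name, the adjacency-dict graph shape, and the `max_depth` parameter are hypothetical, and the parameter is only loosely analogous to a setting such as `MaxGraphDepth`.

```python
from collections import deque


def reachable_functions(call_graph, entry_points, max_depth):
    """Breadth-first walk over a call graph given as {caller: [callees]}.

    The `seen` set makes the walk terminate on cyclic graphs, and
    `max_depth` bounds how far from an entry point the walk expands.
    """
    seen = set(entry_points)
    queue = deque((fn, 0) for fn in entry_points)
    while queue:
        fn, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this node
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen


# Hypothetical graph with a cycle (a -> b -> a) and an unused branch.
graph = {
    "main": ["a"],
    "a": ["b"],
    "b": ["a", "vuln"],
    "unused": ["other_vuln"],
}
full = reachable_functions(graph, ["main"], max_depth=100)
shallow = reachable_functions(graph, ["main"], max_depth=2)
```

In this sketch `full` contains `vuln` but not `other_vuln`, while `shallow` stops before reaching `vuln`. A depth limit set too low can hide reachable code, whereas an unbounded walk on a large graph costs more time and memory.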
## How to Fix
### Docker Compose
```bash
# Check scanner logs for reachability errors
docker compose -f docker-compose.stella-ops.yml logs scanner | grep -i "reachability\|computation"
# Warm the slice cache to speed up subsequent computations
stella scanner cache warm
```
Increase scanner resources in `docker-compose.stella-ops.yml`:
```yaml
services:
  scanner:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "4.0"
    environment:
      Scanner__Reachability__TimeoutMs: "60000"
      Scanner__Reachability__MaxGraphDepth: "100"
```
### Bare Metal / systemd
```bash
# View reachability computation errors
sudo journalctl -u stellaops-scanner --since "1 hour ago" | grep -i reachability
# Retry failed computations
stella scanner reachability retry --failed
# Warm the slice cache
stella scanner cache warm
```
Edit `/etc/stellaops/scanner/appsettings.json`:
```json
{
  "Reachability": {
    "TimeoutMs": 60000,
    "MaxGraphDepth": 100,
    "MaxConcurrentComputations": 4
  }
}
```
### Kubernetes / Helm
```bash
# Check scanner pod resource usage
kubectl top pods -l app=stellaops-scanner
# Scale scanner workers for parallel computation
kubectl scale deployment stellaops-scanner --replicas=4
```
Set in Helm `values.yaml`:
```yaml
scanner:
  replicas: 4
  resources:
    limits:
      memory: 4Gi
      cpu: "4"
  reachability:
    timeoutMs: 60000
    maxGraphDepth: 100
```
## Verification
```bash
stella doctor run --check check.scanner.reachability
```
## Related Checks
- `check.scanner.slice.cache` -- cache misses are a primary cause of slow computations
- `check.scanner.witness.graph` -- reachability depends on witness graph integrity
- `check.scanner.sbom` -- SBOM quality directly affects reachability accuracy
- `check.scanner.resources` -- resource constraints cause computation timeouts