Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions

@@ -0,0 +1,106 @@
---
checkId: check.scanner.sbom
plugin: stellaops.doctor.scanner
severity: warn
tags: [scanner, sbom, cyclonedx, spdx, compliance]
---
# SBOM Generation Health
## What It Checks
Queries the Scanner service at `/api/v1/sbom/stats` and evaluates SBOM generation health:
- **Success rate**: warn when below 95%, fail when below 80%.
- **Validation failures**: any schema validation failures trigger a warning regardless of success rate.
Evidence collected: `total_generated`, `successful_generations`, `failed_generations`, `success_rate`, `format_cyclonedx`, `format_spdx`, `validation_failures`.
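The thresholds above can be read as a small decision function. The following Python sketch is illustrative only and is not the plugin's implementation. The name `classify_sbom_health` is hypothetical, and the sketch assumes that `success_rate` is a percentage (0 to 100) and that a failing success rate takes precedence over the validation-failure warning.

```python
def classify_sbom_health(success_rate: float, validation_failures: int) -> str:
    """Map SBOM evidence to a result using the documented thresholds.

    Assumptions (not confirmed by the source): success_rate is a
    percentage in [0, 100], and "fail" outranks "warn".
    """
    if success_rate < 80:
        return "fail"
    if success_rate < 95:
        return "warn"
    if validation_failures > 0:
        # Any schema validation failure warns, even at a healthy success rate.
        return "warn"
    return "pass"
```

Under these assumptions, a run with a 100% success rate and a single validation failure still yields `warn`.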
The check requires `Scanner:Url` or `Services:Scanner:Url` to be configured.
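As a rough illustration of that lookup, the sketch below resolves the Scanner URL from a flat key/value configuration. The helper name `resolve_scanner_url` is hypothetical, and the order in which the two keys are tried is an assumption, since this article does not state a precedence.

```python
from typing import Mapping, Optional

# Keys named in this article; the lookup order is an assumption.
SCANNER_URL_KEYS = ("Scanner:Url", "Services:Scanner:Url")


def resolve_scanner_url(config: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty Scanner URL, or None if neither key is set."""
    for key in SCANNER_URL_KEYS:
        value = config.get(key, "").strip()
        if value:
            return value
    return None
```

When neither key is set the helper returns `None`. How the real check reports that situation is not described here.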
## Why It Matters
SBOMs are the foundation of the entire Stella Ops security pipeline. Without valid SBOMs, vulnerability scanning produces incomplete results, reachability analysis cannot run, and release gates that require an SBOM attestation will block promotions. Regulations such as US Executive Order 14028 and the EU Cyber Resilience Act (CRA) also place SBOM obligations on software within their scope, so gaps in SBOM generation can become compliance gaps.
## Common Causes
- Invalid or corrupted source artifacts (truncated layers, missing manifests)
- Parser errors for specific ecosystems (e.g., unsupported lockfile format)
- Memory exhaustion on large monorepo or multi-module projects
- SBOM schema validation failures due to generator version mismatch
- Unsupported container base image format
- Minor parsing issues in transitive dependency resolution
## How to Fix
### Docker Compose
```bash
# View recent SBOM generation failures
docker compose -f docker-compose.stella-ops.yml logs scanner | grep -i "sbom.*fail"
# Restart the scanner to clear any cached bad state
docker compose -f docker-compose.stella-ops.yml restart scanner
# Increase memory limit if OOM is suspected
# In docker-compose.stella-ops.yml:
```
```yaml
services:
  scanner:
    deploy:
      resources:
        limits:
          memory: 4G
    environment:
      Scanner__Sbom__ValidationMode: "Strict"
      Scanner__Sbom__MaxArtifactSizeMb: "500"
```
### Bare Metal / systemd
```bash
# Check scanner logs for SBOM errors
sudo journalctl -u stellaops-scanner --since "1 hour ago" | grep -i sbom
# Retry failed SBOMs
stella scanner sbom retry --failed
```
Edit `/etc/stellaops/scanner/appsettings.json`:
```json
{
  "Sbom": {
    "ValidationMode": "Strict",
    "MaxArtifactSizeMb": 500
  }
}
```
### Kubernetes / Helm
```bash
# Check for OOMKilled scanner pods
kubectl get pods -l app=stellaops-scanner -o wide
kubectl describe pod <scanner-pod> | grep -A 5 "Last State"
# View SBOM-related logs
kubectl logs -l app=stellaops-scanner --tail=200 | grep -i sbom
```
Set in Helm `values.yaml`:
```yaml
scanner:
  resources:
    limits:
      memory: 4Gi
  sbom:
    validationMode: Strict
    maxArtifactSizeMb: 500
```
## Verification
```bash
stella doctor run --check check.scanner.sbom
```
## Related Checks
- `check.scanner.queue` -- queue backlog can delay SBOM generation
- `check.scanner.witness.graph` -- witness graphs depend on successful SBOM output
- `check.scanner.resources` -- resource exhaustion is a top cause of SBOM failures