Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
master
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions


@@ -0,0 +1,124 @@
---
checkId: check.docker.daemon
plugin: stellaops.doctor.docker
severity: fail
tags: [docker, daemon, container]
---
# Docker Daemon
## What It Checks
Validates that the Docker daemon is running and responsive. The check connects to the Docker daemon (using `Docker:Host` configuration or the platform default) and performs two operations:
1. **Ping**: Sends a ping request to verify the daemon is alive (with a configurable timeout, default 10 seconds via `Docker:TimeoutSeconds`).
2. **Version**: Retrieves version information to confirm the daemon is fully operational.
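You can reproduce the ping and version steps by hand by querying the Docker Engine API directly over the Unix socket with curl. This is a minimal sketch for Linux hosts, not the check's own implementation. `probe_daemon` is a hypothetical helper name, it assumes curl 7.40 or later (for `--unix-socket`), and its 10-second default only mirrors the documented `Docker:TimeoutSeconds` default.

```bash
# Sketch of the ping + version probe against the Docker Engine API (Linux, Unix socket).
# Exit codes: 0 = healthy, 1 = daemon answered with an error, 2 = connection failure.
probe_daemon() {
  sock="${1:-/var/run/docker.sock}"
  timeout="${2:-10}"  # seconds; mirrors the documented Docker:TimeoutSeconds default

  # No socket file means nothing can be listening: report a connection failure.
  if [ ! -S "$sock" ]; then
    echo "connection failure: no socket at $sock" >&2
    return 2
  fi

  # Step 1: ping. A healthy daemon answers GET /_ping with the body "OK".
  ping_body=$(curl --silent --max-time "$timeout" --unix-socket "$sock" http://localhost/_ping) || {
    echo "connection failure: ping via $sock failed" >&2
    return 2
  }
  if [ "$ping_body" != "OK" ]; then
    echo "daemon error: unexpected ping response: $ping_body" >&2
    return 1
  fi

  # Step 2: version. --fail makes curl exit non-zero on an HTTP error status.
  if ! curl --silent --fail --max-time "$timeout" --unix-socket "$sock" http://localhost/version; then
    echo "daemon error: GET /version failed" >&2
    return 1
  fi
}
```

For example, `probe_daemon /var/run/docker.sock 10` prints the version JSON (including `Version`, `ApiVersion`, `Os`, `Arch`, and `KernelVersion`) when the daemon is healthy.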
Evidence collected on success: host address, Docker version, API version, OS, architecture, and kernel version.
On failure, the check distinguishes between:
- **`DockerApiException`**: The daemon is running but returned an error (the check reports the status code and response body).
- **Connection failure**: Cannot connect to the daemon at all (Docker not installed, not running, or socket inaccessible).
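When probing manually, you can tell the same two failure kinds apart using curl's exit code and the HTTP status. This is a sketch with a hypothetical helper name, `classify_failure`. It relies on curl's documented exit codes (7: could not connect, 28: operation timed out), treats any non-zero curl exit as a connection failure, and treats any HTTP status of 400 or above as a daemon error.

```bash
# Hypothetical helper: classify a probe result as ok, api-error, or connection-failure.
#   $1 = curl exit code, $2 = HTTP status from curl's -w '%{http_code}' ("000" if no response)
classify_failure() {
  curl_rc="$1"
  http_status="${2:-0}"
  if [ "$curl_rc" -ne 0 ]; then
    # e.g. 7 (could not connect) or 28 (timed out): the daemon never answered.
    echo "connection-failure"
  elif [ "$http_status" -ge 400 ]; then
    # The daemon answered with an error status, analogous to the DockerApiException case.
    echo "api-error"
  else
    echo "ok"
  fi
}

# Example (needs a reachable socket):
#   status=$(curl --silent --output /dev/null --write-out '%{http_code}' --max-time 10 \
#            --unix-socket /var/run/docker.sock http://localhost/version)
#   classify_failure $? "$status"
```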
Default Docker host:
- **Linux**: `unix:///var/run/docker.sock`
- **Windows**: `npipe://./pipe/docker_engine`
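The host selection above can be sketched as a small shell function. It assumes the setting is supplied through unprefixed .NET-style environment configuration, where `Docker:Host` becomes `Docker__Host`. That mapping and the function name `resolve_docker_host` are assumptions for illustration only. The optional argument overrides the detected OS, so you can exercise the logic on any machine.

```bash
# Sketch: pick the Docker host as described above (explicit setting, else platform default).
# ASSUMPTION: Docker:Host arrives as the unprefixed .NET-style environment variable Docker__Host.
resolve_docker_host() {
  os="${1:-$(uname -s)}"  # optional override, e.g. "Linux" or "MINGW64_NT-10.0"
  if [ -n "${Docker__Host:-}" ]; then
    echo "$Docker__Host"
    return 0
  fi
  case "$os" in
    MINGW*|MSYS*|CYGWIN*|Windows_NT)
      echo "npipe://./pipe/docker_engine" ;;  # Windows default named pipe
    *)
      echo "unix:///var/run/docker.sock" ;;   # Linux default socket (used here for any non-Windows OS)
  esac
}
```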
## Why It Matters
The Docker daemon is the core runtime for all Stella Ops containers. If the daemon is down:
- No containers can start, stop, or restart.
- Health checks for all containerized services fail.
- Image pulls and builds are impossible.
- Docker Compose operations fail entirely.
- The entire Stella Ops platform is offline in container-based deployments.
## Common Causes
- Docker daemon is not running or not accessible
- Docker is not installed on the host
- Docker service crashed or was stopped
- Docker daemon returned an error response (resource exhaustion, configuration error)
- Timeout connecting to the daemon (overloaded host, slow disk)
## How to Fix
### Docker Compose
Check and restart the Docker daemon:
```bash
# Check daemon status
sudo systemctl status docker
# Start the daemon
sudo systemctl start docker
# Enable auto-start on boot
sudo systemctl enable docker
# Verify
docker info
```
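Right after `systemctl start docker`, the daemon can take a few seconds to answer, so `docker info` may fail transiently. Below is a minimal polling helper (the name `wait_for_daemon` is hypothetical). It retries a probe command once per second until the probe succeeds or the timeout expires.

```bash
# Hypothetical helper: retry a probe command once per second until it succeeds or time runs out.
# Counts one-second sleeps, so the real wait can exceed the timeout if the probe itself is slow.
wait_for_daemon() {
  timeout="$1"; shift
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "daemon ready after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "daemon not ready after ${timeout}s" >&2
  return 1
}

# Usage:
#   sudo systemctl start docker && wait_for_daemon 30 docker info
```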
If Docker is not installed:
```bash
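# Docker's convenience install script (review it before piping it to a shell in production)
curl -fsSL https://get.docker.com | sh
# Let the current user run docker without sudo; log out and back in for the group change to apply
sudo usermod -aG docker "$USER"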
```
### Bare Metal / systemd
```bash
# Check status
sudo systemctl status docker
# View daemon logs
sudo journalctl -u docker --since "10 minutes ago"
# Restart the daemon
sudo systemctl restart docker
# Verify connectivity
docker version
docker info
```
If the daemon crashes repeatedly, check for resource exhaustion:
```bash
# Check disk space (Docker requires space for images/containers)
df -h /var/lib/docker
# Check memory
free -h
# Clean up unused Docker resources (requires a running daemon; -a also removes every image not used by a container)
docker system prune -a
```
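To turn the disk check into a pass/fail test, you can parse POSIX `df -P` output for the filesystem that holds `/var/lib/docker`, as in the sketch below. The function names and the 85% threshold are illustrative assumptions, not values used by the Doctor check.

```bash
# Reads `df -P` output on stdin; prints the Capacity column (5th field) of the data row, minus "%".
usage_pct_from_df() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Sketch: fail when the filesystem holding Docker's data directory is too full.
check_docker_disk() {
  path="${1:-/var/lib/docker}"
  threshold="${2:-85}"  # illustrative threshold, not a Doctor setting
  pct=$(df -P "$path" 2>/dev/null | usage_pct_from_df)
  if [ -z "$pct" ]; then
    echo "ERROR: could not read disk usage for $path" >&2
    return 2
  fi
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: filesystem for $path is ${pct}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: filesystem for $path is ${pct}% full"
}
```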
### Kubernetes / Helm
Kubernetes nodes run a CRI container runtime (typically containerd or CRI-O) instead of the Docker daemon. Check the runtime service:
```bash
# Check containerd status
sudo systemctl status containerd
# Check CRI-O status
sudo systemctl status crio
# Restart whichever runtime the node uses
sudo systemctl restart containerd   # or: sudo systemctl restart crio
```
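If you are unsure which runtime a node uses, the helper below (hypothetical name `detect_node_runtime`) asks systemd which of the two services is enabled. It checks `is-enabled` rather than `is-active` because the runtime you need to restart may be the one that is currently down.

```bash
# Hypothetical helper: print which CRI runtime service is enabled on this node (containerd or crio).
# `systemctl is-enabled --quiet` prints nothing; it exits 0 when the unit is enabled and
# non-zero when it is disabled or not installed.
detect_node_runtime() {
  for svc in containerd crio; do
    if systemctl is-enabled --quiet "$svc" 2>/dev/null; then
      echo "$svc"
      return 0
    fi
  done
  echo "none"
  return 1
}

# Usage:
#   rt=$(detect_node_runtime) && sudo systemctl restart "$rt"
```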
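For Docker Desktop (local development), restart the application:
```bash
# Restart Docker Desktop
# macOS: killall Docker && open -a Docker
# Windows (Docker Desktop): quit and relaunch Docker Desktop from the system tray
# Windows Server (Docker Engine service, PowerShell): Restart-Service docker
```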
## Verification
```bash
stella doctor run --check check.docker.daemon
```
## Related Checks
- `check.docker.socket` — verifies the Docker socket exists and has correct permissions
- `check.docker.apiversion` — verifies the Docker API version is compatible
- `check.docker.storage` — verifies Docker storage is healthy (requires running daemon)
- `check.docker.network` — verifies Docker networks are configured (requires running daemon)