Doctor plugin checks: implement health check classes and documentation
Implement remediation-aware health checks across all Doctor plugin modules (Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment, EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release, Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation, Authority, Core, Cryptography, Database, Docker, Integration, Notify, Observability, Security, ServiceGraph, Sources, Verification). Each check now emits structured remediation metadata (severity, category, runbook links, and fix suggestions) consumed by the Doctor dashboard remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/doctor/articles/docker/daemon.md
---
checkId: check.docker.daemon
plugin: stellaops.doctor.docker
severity: fail
tags: [docker, daemon, container]
---
# Docker Daemon

## What It Checks
Validates that the Docker daemon is running and responsive. The check connects to the Docker daemon (using the `Docker:Host` configuration value, or the platform default when it is unset) and performs two operations:

1. **Ping**: Sends a ping request to verify that the daemon is alive. The timeout is configurable through `Docker:TimeoutSeconds` and defaults to 10 seconds.
2. **Version**: Retrieves version information to confirm that the daemon is fully operational.

Evidence collected on success: host address, Docker version, API version, OS, architecture, and kernel version.

On failure, the check distinguishes between:
- **DockerApiException**: The daemon is running but returned an error; the check reports the status code and response body.
- **Connection failure**: The check cannot connect to the daemon at all (Docker is not installed or not running, or the socket is inaccessible).

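To tell these two failure modes apart by hand, you can probe the Engine API's `/_ping` endpoint with `curl` and classify the result. This is a minimal sketch, not the check's implementation: it assumes curl 7.40 or later (for `--unix-socket`) and the Linux default socket, and `classify_docker_probe` is a hypothetical helper. It relies on curl's exit codes (28 means the operation timed out; 7 means it could not connect) and treats any non-2xx HTTP status as the daemon answering with an error, which corresponds to the `DockerApiException` case. It reports timeouts separately from other connection failures for readability.

```bash
# Hypothetical helper: classify one probe of the Docker Engine API.
#   $1 = curl exit code
#   $2 = HTTP status code printed by curl -w '%{http_code}' ("000" if no response)
classify_docker_probe() {
  local curl_exit="$1" http_code="$2"
  case "$curl_exit" in
    0) ;;                                    # request completed; judge it by status code
    28) echo "timeout"; return ;;            # no answer within --max-time
    *) echo "connection-failure"; return ;;  # e.g. 7: socket missing, daemon down, or access denied
  esac
  if [[ "$http_code" =~ ^2[0-9][0-9]$ ]]; then
    echo "ok"
  else
    echo "api-error"                         # daemon reachable but returned an error status
  fi
}

# Usage (Linux default socket, 10-second timeout):
#   code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
#            --unix-socket /var/run/docker.sock http://localhost/_ping)
#   classify_docker_probe "$?" "$code"
```
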
Default Docker host when `Docker:Host` is not set:
- **Linux**: `unix:///var/run/docker.sock`
- **Windows**: `npipe://./pipe/docker_engine`

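To reproduce the check's two steps manually on a Linux host, you can call the Docker Engine API over the Unix socket. `GET /_ping` answers with the body `OK` when the daemon is alive, and `GET /version` returns JSON containing `Version`, `ApiVersion`, `Os`, `Arch`, and `KernelVersion`, which is the same evidence the check reports except for the host address. This is a minimal sketch under stated assumptions, not the check's implementation: it assumes curl 7.40 or later and a Unix-socket host, and `DOCKER_SOCK`, `TIMEOUT_SECONDS`, `docker_api`, and `check_docker_daemon` are hypothetical names.

```bash
# Sketch: ping, then fetch version info, mirroring the two steps described above.
# DOCKER_SOCK and TIMEOUT_SECONDS are hypothetical variables; the defaults match
# the documented Linux host and the 10-second Docker:TimeoutSeconds default.
DOCKER_SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"
TIMEOUT_SECONDS="${TIMEOUT_SECONDS:-10}"

docker_api() {
  # $1 = API path, e.g. /_ping; -f makes curl fail on HTTP error statuses
  curl -sf --max-time "$TIMEOUT_SECONDS" --unix-socket "$DOCKER_SOCK" "http://localhost$1"
}

check_docker_daemon() {
  local pong version
  pong="$(docker_api /_ping)"
  if [ "$pong" != "OK" ]; then
    echo "FAIL: ping to $DOCKER_SOCK failed (no connection, timeout, or error status)" >&2
    return 1
  fi
  if ! version="$(docker_api /version)"; then
    echo "FAIL: daemon answered ping but /version failed" >&2
    return 1
  fi
  echo "$version"   # JSON with Version, ApiVersion, Os, Arch, KernelVersion
}

# Usage: check_docker_daemon
#    or: DOCKER_SOCK=/path/to/docker.sock check_docker_daemon
```
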
## Why It Matters
The Docker daemon is the core runtime for all Stella Ops containers. If the daemon is down:

- No containers can start, stop, or restart.
- Health checks for all containerized services fail.
- Image pulls and builds are impossible.
- Docker Compose operations fail entirely.
- The entire Stella Ops platform is offline in container-based deployments.

## Common Causes
- Docker daemon is not running or not accessible
- Docker is not installed on the host
- Docker service crashed or was stopped
- Docker daemon returned an error response (resource exhaustion, configuration error)
- Timeout connecting to the daemon (overloaded host, slow disk)

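On a systemd-based Linux host, the causes above can be narrowed down in order, from "not installed" to "not responding", stopping at the first step that fails. This is a minimal sketch: `triage_docker` is a hypothetical helper, its messages are illustrative, and it assumes GNU `timeout` is available. It does not account for socket-activated setups in which `docker.service` is inactive until the first connection.

```bash
# Sketch: walk the common causes in order and report the first one that applies.
# $1 = socket path (defaults to the Linux default host)
triage_docker() {
  local sock="${1:-/var/run/docker.sock}"
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found: Docker is probably not installed"
    return 1
  fi
  if command -v systemctl >/dev/null 2>&1 && ! systemctl is-active --quiet docker; then
    echo "docker service is not active: it was stopped or crashed (see journalctl -u docker)"
    return 1
  fi
  if [ ! -S "$sock" ]; then
    echo "no socket at $sock: daemon not running, or configured to listen elsewhere"
    return 1
  fi
  if ! timeout 10 docker -H "unix://$sock" info >/dev/null 2>&1; then
    echo "docker info failed or took over 10s: check permissions, daemon logs, and resources"
    return 1
  fi
  echo "daemon is responsive at $sock"
}

# Usage: triage_docker            (or: triage_docker /path/to/docker.sock)
```
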
## How to Fix

### Docker Compose
Check and restart the Docker daemon:

```bash
# Check daemon status
sudo systemctl status docker

# Start the daemon
sudo systemctl start docker

# Enable auto-start on boot
sudo systemctl enable docker

# Verify
docker info
```

If Docker is not installed:
```bash
# Install Docker Engine with Docker's convenience script
curl -fsSL https://get.docker.com | sh
# Grant the current user socket access (root-equivalent); log out and back in to apply
sudo usermod -aG docker "$USER"
```

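When you restart the daemon from a script, for example before re-running the Doctor check, it can help to wait until `docker info` actually succeeds instead of assuming the daemon is ready immediately. This matters most for Docker Desktop, where `open -a Docker` returns before the engine is up. The sketch below uses a hypothetical `wait_for` helper that retries any command once per second until it succeeds or a timeout expires; bash's built-in `SECONDS` counter supplies the clock.

```bash
# Hypothetical helper: retry a command once per second until it succeeds
# or timeout_s seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout_s="$1"; shift
  local deadline=$(( SECONDS + timeout_s ))
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout_s}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Usage: sudo systemctl restart docker && wait_for 60 docker info
```
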
### Bare Metal / systemd
```bash
# Check status
sudo systemctl status docker

# View daemon logs
sudo journalctl -u docker --since "10 minutes ago"

# Restart the daemon
sudo systemctl restart docker

# Verify connectivity
docker version
docker info
```

If the daemon crashes repeatedly, check for resource exhaustion:
```bash
# Check disk space (Docker requires space for images/containers)
df -h /var/lib/docker

# Check memory
free -h

# Clean up Docker resources (requires a running daemon; asks for confirmation).
# -a removes ALL images not used by a container, not only dangling ones.
docker system prune -a
```

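To make the disk-space check scriptable, for example in a cron job that warns before the daemon runs out of space, you can parse POSIX `df -P` output. This is a sketch: `disk_usage_percent` and `check_docker_disk` are hypothetical helpers, `/var/lib/docker` is Docker's default data root (it differs if `data-root` is changed in `daemon.json`), and the 90% threshold is an arbitrary example, not a Stella Ops setting.

```bash
# Print the used-space percentage (without "%") of the filesystem holding $1.
disk_usage_percent() {
  # df -P guarantees one line per filesystem; field 5 is "Capacity", e.g. "42%"
  df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: returns 0 if usage is below the threshold, 1 if at or
# above it, and 2 if the path cannot be inspected.
check_docker_disk() {
  local path="${1:-/var/lib/docker}" threshold="${2:-90}" used
  used="$(disk_usage_percent "$path")"
  if [ -z "$used" ]; then
    echo "cannot read disk usage for $path" >&2
    return 2
  fi
  if [ "$used" -ge "$threshold" ]; then
    echo "WARN: $path is ${used}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}

# Usage: check_docker_disk /var/lib/docker 90
```
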
### Kubernetes / Helm
On Kubernetes nodes, a CRI runtime (containerd or CRI-O) typically takes the place of the Docker daemon. Check the runtime instead:

```bash
# Check containerd status
sudo systemctl status containerd

# Check CRI-O status
sudo systemctl status crio

# Restart the runtime the node uses, if needed
sudo systemctl restart containerd   # or: sudo systemctl restart crio
```

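On nodes that run containerd or CRI-O, `crictl` talks to the runtime through its CRI socket, so you can confirm the runtime is answering in much the same way `docker info` does for the Docker daemon. The sketch below assumes the common default socket paths (`/run/containerd/containerd.sock` for containerd, `/var/run/crio/crio.sock` for CRI-O), which may differ on your distribution; `cri_endpoint` is a hypothetical helper.

```bash
# Hypothetical helper: print the first CRI socket that exists, as a unix:// endpoint.
# Arguments override the candidate list; defaults are the usual containerd and CRI-O paths.
cri_endpoint() {
  local s
  local -a candidates=("$@")
  if [ "${#candidates[@]}" -eq 0 ]; then
    candidates=(/run/containerd/containerd.sock /var/run/crio/crio.sock)
  fi
  for s in "${candidates[@]}"; do
    if [ -S "$s" ]; then
      echo "unix://$s"
      return 0
    fi
  done
  echo "no CRI socket found" >&2
  return 1
}

# Usage: sudo crictl --runtime-endpoint "$(cri_endpoint)" info
```
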
For Docker Desktop (development):
```bash
# Restart Docker Desktop
# macOS: killall Docker && open -a Docker
# Windows: quit Docker Desktop from the system tray icon and start it again
#   (Restart-Service docker restarts the Docker Engine service on Windows Server, not Docker Desktop)
```

## Verification
```bash
stella doctor run --check check.docker.daemon
```

## Related Checks
- `check.docker.socket`: verifies that the Docker socket exists and has correct permissions
- `check.docker.apiversion`: verifies that the Docker API version is compatible
- `check.docker.storage`: verifies that Docker storage is healthy (requires a running daemon)
- `check.docker.network`: verifies that Docker networks are configured (requires a running daemon)