Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
master
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions

@@ -0,0 +1,75 @@
---
checkId: check.timestamp.timesync.system
plugin: stellaops.doctor.timestamping
severity: fail
tags: [timestamping, timesync, ntp, system]
---
# System Time Synchronization
## What It Checks
Checks that the system clock is synchronized with NTP servers. The check:
- Queries the configured NTP servers (defaults: `time.nist.gov` and `pool.ntp.org`) over NTP (UDP port 123).
- Computes the time skew between the local system clock and each NTP server.
- **Fails** (unhealthy) if skew exceeds the critical threshold (default 5 seconds).
- **Warns** (degraded) if skew exceeds the warning threshold (default 1 second).
- Reports degraded if no NTP servers can be reached.
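The threshold logic described above can be sketched as a small shell function. This is an illustration only, not the plugin's implementation: the function name `classify_skew`, its argument order, and the `healthy` label are assumptions made here. Only the default thresholds (1 second warning, 5 seconds critical) and the `degraded` / `unhealthy` labels come from the description above.

```bash
# Hypothetical helper: map a measured clock skew (in seconds) to a check status.
# Defaults mirror the documented thresholds: warn above 1 s, fail above 5 s.
classify_skew() {
  local skew="$1" warn="${2:-1}" crit="${3:-5}"
  # awk does the floating-point comparison; the sign of the skew is ignored.
  awk -v s="$skew" -v w="$warn" -v c="$crit" 'BEGIN {
    if (s < 0) s = -s
    if (s > c)      print "unhealthy"
    else if (s > w) print "degraded"
    else            print "healthy"
  }'
}

classify_skew 0.25   # prints "healthy"
classify_skew -2.5   # prints "degraded"
classify_skew 7      # prints "unhealthy"
```

The comparisons are strict, matching the wording "exceeds": a skew of exactly 5 seconds is classified as `degraded`, not `unhealthy`.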
## Why It Matters
Accurate system time is fundamental to timestamping. Clock skew causes timestamp tokens to have incorrect genTime values, which can invalidate evidence during verification. Large skew can also cause TLS certificate validation failures, authentication token rejection, and incorrect audit log ordering.
## Common Causes
- NTP service not running (chrony, ntpd, systemd-timesyncd)
- NTP servers unreachable (firewall blocking UDP 123)
- Virtual machine time drift (especially paused/resumed VMs)
- Hardware clock issues
## How to Fix
### Docker Compose
Docker containers share the host kernel's clock, so time sync cannot be repaired from inside a container. Fix it on the Docker host:
```bash
# Check host time sync
timedatectl status
# Enable NTP sync
sudo timedatectl set-ntp true
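# If the host uses chrony instead, restart the daemon
# (the unit may be named `chrony` rather than `chronyd` on Debian/Ubuntu)
sudo systemctl restart chronyd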
```
### Bare Metal / systemd
```bash
# Check time sync status
timedatectl status
chronyc tracking
# Force sync
sudo chronyc makestep
# Enable NTP sync and make chrony start on boot
sudo timedatectl set-ntp true
sudo systemctl enable chronyd
```
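To compare the host's offset against the check's thresholds in a script, the `System time` line of `chronyc tracking` can be parsed. A minimal sketch, assuming chrony's usual output format (`System time : <seconds> seconds fast|slow of NTP time`); the function name `chrony_offset` and the sign convention (slow is negative) are choices made here, not part of the Doctor check.

```bash
# Hypothetical helper: read `chronyc tracking` output on stdin and print the
# signed system clock offset in seconds (negative when the clock is slow).
chrony_offset() {
  awk '/^System time/ {
    v = $4                      # numeric offset, e.g. 0.000006523
    if ($6 == "slow") v = -v    # field 6 is "slow" or "fast"
    printf "%.9f\n", v
  }'
}

# Usage on a host running chrony:
#   chronyc tracking | chrony_offset
```

The function prints nothing if the input has no `System time` line, for example when `chronyc` cannot reach the daemon.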
### Kubernetes / Helm
Kubernetes nodes must have NTP configured. Verify on each node:
```bash
# Check sync status and the current offset
timedatectl status
chronyc tracking
```
Ensure NTP is part of your node provisioning configuration.
## Verification
```bash
stella doctor run --check check.timestamp.timesync.system
```
## Related Checks
- `check.timestamp.timesync.tsa-skew` — checks skew between system clock and TSA genTime
- `check.timestamp.timesync.rekor-correlation` — checks TST-Rekor time correlation