Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
master
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions

@@ -0,0 +1,71 @@
---
checkId: check.timestamp.tsa.failover-ready
plugin: stellaops.doctor.timestamping
severity: warn
tags: [timestamping, tsa, failover, redundancy]
---
# TSA Failover Readiness
## What It Checks
Confirms that backup TSA endpoints are reachable for failover. The check:
- Fails if no TSA endpoints are configured at all.
- Warns (degraded) if only one endpoint is configured, because failover is not possible with a single endpoint.
- Probes all configured endpoints and counts reachable ones.
- Compares reachable count against `MinHealthyTsas` (default 2).
- Fails or degrades if fewer than the minimum are reachable.
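
The decision rules listed above can be sketched roughly as follows. This is an illustration only, not the plugin's actual implementation: the function name, the `Status` values, and in particular the rule for when a shortfall counts as a failure rather than a degradation are assumptions, since the article only says the check "fails or degrades".

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    DEGRADED = "degraded"
    FAIL = "fail"


def evaluate_failover(configured: int, reachable: int, min_healthy: int = 2) -> Status:
    """Hypothetical mapping from endpoint counts to a check status."""
    if configured == 0:
        # No TSA endpoints configured at all.
        return Status.FAIL
    if configured == 1:
        # A single endpoint cannot fail over to anything.
        return Status.DEGRADED
    if reachable >= min_healthy:
        return Status.PASS
    # Assumption: nothing reachable is a failure; a partial shortfall is degraded.
    return Status.FAIL if reachable == 0 else Status.DEGRADED
```

Under these assumptions, two configured endpoints with only one reachable would report a degraded state with the default `MinHealthyTsas` of 2.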
## Why It Matters
TSA providers can experience outages. Without backup endpoints, a single TSA failure blocks all timestamping operations, halting the evidence pipeline and release process. Failover readiness ensures the platform can automatically switch to an alternative TSA without manual intervention.
## Common Causes
- Only one TSA endpoint configured (no backup)
- Backup TSA endpoint down or unreachable
- Network issues to secondary TSA providers
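
To separate a misconfigured URL from a network problem, a plain TCP connection test against each endpoint can help. The sketch below is a rough pre-check under stated assumptions: the helper names are hypothetical, and it only confirms that a TCP connection can be opened. It does not send a timestamp request, and it is not necessarily how the Doctor check probes endpoints.

```python
import socket
from urllib.parse import urlsplit


def host_port(url: str) -> tuple:
    """Extract (host, port) from a URL, defaulting to 443 for https and 80 otherwise."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def tcp_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port can be opened."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Endpoint URLs taken from the configuration examples in this article.
    for url in ("https://freetsa.org/tsr", "http://timestamp.digicert.com"):
        print(url, "reachable" if tcp_reachable(url) else "unreachable")
```

Run it from the host or container that runs the timestamping service, since firewall and proxy rules often differ between machines.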
## How to Fix
### Docker Compose
Configure at least two TSA endpoints:
```yaml
environment:
  Timestamping__TsaEndpoints__0__Name: "Primary"
  Timestamping__TsaEndpoints__0__Url: "https://freetsa.org/tsr"
  Timestamping__TsaEndpoints__1__Name: "Backup"
  Timestamping__TsaEndpoints__1__Url: "http://timestamp.digicert.com"
  Timestamping__MinHealthyTsas: "2"
```
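
The double-underscore keys above describe a nested structure: each `__` separates one level, and numeric segments act as list indices. The sketch below shows that mapping, assuming the usual .NET environment-variable configuration convention applies here. The helper names are hypothetical, and values stay strings because type conversion happens later, when the application binds its options.

```python
def env_to_nested(env: dict) -> dict:
    """Split each key on '__' and build nested dictionaries from the segments."""
    root: dict = {}
    for key, value in env.items():
        parts = key.split("__")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return root


def collapse_indices(node):
    """Turn dictionaries whose keys are all numeric into lists ordered by index."""
    if not isinstance(node, dict):
        return node
    items = {k: collapse_indices(v) for k, v in node.items()}
    if items and all(k.isdigit() for k in items):
        return [items[k] for k in sorted(items, key=int)]
    return items


env = {
    "Timestamping__TsaEndpoints__0__Name": "Primary",
    "Timestamping__TsaEndpoints__0__Url": "https://freetsa.org/tsr",
    "Timestamping__TsaEndpoints__1__Name": "Backup",
    "Timestamping__TsaEndpoints__1__Url": "http://timestamp.digicert.com",
    "Timestamping__MinHealthyTsas": "2",
}
config = collapse_indices(env_to_nested(env))
```

The resulting `config` has a top-level `Timestamping` key holding a `TsaEndpoints` list of two entries and a `MinHealthyTsas` value.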
### Bare Metal / systemd
```json
{
  "Timestamping": {
    "TsaEndpoints": [
      { "Name": "Primary", "Url": "https://freetsa.org/tsr" },
      { "Name": "Backup", "Url": "http://timestamp.digicert.com" }
    ],
    "MinHealthyTsas": 2
  }
}
```
### Kubernetes / Helm
```yaml
timestamping:
  minHealthyTsas: 2
  tsaEndpoints:
    - name: "Primary"
      url: "https://freetsa.org/tsr"
    - name: "Backup"
      url: "http://timestamp.digicert.com"
```
## Verification
```bash
stella doctor run --check check.timestamp.tsa.failover-ready
```
## Related Checks
- `check.timestamp.tsa.reachable` — verifies TSA endpoint reachability
- `check.timestamp.tsa.response-time` — measures TSA response latency