Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
master
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions

@@ -0,0 +1,88 @@
---
checkId: check.release.schedule
plugin: stellaops.doctor.release
severity: info
tags: [release, schedule, upcoming, planning]
---
# Release Schedule Health
## What It Checks
Queries the Release Orchestrator at `/api/v1/releases/scheduled` and evaluates the health of scheduled releases:
- **Missed schedules**: fail if any scheduled release with status "pending" has a scheduled time in the past.
- **Schedule conflicts**: warn if two pending releases target the same environment within 1 hour of each other.
- **Upcoming releases**: informational -- reports releases scheduled within the next 24 hours.
Evidence collected: `scheduled_release_count`, `upcoming_24h_count`, `missed_schedule_count`, `conflict_count`, `missed_releases`, `conflicts`, `upcoming_releases`.
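The three rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the plugin's implementation: the dict keys (`id`, `status`, `environment`, `scheduled_at`), the `pass` status name, the strict "less than one hour" comparison, counting every returned release in `scheduled_release_count`, and limiting "upcoming" to pending releases are all assumptions.

```python
from datetime import datetime, timedelta, timezone

CONFLICT_WINDOW = timedelta(hours=1)   # "within 1 hour of each other"
UPCOMING_WINDOW = timedelta(hours=24)  # "within the next 24 hours"


def evaluate_schedule(releases, now):
    """Classify scheduled releases into missed, conflicting, and upcoming.

    Each release is a dict with hypothetical keys: "id", "status",
    "environment", and "scheduled_at" (a timezone-aware datetime).
    """
    pending = [r for r in releases if r["status"] == "pending"]

    # Missed: still pending although the scheduled time has passed.
    missed = [r["id"] for r in pending if r["scheduled_at"] < now]

    # Upcoming: pending and due within the next 24 hours (assumed scope).
    upcoming = [
        r["id"] for r in pending
        if now <= r["scheduled_at"] <= now + UPCOMING_WINDOW
    ]

    # Conflicts: pairs of pending releases for the same environment whose
    # scheduled times are less than one hour apart (strict "<" is assumed).
    by_env = {}
    for r in pending:
        by_env.setdefault(r["environment"], []).append(r)
    conflicts = []
    for group in by_env.values():
        group.sort(key=lambda r: r["scheduled_at"])
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if second["scheduled_at"] - first["scheduled_at"] >= CONFLICT_WINDOW:
                    break  # sorted, so later releases are even further away
                conflicts.append((first["id"], second["id"]))

    # "fail" and "warn" come from the rules above; "pass" is an assumed name.
    status = "fail" if missed else "warn" if conflicts else "pass"
    return {
        "status": status,
        "scheduled_release_count": len(releases),
        "upcoming_24h_count": len(upcoming),
        "missed_schedule_count": len(missed),
        "conflict_count": len(conflicts),
        "missed_releases": missed,
        "conflicts": conflicts,
        "upcoming_releases": upcoming,
    }
```

Sorting each environment's releases by time lets the inner loop stop at the first release that is an hour or more away, so only neighbouring releases are compared.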
The check requires `ReleaseOrchestrator:Url` or `Release:Orchestrator:Url` to be configured.
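As a rough illustration of that lookup, the sketch below resolves the orchestrator URL from a flat dictionary of colon-delimited keys and appends the endpoint path. The order in which the two keys are tried, the `resolve_scheduled_url` helper, and returning `None` when neither key is set are assumptions; how the check behaves when the setting is missing is not specified here.

```python
# The two keys named above; trying them in this order is an assumption.
CONFIG_KEYS = ("ReleaseOrchestrator:Url", "Release:Orchestrator:Url")
SCHEDULED_PATH = "/api/v1/releases/scheduled"


def resolve_scheduled_url(config):
    """Return the scheduled-releases URL, or None if no base URL is configured."""
    for key in CONFIG_KEYS:
        value = (config.get(key) or "").strip()
        if value:
            # Drop a trailing slash so the joined URL has exactly one separator.
            return value.rstrip("/") + SCHEDULED_PATH
    return None
```

For example, `resolve_scheduled_url({"Release:Orchestrator:Url": "http://orchestrator:8080/"})` yields `http://orchestrator:8080/api/v1/releases/scheduled`; the host name is a placeholder.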
## Why It Matters
Missed scheduled releases indicate that the release scheduler is not functioning or that prerequisites were not met at the scheduled time. This can delay time-critical deployments such as security patches or releases tied to compliance deadlines. Schedule conflicts can cause deployment failures when two releases compete for the same environment simultaneously, potentially leaving the environment in an inconsistent state.
## Common Causes
- Release scheduler service not running or crashed
- Prerequisite conditions (policy gates, approvals) not met at scheduled time
- Target environment was unavailable when the schedule triggered
- Multiple teams scheduling releases to the same environment without coordination
- Manual schedule override without checking for existing schedules
- Clock skew between scheduler and orchestrator services
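To rule out the clock-skew cause, one option is to capture a UTC timestamp on each host (for example with `date -u +%Y-%m-%dT%H:%M:%SZ`) and compare the two values. The helper below is a minimal sketch under that assumption; the function names are hypothetical, and how much skew is acceptable depends on your deployment.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse a timestamp such as 2026-03-27T14:00:00Z into an aware datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def clock_skew_seconds(scheduler_stamp, orchestrator_stamp):
    """Absolute difference, in seconds, between the two captured clocks."""
    delta = parse_utc(scheduler_stamp) - parse_utc(orchestrator_stamp)
    return abs(delta.total_seconds())
```

The timestamps should be captured as close together as possible, because the delay between the two commands adds to the measured skew.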
## How to Fix
### Docker Compose
```bash
# View missed schedules
stella release schedule list --missed
# Run a missed release immediately
stella release schedule run <schedule-id>
# View schedule conflicts
stella release schedule list --conflicts
# Reschedule a conflicting release
stella release schedule update <schedule-id> --time "2026-03-27T14:00:00Z"
# Search the orchestrator logs for scheduler activity
docker compose -f docker-compose.stella-ops.yml logs --tail 100 orchestrator | grep -i schedule
```
### Bare Metal / systemd
```bash
# Check scheduler status
stella release schedule status
# List missed and conflicting schedules
stella release schedule list --missed
stella release schedule list --conflicts
# Reschedule
stella release schedule update <schedule-id> --time "2026-03-27T14:00:00Z"
# Check system clock synchronization
timedatectl status
```
### Kubernetes / Helm
```bash
# Print the orchestrator pod's UTC clock (compare it with a trusted time source)
kubectl exec -it <orchestrator-pod> -- date -u
# View scheduled releases
kubectl exec -it <orchestrator-pod> -- stella release schedule list
# Check for CronJob issues
kubectl get cronjobs -l app=stellaops-release-scheduler
kubectl describe cronjob stellaops-release-scheduler
```
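Regardless of deployment type, rescheduling a conflicting release means choosing a time that clears the 1-hour conflict window for that environment. The sketch below shows one way to pick such a time before passing it to `stella release schedule update`. The `next_free_slot` helper is hypothetical and only ever moves the release later, never earlier.

```python
from datetime import datetime, timedelta, timezone

CONFLICT_WINDOW = timedelta(hours=1)  # same window the check uses for conflicts


def next_free_slot(desired, other_times, window=CONFLICT_WINDOW):
    """Return the earliest time >= desired that is at least `window` away
    from every other pending release in the same environment."""
    candidate = desired
    # Walking the other times in ascending order means one pass is enough:
    # the candidate only moves forward, so it cannot re-enter an earlier window.
    for other in sorted(other_times):
        if abs(candidate - other) < window:
            candidate = other + window
    return candidate
```

With other releases at 14:00 and 14:30 UTC, a desired time of 14:20 moves to 15:30, since 15:00 would still fall within an hour of the 14:30 release.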
## Verification
```bash
stella doctor run --check check.release.schedule
```
## Related Checks
- `check.release.active` -- missed schedules may result in delayed active releases
- `check.release.environment.readiness` -- environment availability affects schedule execution
- `check.operations.scheduler` -- platform scheduler health affects release scheduling