
master c58a236d70 Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-27 12:28:00 +02:00

2.3 KiB


checkId: check.storage.diskspace
severity: fail

Disk Space Availability

What It Checks

Verifies disk space availability on drives used by Stella Ops. The check:

- Collects the paths to check from the Storage:DataPath, EvidenceLocker:Path, Backup:Path, and Logging:Path configuration keys, falling back to the platform defaults (/var/lib/stellaops on Linux, %ProgramData%\StellaOps on Windows).
- Gets the drive information for each path and calculates its usage ratio.
- Fails at 90% usage or more (critical threshold): the system is at immediate risk of running out of space.
- Warns at 80% usage or more (warning threshold): the drive is approaching capacity.
- Reports the most heavily used drive.
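For a quick manual reading on a Linux host, the steps above can be approximated in a shell script. This is a sketch under stated assumptions, not the plugin's implementation. It computes usage as (total - available) / total from `df -P`, applies the documented 80% and 90% thresholds, and reports the most heavily used filesystem. The function names `classify_usage`, `usage_pct`, and `check_paths` are hypothetical.

```shell
#!/bin/sh
# Manual approximation of check.storage.diskspace (hypothetical helper).
# Assumption: usage = (total - available) / total per filesystem; the plugin's
# exact formula may differ. Thresholds follow the documented 80% / 90%.

WARN_PCT=80
FAIL_PCT=90

# classify_usage PCT: print ok, warn or fail for an integer usage percentage
classify_usage() {
  if [ "$1" -ge "$FAIL_PCT" ]; then
    echo fail
  elif [ "$1" -ge "$WARN_PCT" ]; then
    echo warn
  else
    echo ok
  fi
}

# usage_pct PATH: print the integer usage percentage of the filesystem holding PATH
usage_pct() {
  # df -P prints a header line, then: filesystem, total blocks, used, available, capacity, mount
  df -P "$1" 2>/dev/null | awk 'NR == 2 && $2 > 0 { printf "%d\n", ($2 - $4) * 100 / $2 }'
}

# check_paths PATH...: report the status of the most heavily used filesystem among PATHs
check_paths() {
  local worst_pct worst_path path pct
  worst_pct=-1
  worst_path=
  for path in "$@"; do
    [ -e "$path" ] || continue   # skip paths that do not exist on this host
    pct=$(usage_pct "$path")
    [ -n "$pct" ] || continue    # df failed or reported a zero-size filesystem
    if [ "$pct" -gt "$worst_pct" ]; then
      worst_pct=$pct
      worst_path=$path
    fi
  done
  if [ "$worst_pct" -lt 0 ]; then
    echo "no usable paths" >&2
    return 1
  fi
  echo "$(classify_usage "$worst_pct") ${worst_pct}% $worst_path"
}
```

Source the script and pass the directories configured in Storage:DataPath, EvidenceLocker:Path, Backup:Path, and Logging:Path, for example `check_paths /var/lib/stellaops`. It prints one line of the form `<status> <pct>% <path>`. Paths on the same filesystem report the same percentage, because the measurement is per drive, not per directory.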

Why It Matters

Disk exhaustion causes cascading failures: database writes fail, evidence cannot be stored, log rotation breaks, and container operations halt. The check's severity is fail because disk exhaustion can cause data loss and service outages that are difficult to recover from.

Common Causes

- Log files accumulating without rotation
- Evidence artifacts consuming space
- Backup files not rotated or pruned
- Large container images cached on disk
- Normal data growth approaching provisioned capacity
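To see which of the causes above dominates, compare the configured directories directly. The helper below is hypothetical and is not part of the stella CLI. It ranks the given directories by size in KiB.

```shell
#!/bin/sh
# Hypothetical helper, not part of the stella CLI: rank the configured
# Stella Ops directories by size so the dominant cause stands out.

# report_usage DIR...: print "<KiB><TAB><DIR>" for each existing DIR, largest first
report_usage() {
  local dir size
  for dir in "$@"; do
    [ -d "$dir" ] || continue   # skip directories that do not exist on this host
    # du -sk prints the total size of DIR in KiB, followed by the path
    size=$(du -sk "$dir" 2>/dev/null | awk '{ print $1 }')
    [ -n "$size" ] && printf '%s\t%s\n' "$size" "$dir"
  done | sort -rn
}
```

Call it with the actual values of Logging:Path, EvidenceLocker:Path, Backup:Path, and Storage:DataPath. If one directory is nested inside another, its size is counted in both totals.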

How to Fix

Docker Compose

# Check disk usage
docker exec <platform-container> df -h

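# Clean up old logs
stella storage cleanup --logs --older-than 7d

# Remove stopped containers, unused networks, build cache, and every image
# not used by a container (-a); pruned images must be pulled again
docker system prune -a

# Remove unused volumes; review `docker volume ls` first, because a volume that
# no container references (for example after `docker compose down`) counts as unused
docker volume prune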

Bare Metal / systemd

# Find the largest entries under the data directory
du -sh /var/lib/stellaops/* | sort -rh | head -20

# Clean up logs
stella storage cleanup --logs --older-than 7d

# Clean up temporary files
stella storage cleanup --temp

# Review Docker disk usage
docker system df

Kubernetes / Helm

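# List persistent volumes and their provisioned capacity
kubectl get pv
# Check actual usage from inside the pod
kubectl exec -it <platform-pod> -- df -h

# Expand the PVC if needed (its StorageClass must set allowVolumeExpansion: true)
kubectl edit pvc stellaops-data  # increase spec.resources.requests.storage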

Consider setting up automated cleanup policies:

storage:
  cleanup:
    enabled: true
    logRetentionDays: 30
    tempCleanupSchedule: "0 4 * * *"
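If the built-in cleanup policy cannot be enabled yet, a similar retention rule can be approximated with `find`. This sketch assumes old logs are ordinary files in a single directory. The function `prune_old_files` is hypothetical and does not reproduce how Stella Ops applies logRetentionDays.

```shell
#!/bin/sh
# Hypothetical fallback for hosts where storage.cleanup is not enabled:
# delete plain files older than a retention period. This only approximates
# the intent of logRetentionDays; it is not the Stella Ops implementation.

# prune_old_files DIR DAYS [--dry-run]
# Lists (--dry-run) or deletes regular files under DIR whose age, in whole
# days, exceeds DAYS (find -mtime +DAYS semantics).
prune_old_files() {
  if [ ! -d "$1" ]; then
    echo "not a directory: $1" >&2
    return 1
  fi
  if [ "${3:-}" = "--dry-run" ]; then
    find "$1" -type f -mtime +"$2" -print
  else
    find "$1" -type f -mtime +"$2" -print -delete
  fi
}
```

Run it with `--dry-run` first to review the list, for example `prune_old_files <log-dir> 30 --dry-run`. Then schedule the real run from cron with the same `0 4 * * *` expression used for tempCleanupSchedule. Note that `find -mtime +30` matches only files whose age, rounded down to whole days, exceeds 30.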

Verification

stella doctor run --check check.storage.diskspace

Related Checks

- check.storage.backup: verifies backup directory accessibility
- check.storage.evidencelocker: verifies evidence locker write access
