Stella Ops Doctor
Self-service diagnostics for Stella Ops deployments
Overview
The Doctor system provides comprehensive diagnostics for Stella Ops deployments, enabling operators, DevOps engineers, and developers to:
- Diagnose what is working and what is not
- Understand why failures occur with collected evidence
- Remediate issues with copy/paste commands
- Verify fixes with re-runnable checks
Quick Start
CLI
# Quick health check
stella doctor
# Full diagnostic with all checks
stella doctor --full
# Check specific category
stella doctor --category database
# Export report for support
stella doctor export --output diagnostic-bundle.zip
# Apply safe fixes from a report (dry-run unless --apply is passed)
stella doctor fix --from doctor-report.json --apply
UI
Navigate to /ops/doctor in the Stella Ops console to access the interactive Doctor Dashboard.
Fix actions are exposed in the UI and mirror CLI commands; destructive steps are never executed by Doctor.
API
# Run diagnostics
POST /api/v1/doctor/run
# Generate AI-assisted diagnosis from Doctor report payloads or stored runs
POST /api/v1/doctor/diagnosis
# Get available checks
GET /api/v1/doctor/checks
# Stream results
WebSocket /api/v1/doctor/stream
# Manage scheduled doctor runs
GET/POST /api/v1/doctor/scheduler/schedules
PUT/DELETE /api/v1/doctor/scheduler/schedules/{scheduleId}
# Query scheduler trend data
GET /api/v1/doctor/scheduler/trends
GET /api/v1/doctor/scheduler/trends/checks/{checkId}
GET /api/v1/doctor/scheduler/trends/degrading
Available Checks
The Doctor system includes 60+ diagnostic checks across 10 plugins:
| Plugin | Category | Checks | Description |
|---|---|---|---|
| stellaops.doctor.core | Core | 9 | Configuration, runtime, disk, memory, time, crypto |
| stellaops.doctor.database | Database | 8 | Connectivity, migrations, schema, connection pool |
| stellaops.doctor.servicegraph | ServiceGraph | 6 | Gateway, routing, service health |
| stellaops.doctor.security | Security | 9 | OIDC, LDAP, TLS, Vault |
| stellaops.doctor.attestation | Security | 4 | Rekor connectivity, Cosign keys, clock skew, offline bundle |
| stellaops.doctor.verification | Security | 5 | Artifact pull, signatures, SBOM, VEX, policy engine |
| stellaops.doctor.scm.* | Integration.SCM | 8 | GitHub, GitLab connectivity/auth/permissions |
| stellaops.doctor.registry.* | Integration.Registry | 6 | Harbor, ECR connectivity/auth/pull |
| stellaops.doctor.observability | Observability | 4 | OTLP, logs, metrics |
| stellaops.doctor.timestamping | Security | 22 | RFC-3161 and eIDAS timestamping health |
Setup Wizard Essential Checks
The following checks are mandatory for the setup wizard to validate a new installation:
- DB connectivity + schema version (stellaops.doctor.database)
  - check.db.connection - Database is reachable
  - check.db.schema.version - Schema version matches expected
- Attestation store availability (stellaops.doctor.attestation)
  - check.attestation.rekor.connectivity - Rekor transparency log reachable
  - check.attestation.cosign.keymaterial - Signing keys available (file/KMS/keyless)
  - check.attestation.clock.skew - System clock synchronized (<5s skew)
- Artifact verification pipeline (stellaops.doctor.verification)
  - check.verification.artifact.pull - Test artifact accessible by digest
  - check.verification.signature - DSSE signatures verifiable
  - check.verification.sbom.validation - SBOM (CycloneDX/SPDX) valid
  - check.verification.vex.validation - VEX document valid
  - check.verification.policy.engine - Policy evaluation passes
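The essential checks above can be run one at a time with the --check flag shown under Common Commands. The following is a minimal sketch of such a loop in POSIX sh, not a shipped tool: the function names and the STELLA_BIN override are hypothetical and exist only for illustration.

```shell
# Hypothetical wrapper, not part of the Stella Ops CLI.

# Print the setup wizard's essential check IDs, one per line.
essential_checks() {
  cat <<'EOF'
check.db.connection
check.db.schema.version
check.attestation.rekor.connectivity
check.attestation.cosign.keymaterial
check.attestation.clock.skew
check.verification.artifact.pull
check.verification.signature
check.verification.sbom.validation
check.verification.vex.validation
check.verification.policy.engine
EOF
}

# Run every essential check and return non-zero if any run exits non-zero.
# STELLA_BIN is an assumed override for the CLI path (defaults to "stella").
run_essential_checks() {
  failed=0
  for id in $(essential_checks); do
    "${STELLA_BIN:-stella}" doctor --check "$id" || failed=1
  done
  return "$failed"
}
```

Calling run_essential_checks after installation runs all ten checks in order. Because exit code 1 signals warnings, this sketch treats a warning the same as a failure; relax the `|| failed=1` condition if warnings should be tolerated.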
Check ID Convention
check.{category}.{subcategory}.{specific}
Examples:
- check.config.required
- check.database.migrations.pending
- check.services.gateway.routing
- check.integration.scm.github.auth
- check.attestation.rekor.connectivity
- check.verification.sbom.validation
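When authoring custom checks, it can help to lint IDs against the convention before registering them. The sketch below is a hypothetical helper, not part of the CLI. It assumes, based only on the examples above, that an ID is the literal prefix check followed by two or more dot-separated segments made of lowercase letters and digits; the real rules may be stricter or looser.

```shell
# Hypothetical helper, not part of the Stella Ops CLI.
# Assumption: "check" followed by two or more dot-separated segments, each
# starting with a lowercase letter and containing only lowercase letters
# and digits.
is_check_id() {
  printf '%s\n' "$1" | grep -Eq '^check(\.[a-z][a-z0-9]*){2,}$'
}

for id in check.config.required check.integration.scm.github.auth \
          check.config Check.Config.Required; do
  if is_check_id "$id"; then
    echo "valid:   $id"
  else
    echo "invalid: $id"
  fi
done
```

Under these assumptions the first two IDs are accepted, while check.config (too few segments) and Check.Config.Required (uppercase) are rejected.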
CLI Reference
See CLI Reference for complete command documentation.
Common Commands
# Quick health check (tagged 'quick' checks only)
stella doctor --quick
# Full diagnostic with all checks
stella doctor --full
# Filter by category
stella doctor --category database
stella doctor --category security
# Filter by plugin
stella doctor --plugin scm.github
# Run single check
stella doctor --check check.database.migrations.pending
# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text
# Filter output by severity
stella doctor --severity fail,warn
# Export diagnostic bundle
stella doctor export --output diagnostic.zip
stella doctor export --include-logs --log-duration 4h
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All checks passed |
| 1 | One or more warnings |
| 2 | One or more failures |
| 3 | Doctor engine error |
| 4 | Invalid arguments |
| 5 | Timeout exceeded |
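In a CI pipeline, these exit codes can be turned into a readable message and a pass/fail gate. The following is a minimal sketch in POSIX sh; the function names and the policy of tolerating warnings are assumptions for illustration, not behavior defined by Doctor.

```shell
# Hypothetical helpers, not part of the Stella Ops CLI.

# Translate a Doctor exit code into the meaning from the table above.
doctor_exit_meaning() {
  case "$1" in
    0) echo "All checks passed" ;;
    1) echo "One or more warnings" ;;
    2) echo "One or more failures" ;;
    3) echo "Doctor engine error" ;;
    4) echo "Invalid arguments" ;;
    5) echo "Timeout exceeded" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Assumed CI policy: tolerate a clean run or warnings, block on anything else.
doctor_gate() {
  case "$1" in
    0|1) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage in a pipeline (commented out so the sketch runs without the CLI):
# stella doctor --severity fail,warn; code=$?
# echo "doctor: $(doctor_exit_meaning "$code")"
# doctor_gate "$code" || exit "$code"
```

Keeping the message lookup separate from the gate lets a team tighten the policy (for example, failing on warnings too) without touching the reporting.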
Output Example
Stella Ops Doctor
=================
Running 47 checks across 8 plugins...
[PASS] check.config.required
  All required configuration values are present
[PASS] check.database.connectivity
  PostgreSQL connection successful (latency: 12ms)
[WARN] check.tls.certificates.expiry
  Diagnosis: TLS certificate expires in 14 days
  Evidence:
    Certificate: /etc/ssl/certs/stellaops.crt
    Subject: CN=stellaops.example.com
    Expires: 2026-01-26T00:00:00Z
    Days remaining: 14
  Likely Causes:
    1. Certificate renewal not scheduled
    2. ACME/Let's Encrypt automation not configured
  Fix Steps:
    # 1. Check current certificate
    openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates
    # 2. Renew certificate (if using certbot)
    sudo certbot renew --cert-name stellaops.example.com
    # 3. Restart services to pick up new certificate
    sudo systemctl restart stellaops-gateway
  Runbook:
    https://docs.stella-ops.org/runbooks/tls-certificate-renewal
  Verification:
    stella doctor --check check.tls.certificates.expiry
[FAIL] check.database.migrations.pending
  Diagnosis: 3 pending release migrations detected in schema 'auth'
  Evidence:
    Schema: auth
    Current version: 099_add_dpop_thumbprints
    Pending migrations:
      - 100_add_tenant_quotas
      - 101_add_audit_retention
      - 102_add_session_revocation
  Likely Causes:
    1. Release migrations not applied before deployment
    2. Migration files added after last deployment
  Fix Steps:
    # 1. Backup database first (RECOMMENDED)
    pg_dump -h localhost -U stella_admin -d stellaops -F c \
      -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump
    # 2. Apply pending release migrations
    stella system migrations-run --module Authority --category release
    # 3. Verify migrations applied
    stella system migrations-status --module Authority
  Verification:
    stella doctor --check check.database.migrations.pending
--------------------------------------------------------------------------------
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
--------------------------------------------------------------------------------
Export Bundle
The Doctor export feature creates a diagnostic bundle for support escalation:
stella doctor export --output diagnostic-bundle.zip
The bundle contains:
- doctor-report.json - Full diagnostic report
- doctor-report.md - Human-readable report
- environment.json - Environment information
- system-info.json - System details (OS, runtime, memory)
- config-sanitized.json - Sanitized configuration (secrets redacted)
- logs/ - Recent log files (optional)
- README.md - Bundle contents guide
Export Options
# Include logs from last 4 hours
stella doctor export --include-logs --log-duration 4h
# Exclude configuration
stella doctor export --no-config
# Custom output path
stella doctor export --output /tmp/support-bundle.zip
Security
Secret Redaction
All evidence output is sanitized. Sensitive values (passwords, tokens, connection strings) are replaced with ***REDACTED*** in:
- Console output
- JSON exports
- Diagnostic bundles
- Log files
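Doctor's own redaction rules are not documented in this section. For ad hoc material that does not pass through Doctor (for example, extra log excerpts attached to a support ticket), an additional defensive pass can be applied by hand. The sketch below is illustrative only and is not Doctor's implementation: the redact function and its pattern are assumptions that cover key=value pairs whose key contains password, token, or secret.

```shell
# Illustrative only: NOT Doctor's redaction logic.
# Assumption: secrets appear as key=value pairs whose key contains
# "password", "token", or "secret" (first letter in either case), and the
# value ends at a space, a semicolon, or end of line.
redact() {
  sed -E 's/([A-Za-z_]*([Pp]assword|[Tt]oken|[Ss]ecret)[A-Za-z_]*=)[^ ;]+/\1***REDACTED***/g'
}

# Example: reads stdin, writes the redacted text to stdout.
printf '%s\n' 'Host=db;Username=stella;Password=hunter2' | redact
```

A pattern this simple will miss secrets in other shapes (JSON fields, bearer headers, URLs with embedded credentials), so it complements review by a person and does not replace it.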
RBAC Permissions
| Scope | Description |
|---|---|
| doctor:run | Execute doctor checks |
| doctor:run:full | Execute all checks including sensitive |
| doctor:export | Export diagnostic reports |
| admin:system | Access system-level checks |
Plugin Development
To create a custom Doctor plugin, implement IDoctorPlugin:
public class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "stellaops.doctor.custom";
    public string DisplayName => "Custom Checks";
    public Version Version => new(1, 0, 0);
    public DoctorCategory Category => DoctorCategory.Integration;

    public bool IsAvailable(IServiceProvider services) => true;

    public IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context)
    {
        return new IDoctorCheck[]
        {
            new MyCustomCheck()
        };
    }

    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
        => Task.CompletedTask;
}
Implement checks using IDoctorCheck:
public class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom configuration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => new[] { "custom", "quick" };
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(2);

    public bool CanRun(DoctorPluginContext context) => true;

    public async Task<DoctorCheckResult> RunAsync(
        DoctorPluginContext context,
        CancellationToken ct)
    {
        // Perform check logic (ValidateAsync is a placeholder for your own validation)
        var isValid = await ValidateAsync(ct);

        if (isValid)
        {
            return DoctorCheckResult.Pass(
                checkId: CheckId,
                diagnosis: "Custom configuration is valid",
                evidence: new Evidence
                {
                    Description = "Validation passed",
                    Data = new Dictionary<string, string>
                    {
                        ["validated_at"] = context.TimeProvider.GetUtcNow().ToString("O")
                    }
                });
        }

        // On failure, attach evidence plus ordered remediation steps
        return DoctorCheckResult.Fail(
            checkId: CheckId,
            diagnosis: "Custom configuration is invalid",
            evidence: new Evidence
            {
                Description = "Validation failed",
                Data = new Dictionary<string, string>
                {
                    ["error"] = "Configuration file missing"
                }
            },
            remediation: new Remediation
            {
                Steps = new[]
                {
                    new RemediationStep
                    {
                        Order = 1,
                        Description = "Create configuration file",
                        Command = "cp /etc/stellaops/custom.yaml.sample /etc/stellaops/custom.yaml",
                        CommandType = CommandType.Shell
                    }
                }
            });
    }
}
Register the plugin in DI:
services.AddSingleton<IDoctorPlugin, MyCustomPlugin>();
Architecture
+------------------+ +------------------+ +------------------+
| CLI | | UI | | External |
| stella doctor | | /ops/doctor | | Monitoring |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
v v v
+------------------------------------------------------------------------+
| Doctor API Layer |
| POST /api/v1/doctor/run GET /api/v1/doctor/checks |
| GET /api/v1/doctor/report WebSocket /api/v1/doctor/stream |
+------------------------------------------------------------------------+
|
v
+------------------------------------------------------------------------+
| Doctor Engine (Core) |
| +------------------+ +------------------+ +------------------+ |
| | Check Registry | | Check Executor | | Report Generator | |
| | - Discovery | | - Parallel exec | | - JSON/MD/Text | |
| | - Filtering | | - Timeout mgmt | | - Remediation | |
| +------------------+ +------------------+ +------------------+ |
+------------------------------------------------------------------------+
|
v
+------------------------------------------------------------------------+
| Plugin System |
+--------+---------+---------+---------+---------+---------+-------------+
| | | | | |
v v v v v v
+--------+ +------+ +------+ +------+ +------+ +------+ +----------+
| Core | | DB & | |Service| | SCM | |Regis-| |Observ-| |Security |
| Plugin | |Migra-| | Graph | |Plugin| | try | |ability| | Plugin |
| | | tions| |Plugin | | | |Plugin| |Plugin | | |
+--------+ +------+ +------+ +------+ +------+ +------+ +----------+
Related Documentation
- CLI Reference - Complete CLI command reference
- Doctor Capabilities Specification - Full technical specification
- Plugin Development Guide - Creating custom plugins
Troubleshooting
Doctor Engine Error (Exit Code 3)
If stella doctor returns exit code 3:
- Check the error message for details
- Verify required services are running
- Check connectivity to databases
- Review logs at /var/log/stellaops/doctor.log
Timeout Exceeded (Exit Code 5)
If checks are timing out:
# Increase per-check timeout
stella doctor --timeout 60s
# Run with reduced parallelism
stella doctor --parallel 2
Checks Not Found
If expected checks are not appearing:
- Verify plugin is registered in DI
- Check CanRun() returns true for your environment
- Review plugin initialization logs