Stella Ops Doctor
Self-service diagnostics for Stella Ops deployments
Overview
The Doctor system provides comprehensive diagnostics for Stella Ops deployments, enabling operators, DevOps engineers, and developers to:
- Diagnose what is working and what is not
- Understand why failures occur with collected evidence
- Remediate issues with copy/paste commands
- Verify fixes with re-runnable checks
Quick Start
CLI
# Quick health check
stella doctor
# Full diagnostic with all checks
stella doctor --full
# Check specific category
stella doctor --category database
# Export report for support
stella doctor export --output diagnostic-bundle.zip
# Apply safe fixes from a report (dry-run by default)
stella doctor fix --from doctor-report.json --apply
UI
Navigate to /ops/doctor in the Stella Ops console to access the interactive Doctor Dashboard.
Fix actions are exposed in the UI and mirror CLI commands; destructive steps are never executed by Doctor.
API
# Run diagnostics
POST /api/v1/doctor/run
# Get available checks
GET /api/v1/doctor/checks
# Stream results
WebSocket /api/v1/doctor/stream
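The REST endpoints above can be called from a script. The sketch below only builds the endpoint URLs; the base URL variable `DOCTOR_BASE_URL` and the helper name `doctor_endpoint` are hypothetical, and authentication headers and request bodies are omitted because this page does not describe them.

```shell
# Hypothetical base URL; replace with the address of your Stella Ops console.
DOCTOR_BASE_URL="${DOCTOR_BASE_URL:-https://stellaops.example.com}"

# Print the full URL for a Doctor API endpoint name (run, checks, ...).
doctor_endpoint() {
  # ${VAR%/} strips one trailing slash so the URL never contains "//api".
  printf '%s/api/v1/doctor/%s\n' "${DOCTOR_BASE_URL%/}" "$1"
}

# Example calls (not executed here; add whatever auth your deployment requires):
#   curl -fsS -X POST "$(doctor_endpoint run)"
#   curl -fsS "$(doctor_endpoint checks)"
```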
Available Checks
The Doctor system includes 60+ diagnostic checks across 10 plugins:
| Plugin | Category | Checks | Description |
|---|---|---|---|
| stellaops.doctor.core | Core | 9 | Configuration, runtime, disk, memory, time, crypto |
| stellaops.doctor.database | Database | 8 | Connectivity, migrations, schema, connection pool |
| stellaops.doctor.servicegraph | ServiceGraph | 6 | Gateway, routing, service health |
| stellaops.doctor.security | Security | 9 | OIDC, LDAP, TLS, Vault |
| stellaops.doctor.attestation | Security | 4 | Rekor connectivity, Cosign keys, clock skew, offline bundle |
| stellaops.doctor.verification | Security | 5 | Artifact pull, signatures, SBOM, VEX, policy engine |
| stellaops.doctor.scm.* | Integration.SCM | 8 | GitHub, GitLab connectivity/auth/permissions |
| stellaops.doctor.registry.* | Integration.Registry | 6 | Harbor, ECR connectivity/auth/pull |
| stellaops.doctor.observability | Observability | 4 | OTLP, logs, metrics |
| stellaops.doctor.timestamping | Security | 22 | RFC-3161 and eIDAS timestamping health |
Setup Wizard Essential Checks
The following checks are mandatory for the setup wizard to validate a new installation:
- DB connectivity + schema version (stellaops.doctor.database)
  - check.db.connection - Database is reachable
  - check.db.schema.version - Schema version matches expected
- Attestation store availability (stellaops.doctor.attestation)
  - check.attestation.rekor.connectivity - Rekor transparency log reachable
  - check.attestation.cosign.keymaterial - Signing keys available (file/KMS/keyless)
  - check.attestation.clock.skew - System clock synchronized (<5s skew)
- Artifact verification pipeline (stellaops.doctor.verification)
  - check.verification.artifact.pull - Test artifact accessible by digest
  - check.verification.signature - DSSE signatures verifiable
  - check.verification.sbom.validation - SBOM (CycloneDX/SPDX) valid
  - check.verification.vex.validation - VEX document valid
  - check.verification.policy.engine - Policy evaluation passes
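The essential checks listed above can be run one at a time with the `--check` flag shown in the CLI section. This is a minimal sketch, not the setup wizard's own logic: the function names and the `STELLA_CMD` override are hypothetical, and it treats any non-zero exit code (including warnings) as a failure.

```shell
# Print the essential check IDs, one per line, as listed in this section.
essential_checks() {
  cat <<'EOF'
check.db.connection
check.db.schema.version
check.attestation.rekor.connectivity
check.attestation.cosign.keymaterial
check.attestation.clock.skew
check.verification.artifact.pull
check.verification.signature
check.verification.sbom.validation
check.verification.vex.validation
check.verification.policy.engine
EOF
}

# Hypothetical override so the loop can be exercised without a real install.
STELLA_CMD="${STELLA_CMD:-stella}"

# Run every essential check; return 1 if any of them exits non-zero.
run_essential_checks() {
  failed=0
  for id in $(essential_checks); do
    "$STELLA_CMD" doctor --check "$id" || failed=1
  done
  return "$failed"
}
```

Calling `run_essential_checks` keeps going after the first failure, so a single run reports every essential check that needs attention.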
Check ID Convention
check.{category}.{subcategory}.{specific}
Examples:
- check.config.required
- check.database.migrations.pending
- check.services.gateway.routing
- check.integration.scm.github.auth
- check.attestation.rekor.connectivity
- check.verification.sbom.validation
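As the examples show, the number of dot-separated segments varies, but the segment right after `check.` is always the category. A small sketch of extracting it in shell follows; the helper name `check_category` is hypothetical and not part of the Doctor CLI.

```shell
# Print the category segment of a check ID such as
# check.database.migrations.pending; return 1 if the ID does not
# look like check.<category>.<rest>.
check_category() {
  case "$1" in
    check.*.*) ;;      # must start with "check." and contain a further dot
    *) return 1 ;;
  esac
  rest=${1#check.}                # drop the leading "check."
  printf '%s\n' "${rest%%.*}"     # keep everything before the next dot
}
```

For example, `check_category check.integration.scm.github.auth` prints `integration`, while an ID that does not start with `check.` makes the function return 1.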
CLI Reference
See CLI Reference for complete command documentation.
Common Commands
# Quick health check (tagged 'quick' checks only)
stella doctor --quick
# Full diagnostic with all checks
stella doctor --full
# Filter by category
stella doctor --category database
stella doctor --category security
# Filter by plugin
stella doctor --plugin scm.github
# Run single check
stella doctor --check check.database.migrations.pending
# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text
# Filter output by severity
stella doctor --severity fail,warn
# Export diagnostic bundle
stella doctor export --output diagnostic.zip
stella doctor export --include-logs --log-duration 4h
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All checks passed |
| 1 | One or more warnings |
| 2 | One or more failures |
| 3 | Doctor engine error |
| 4 | Invalid arguments |
| 5 | Timeout exceeded |
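In a CI pipeline these exit codes can be turned into a gate. The sketch below assumes a policy where warnings are tolerated and everything else blocks the pipeline; the function names are hypothetical and the policy is only one possible choice.

```shell
# Map a Doctor exit code to the meaning given in the table above.
describe_doctor_exit() {
  case "$1" in
    0) echo "All checks passed" ;;
    1) echo "One or more warnings" ;;
    2) echo "One or more failures" ;;
    3) echo "Doctor engine error" ;;
    4) echo "Invalid arguments" ;;
    5) echo "Timeout exceeded" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Return 0 when the pipeline may proceed (pass or warnings only), 1 otherwise.
doctor_gate() {
  case "$1" in
    0|1) return 0 ;;
    *)   return 1 ;;
  esac
}

# Example use (not executed here):
#   rc=0; stella doctor --full || rc=$?
#   describe_doctor_exit "$rc"
#   doctor_gate "$rc" || exit "$rc"
```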
Output Example
Stella Ops Doctor
=================
Running 47 checks across 8 plugins...
[PASS] check.config.required
All required configuration values are present
[PASS] check.database.connectivity
PostgreSQL connection successful (latency: 12ms)
[WARN] check.tls.certificates.expiry
Diagnosis: TLS certificate expires in 14 days
Evidence:
Certificate: /etc/ssl/certs/stellaops.crt
Subject: CN=stellaops.example.com
Expires: 2026-01-26T00:00:00Z
Days remaining: 14
Likely Causes:
1. Certificate renewal not scheduled
2. ACME/Let's Encrypt automation not configured
Fix Steps:
# 1. Check current certificate
openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates
# 2. Renew certificate (if using certbot)
sudo certbot renew --cert-name stellaops.example.com
# 3. Restart services to pick up new certificate
sudo systemctl restart stellaops-gateway
Verification:
stella doctor --check check.tls.certificates.expiry
[FAIL] check.database.migrations.pending
Diagnosis: 3 pending release migrations detected in schema 'auth'
Evidence:
Schema: auth
Current version: 099_add_dpop_thumbprints
Pending migrations:
- 100_add_tenant_quotas
- 101_add_audit_retention
- 102_add_session_revocation
Likely Causes:
1. Release migrations not applied before deployment
2. Migration files added after last deployment
Fix Steps:
# 1. Backup database first (RECOMMENDED)
pg_dump -h localhost -U stella_admin -d stellaops -F c \
-f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump
# 2. Apply pending release migrations
stella system migrations-run --module Authority --category release
# 3. Verify migrations applied
stella system migrations-status --module Authority
Verification:
stella doctor --check check.database.migrations.pending
--------------------------------------------------------------------------------
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
--------------------------------------------------------------------------------
Export Bundle
The Doctor export feature creates a diagnostic bundle for support escalation:
stella doctor export --output diagnostic-bundle.zip
The bundle contains:
- doctor-report.json - Full diagnostic report
- doctor-report.md - Human-readable report
- environment.json - Environment information
- system-info.json - System details (OS, runtime, memory)
- config-sanitized.json - Sanitized configuration (secrets redacted)
- logs/ - Recent log files (optional)
- README.md - Bundle contents guide
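Before sending a bundle to support, it can be useful to confirm the expected files are present. The sketch below reads a file listing on standard input and prints any expected file that is missing. It assumes the files sit at the root of the archive, which this page does not confirm, and the helper name `missing_bundle_files` is hypothetical.

```shell
# Read a file listing (one path per line) on stdin and print every
# expected bundle file that does not appear in it. logs/ is optional
# and therefore not checked.
missing_bundle_files() {
  listing=$(cat)
  for f in doctor-report.json doctor-report.md environment.json \
           system-info.json config-sanitized.json README.md; do
    # -x: whole-line match, -F: fixed string, -q: quiet
    printf '%s\n' "$listing" | grep -qxF "$f" || printf '%s\n' "$f"
  done
}

# Example use (not executed here):
#   unzip -Z1 diagnostic-bundle.zip | missing_bundle_files
```

A bundle created with `--no-config` would presumably report `config-sanitized.json` as missing, which is expected in that case.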
Export Options
# Include logs from last 4 hours
stella doctor export --include-logs --log-duration 4h
# Exclude configuration
stella doctor export --no-config
# Custom output path
stella doctor export --output /tmp/support-bundle.zip
Security
Secret Redaction
All evidence output is sanitized. Sensitive values (passwords, tokens, connection strings) are replaced with ***REDACTED*** in:
- Console output
- JSON exports
- Diagnostic bundles
- Log files
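To illustrate the idea of redaction (this is not Doctor's actual implementation), the following sketch replaces the value of any `key=value` or `key: value` pair whose key contains `password`, `token`, or `secret`. The key list, the lowercase-only matching, and the function name are assumptions made for the example.

```shell
# Replace sensitive values on stdin with ***REDACTED***.
# Matches lowercase keys containing password/token/secret followed by
# '=' or ':'; the value is taken to be the next run of non-space characters.
redact_secrets() {
  sed -E 's/((password|token|secret)[A-Za-z_]*[=:][[:space:]]*)[^[:space:]]+/\1***REDACTED***/g'
}

# Example use (not executed here):
#   echo 'db_password=hunter2 host=db' | redact_secrets
```

A real redactor would also need to handle quoted values, connection strings, and mixed-case keys, which this pattern does not attempt.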
RBAC Permissions
| Scope | Description |
|---|---|
| doctor:run | Execute doctor checks |
| doctor:run:full | Execute all checks including sensitive |
| doctor:export | Export diagnostic reports |
| admin:system | Access system-level checks |
Plugin Development
To create a custom Doctor plugin, implement IDoctorPlugin:
public class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "stellaops.doctor.custom";
    public string DisplayName => "Custom Checks";
    public Version Version => new(1, 0, 0);
    public DoctorCategory Category => DoctorCategory.Integration;
    public bool IsAvailable(IServiceProvider services) => true;
    public IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context)
    {
        return new IDoctorCheck[]
        {
            new MyCustomCheck()
        };
    }
    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
        => Task.CompletedTask;
}
Implement checks using IDoctorCheck:
public class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom configuration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => new[] { "custom", "quick" };
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(2);
    public bool CanRun(DoctorPluginContext context) => true;
    public async Task<DoctorCheckResult> RunAsync(
        DoctorPluginContext context,
        CancellationToken ct)
    {
        // Perform check logic. ValidateAsync is a placeholder for your own
        // validation code; it is not defined in this example.
        var isValid = await ValidateAsync(ct);
        if (isValid)
        {
            // Pass: report the diagnosis with supporting evidence.
            return DoctorCheckResult.Pass(
                checkId: CheckId,
                diagnosis: "Custom configuration is valid",
                evidence: new Evidence
                {
                    Description = "Validation passed",
                    Data = new Dictionary<string, string>
                    {
                        ["validated_at"] = context.TimeProvider.GetUtcNow().ToString("O")
                    }
                });
        }
        // Fail: include evidence plus copy/paste remediation steps.
        return DoctorCheckResult.Fail(
            checkId: CheckId,
            diagnosis: "Custom configuration is invalid",
            evidence: new Evidence
            {
                Description = "Validation failed",
                Data = new Dictionary<string, string>
                {
                    ["error"] = "Configuration file missing"
                }
            },
            remediation: new Remediation
            {
                Steps = new[]
                {
                    new RemediationStep
                    {
                        Order = 1,
                        Description = "Create configuration file",
                        Command = "cp /etc/stellaops/custom.yaml.sample /etc/stellaops/custom.yaml",
                        CommandType = CommandType.Shell
                    }
                }
            });
    }
}
Register the plugin in DI:
services.AddSingleton<IDoctorPlugin, MyCustomPlugin>();
Architecture
+------------------+ +------------------+ +------------------+
| CLI | | UI | | External |
| stella doctor | | /ops/doctor | | Monitoring |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
v v v
+------------------------------------------------------------------------+
| Doctor API Layer |
| POST /api/v1/doctor/run GET /api/v1/doctor/checks |
| GET /api/v1/doctor/report WebSocket /api/v1/doctor/stream |
+------------------------------------------------------------------------+
|
v
+------------------------------------------------------------------------+
| Doctor Engine (Core) |
| +------------------+ +------------------+ +------------------+ |
| | Check Registry | | Check Executor | | Report Generator | |
| | - Discovery | | - Parallel exec | | - JSON/MD/Text | |
| | - Filtering | | - Timeout mgmt | | - Remediation | |
| +------------------+ +------------------+ +------------------+ |
+------------------------------------------------------------------------+
|
v
+------------------------------------------------------------------------+
| Plugin System |
+--------+---------+---------+---------+---------+---------+-------------+
| | | | | |
v v v v v v
+--------+ +------+ +------+ +------+ +------+ +------+ +----------+
| Core | | DB & | |Service| | SCM | |Regis-| |Observ-| |Security |
| Plugin | |Migra-| | Graph | |Plugin| | try | |ability| | Plugin |
| | | tions| |Plugin | | | |Plugin| |Plugin | | |
+--------+ +------+ +------+ +------+ +------+ +------+ +----------+
Related Documentation
- CLI Reference - Complete CLI command reference
- Doctor Capabilities Specification - Full technical specification
- Plugin Development Guide - Creating custom plugins
Troubleshooting
Doctor Engine Error (Exit Code 3)
If stella doctor returns exit code 3:
- Check the error message for details
- Verify required services are running
- Check connectivity to databases
- Review logs at /var/log/stellaops/doctor.log
Timeout Exceeded (Exit Code 5)
If checks are timing out:
# Increase per-check timeout
stella doctor --timeout 60s
# Run with reduced parallelism
stella doctor --parallel 2
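The timeout advice above can be automated with a wrapper that retries with a longer `--timeout` only when Doctor exits with code 5. This is a sketch under assumptions: the timeout ladder (30s, 60s, 120s), the function name, and the `STELLA_CMD` override are all hypothetical.

```shell
# Hypothetical override so the wrapper can be exercised without a real install.
STELLA_CMD="${STELLA_CMD:-stella}"

# Run "stella doctor" with increasing per-check timeouts. Any result other
# than exit code 5 (timeout exceeded) is returned immediately.
doctor_retry_on_timeout() {
  rc=0
  for t in 30s 60s 120s; do
    rc=0
    "$STELLA_CMD" doctor --timeout "$t" "$@" || rc=$?
    if [ "$rc" -ne 5 ]; then
      return "$rc"
    fi
    echo "Doctor run exceeded --timeout $t" >&2
  done
  return "$rc"
}

# Example use (not executed here):
#   doctor_retry_on_timeout --category database
```

Failures and warnings are passed through unchanged, so the wrapper does not mask real problems by retrying them.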
Checks Not Found
If expected checks are not appearing:
- Verify plugin is registered in DI
- Check CanRun() returns true for your environment
- Review plugin initialization logs