Stella Ops Doctor

Self-service diagnostics for Stella Ops deployments

Overview

The Doctor system provides comprehensive diagnostics for Stella Ops deployments, enabling operators, DevOps engineers, and developers to:

  • Diagnose what is working and what is not
  • Understand why failures occur with collected evidence
  • Remediate issues with copy/paste commands
  • Verify fixes with re-runnable checks

Quick Start

CLI

# Quick health check
stella doctor

# Full diagnostic with all checks
stella doctor --full

# Check specific category
stella doctor --category database

# Export report for support
stella doctor export --output diagnostic-bundle.zip

# Apply safe fixes from a report (without --apply, fix only performs a dry run)
stella doctor fix --from doctor-report.json --apply

UI

Navigate to /ops/doctor in the Stella Ops console to access the interactive Doctor Dashboard. Fix actions are exposed in the UI and mirror CLI commands; destructive steps are never executed by Doctor.

API

# Run diagnostics
POST /api/v1/doctor/run

# Get available checks
GET /api/v1/doctor/checks

# Stream results
WebSocket /api/v1/doctor/stream

Available Checks

The Doctor system includes 80+ diagnostic checks (81 by the counts below) across 10 plugins:

Plugin                          Category              Checks  Description
stellaops.doctor.core           Core                  9       Configuration, runtime, disk, memory, time, crypto
stellaops.doctor.database       Database              8       Connectivity, migrations, schema, connection pool
stellaops.doctor.servicegraph   ServiceGraph          6       Gateway, routing, service health
stellaops.doctor.security       Security              9       OIDC, LDAP, TLS, Vault
stellaops.doctor.attestation    Security              4       Rekor connectivity, Cosign keys, clock skew, offline bundle
stellaops.doctor.verification   Security              5       Artifact pull, signatures, SBOM, VEX, policy engine
stellaops.doctor.scm.*          Integration.SCM       8       GitHub, GitLab connectivity/auth/permissions
stellaops.doctor.registry.*     Integration.Registry  6       Harbor, ECR connectivity/auth/pull
stellaops.doctor.observability  Observability         4       OTLP, logs, metrics
stellaops.doctor.timestamping   Security              22      RFC-3161 and eIDAS timestamping health

Setup Wizard Essential Checks

The following checks are mandatory for the setup wizard to validate a new installation:

  1. DB connectivity + schema version (stellaops.doctor.database)

    • check.db.connection - Database is reachable
    • check.db.schema.version - Schema version matches expected
  2. Attestation store availability (stellaops.doctor.attestation)

    • check.attestation.rekor.connectivity - Rekor transparency log reachable
    • check.attestation.cosign.keymaterial - Signing keys available (file/KMS/keyless)
    • check.attestation.clock.skew - System clock synchronized (<5s skew)
  3. Artifact verification pipeline (stellaops.doctor.verification)

    • check.verification.artifact.pull - Test artifact accessible by digest
    • check.verification.signature - DSSE signatures verifiable
    • check.verification.sbom.validation - SBOM (CycloneDX/SPDX) valid
    • check.verification.vex.validation - VEX document valid
    • check.verification.policy.engine - Policy evaluation passes
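
The essential checks listed above could be run one at a time from a provisioning script. The following is a minimal sketch, not part of Doctor itself. It assumes that the documented exit codes also apply when a single check is run with --check, and it adopts a hypothetical policy in which warnings (exit code 1) do not block setup. The function and variable names are illustrative.

```shell
# Check IDs copied from the setup wizard list above.
ESSENTIAL_CHECKS="
check.db.connection
check.db.schema.version
check.attestation.rekor.connectivity
check.attestation.cosign.keymaterial
check.attestation.clock.skew
check.verification.artifact.pull
check.verification.signature
check.verification.sbom.validation
check.verification.vex.validation
check.verification.policy.engine
"

# Run each essential check in order and stop at the first blocking result.
# Assumption: exit code 2 or higher is blocking; 0 and 1 are acceptable.
run_essential_checks() {
  for id in $ESSENTIAL_CHECKS; do
    rc=0
    stella doctor --check "$id" || rc=$?
    if [ "$rc" -ge 2 ]; then
      echo "essential check failed: $id" >&2
      return 1
    fi
  done
}
```

Calling run_essential_checks returns 0 only when every listed check passes or merely warns, which makes it usable as a precondition in an install script.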

Check ID Convention

check.{category}.{subcategory}.{specific}

Examples (the number of segments after the category varies between checks):

  • check.config.required
  • check.database.migrations.pending
  • check.services.gateway.routing
  • check.integration.scm.github.auth
  • check.attestation.rekor.connectivity
  • check.verification.sbom.validation
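
Scripts that group or filter results often need the category segment of a check ID. The sketch below validates an ID loosely and prints the first segment after the check. prefix. The character set (lowercase letters and digits) is an assumption inferred from the examples, not a documented rule, and check_id_category is a hypothetical helper name.

```shell
# Print the category segment of a Doctor check ID, or fail if the ID does not
# look like "check.<category>.<...>".
# Assumption: segments contain only lowercase letters and digits.
check_id_category() {
  id=$1
  case $id in
    check.*.*) ;;                                   # prefix plus two or more segments
    *) echo "invalid check ID: $id" >&2; return 1 ;;
  esac
  case $id in
    *[!a-z0-9.]*|*..*|*.)                           # bad character, empty segment, trailing dot
      echo "invalid check ID: $id" >&2; return 1 ;;
  esac
  rest=${id#check.}                                 # drop the "check." prefix
  echo "${rest%%.*}"                                # keep the text before the next dot
}
```

For example, check_id_category check.integration.scm.github.auth prints integration.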

CLI Reference

See CLI Reference for complete command documentation.

Common Commands

# Quick health check (tagged 'quick' checks only)
stella doctor --quick

# Full diagnostic with all checks
stella doctor --full

# Filter by category
stella doctor --category database
stella doctor --category security

# Filter by plugin
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text

# Filter output by severity
stella doctor --severity fail,warn

# Export diagnostic bundle
stella doctor export --output diagnostic.zip
stella doctor export --include-logs --log-duration 4h

Exit Codes

Code  Meaning
0     All checks passed
1     One or more warnings
2     One or more failures
3     Doctor engine error
4     Invalid arguments
5     Timeout exceeded
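
In a CI pipeline these exit codes can drive a simple gate. The sketch below maps each code to the meaning in the table and applies a hypothetical policy in which warnings are tolerated and every other non-zero code blocks the pipeline. The function names and the policy are illustrative assumptions, not part of the stella CLI.

```shell
# Translate a `stella doctor` exit code into the meaning from the table above.
doctor_exit_meaning() {
  case $1 in
    0) echo "All checks passed" ;;
    1) echo "One or more warnings" ;;
    2) echo "One or more failures" ;;
    3) echo "Doctor engine error" ;;
    4) echo "Invalid arguments" ;;
    5) echo "Timeout exceeded" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Hypothetical gate policy: codes 0 and 1 pass, anything else fails.
doctor_gate() {
  code=$1
  echo "doctor: $(doctor_exit_meaning "$code")"
  [ "$code" -le 1 ]
}

# Usage in a pipeline step:
#   rc=0; stella doctor --full || rc=$?
#   doctor_gate "$rc" || exit 1
```

Teams that want warnings to block as well can change the comparison in doctor_gate to -eq 0.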

Output Example

Stella Ops Doctor
=================

Running 47 checks across 8 plugins...

[PASS] check.config.required
  All required configuration values are present

[PASS] check.database.connectivity
  PostgreSQL connection successful (latency: 12ms)

[WARN] check.tls.certificates.expiry
  Diagnosis: TLS certificate expires in 14 days

  Evidence:
    Certificate: /etc/ssl/certs/stellaops.crt
    Subject: CN=stellaops.example.com
    Expires: 2026-01-26T00:00:00Z
    Days remaining: 14

  Likely Causes:
    1. Certificate renewal not scheduled
    2. ACME/Let's Encrypt automation not configured

  Fix Steps:
    # 1. Check current certificate
    openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates

    # 2. Renew certificate (if using certbot)
    sudo certbot renew --cert-name stellaops.example.com

    # 3. Restart services to pick up new certificate
    sudo systemctl restart stellaops-gateway

  Verification:
    stella doctor --check check.tls.certificates.expiry

[FAIL] check.database.migrations.pending
  Diagnosis: 3 pending release migrations detected in schema 'auth'

  Evidence:
    Schema: auth
    Current version: 099_add_dpop_thumbprints
    Pending migrations:
      - 100_add_tenant_quotas
      - 101_add_audit_retention
      - 102_add_session_revocation

  Likely Causes:
    1. Release migrations not applied before deployment
    2. Migration files added after last deployment

  Fix Steps:
    # 1. Backup database first (RECOMMENDED)
    pg_dump -h localhost -U stella_admin -d stellaops -F c \
      -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

    # 2. Apply pending release migrations
    stella system migrations-run --module Authority --category release

    # 3. Verify migrations applied
    stella system migrations-status --module Authority

  Verification:
    stella doctor --check check.database.migrations.pending

--------------------------------------------------------------------------------
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
--------------------------------------------------------------------------------
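
When only the text format is at hand, the status tags at the start of each result line are enough to list the affected checks. This is a rough sketch that relies on the layout shown in the example above ([STATUS] followed by the check ID). For anything long-lived, --format json is likely the more robust input, but its schema is not described here. checks_with_status is a hypothetical helper name.

```shell
# Read Doctor text output on stdin and print the IDs of all checks whose
# result line starts with the given status tag (PASS, WARN, or FAIL).
checks_with_status() {
  awk -v tag="[$1]" '$1 == tag { print $2 }'
}

# Usage:
#   stella doctor --format text | checks_with_status FAIL
```

Each matching check ID is printed on its own line, so the output can be fed back into stella doctor --check one ID at a time.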

Export Bundle

The Doctor export feature creates a diagnostic bundle for support escalation:

stella doctor export --output diagnostic-bundle.zip

The bundle contains:

  • doctor-report.json - Full diagnostic report
  • doctor-report.md - Human-readable report
  • environment.json - Environment information
  • system-info.json - System details (OS, runtime, memory)
  • config-sanitized.json - Sanitized configuration (secrets redacted)
  • logs/ - Recent log files (optional)
  • README.md - Bundle contents guide

Export Options

# Include logs from last 4 hours
stella doctor export --include-logs --log-duration 4h

# Exclude configuration
stella doctor export --no-config

# Custom output path
stella doctor export --output /tmp/support-bundle.zip

Security

Secret Redaction

All evidence output is sanitized. Sensitive values (passwords, tokens, connection strings) are replaced with ***REDACTED*** in:

  • Console output
  • JSON exports
  • Diagnostic bundles
  • Log files
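
To illustrate the idea of redaction (this is not Doctor's actual implementation, which is not shown here), the sketch below masks the value that follows a password, token, or secret key in key=value or key: value text. The key list and the redact_secrets name are assumptions made for the example. Quoted JSON keys are not handled.

```shell
# Replace the value after password/token/secret keys with ***REDACTED***.
# Handles "key=value" and "key: value"; a value ends at whitespace, ';' or ','.
redact_secrets() {
  sed -E 's/(([Pp]assword|[Tt]oken|[Ss]ecret)[[:space:]]*[=:][[:space:]]*)[^[:space:];,]+/\1***REDACTED***/g'
}

# Usage:
#   some_command_that_prints_config | redact_secrets
```

Running extra material through a filter like this before attaching it to a support ticket is a cheap second line of defence, but it should not be relied on to catch every secret format.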

RBAC Permissions

Scope            Description
doctor:run       Execute doctor checks
doctor:run:full  Execute all checks, including sensitive ones
doctor:export    Export diagnostic reports
admin:system     Access system-level checks

Plugin Development

To create a custom Doctor plugin, implement IDoctorPlugin:

public class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "stellaops.doctor.custom";
    public string DisplayName => "Custom Checks";
    public Version Version => new(1, 0, 0);
    public DoctorCategory Category => DoctorCategory.Integration;

    public bool IsAvailable(IServiceProvider services) => true;

    public IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context)
    {
        return new IDoctorCheck[]
        {
            new MyCustomCheck()
        };
    }

    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
        => Task.CompletedTask;
}

Implement checks using IDoctorCheck:

public class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom configuration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => new[] { "custom", "quick" };
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(2);

    public bool CanRun(DoctorPluginContext context) => true;

    public async Task<DoctorCheckResult> RunAsync(
        DoctorPluginContext context,
        CancellationToken ct)
    {
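        // Perform check logic. ValidateAsync is a placeholder for your own
        // validation routine; it is not defined in this example.
        var isValid = await ValidateAsync(ct);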

        if (isValid)
        {
            return DoctorCheckResult.Pass(
                checkId: CheckId,
                diagnosis: "Custom configuration is valid",
                evidence: new Evidence
                {
                    Description = "Validation passed",
                    Data = new Dictionary<string, string>
                    {
                        ["validated_at"] = context.TimeProvider.GetUtcNow().ToString("O")
                    }
                });
        }

        return DoctorCheckResult.Fail(
            checkId: CheckId,
            diagnosis: "Custom configuration is invalid",
            evidence: new Evidence
            {
                Description = "Validation failed",
                Data = new Dictionary<string, string>
                {
                    ["error"] = "Configuration file missing"
                }
            },
            remediation: new Remediation
            {
                Steps = new[]
                {
                    new RemediationStep
                    {
                        Order = 1,
                        Description = "Create configuration file",
                        Command = "cp /etc/stellaops/custom.yaml.sample /etc/stellaops/custom.yaml",
                        CommandType = CommandType.Shell
                    }
                }
            });
    }
}

Register the plugin in DI:

services.AddSingleton<IDoctorPlugin, MyCustomPlugin>();

Architecture

+------------------+     +------------------+     +------------------+
|       CLI        |     |        UI        |     |    External      |
|  stella doctor   |     |   /ops/doctor    |     |   Monitoring     |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         v                        v                        v
+------------------------------------------------------------------------+
|                         Doctor API Layer                                |
|  POST /api/v1/doctor/run    GET /api/v1/doctor/checks                  |
|  GET /api/v1/doctor/report  WebSocket /api/v1/doctor/stream            |
+------------------------------------------------------------------------+
         |
         v
+------------------------------------------------------------------------+
|                      Doctor Engine (Core)                               |
|  +------------------+  +------------------+  +------------------+       |
|  | Check Registry   |  | Check Executor   |  | Report Generator |       |
|  | - Discovery      |  | - Parallel exec  |  | - JSON/MD/Text   |       |
|  | - Filtering      |  | - Timeout mgmt   |  | - Remediation    |       |
|  +------------------+  +------------------+  +------------------+       |
+------------------------------------------------------------------------+
         |
         v
+------------------------------------------------------------------------+
|                        Plugin System                                    |
+--------+---------+---------+---------+---------+---------+-------------+
         |         |         |         |         |         |
         v         v         v         v         v         v
+--------+  +------+  +-------+  +------+  +------+  +-------+  +---------+
| Core   |  | DB & |  |Service|  | SCM  |  |Regis-|  |Observ-|  |Security |
| Plugin |  |Migra-|  | Graph |  |Plugin|  | try  |  |ability|  | Plugin  |
|        |  | tions|  |Plugin |  |      |  |Plugin|  |Plugin |  |         |
+--------+  +------+  +-------+  +------+  +------+  +-------+  +---------+

Troubleshooting

Doctor Engine Error (Exit Code 3)

If stella doctor returns exit code 3:

  1. Check the error message for details
  2. Verify required services are running
  3. Check connectivity to databases
  4. Review logs at /var/log/stellaops/doctor.log

Timeout Exceeded (Exit Code 5)

If checks are timing out:

# Increase per-check timeout
stella doctor --timeout 60s

# Run with reduced parallelism
stella doctor --parallel 2
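
For unattended runs, the two adjustments above can be applied automatically when a run ends with exit code 5. The wrapper below is a sketch under assumptions: it presumes that --timeout and --parallel can be combined and appended after other options, and the single-retry policy and the doctor_retry_on_timeout name are hypothetical.

```shell
# Run Doctor with the given arguments. If the run times out (exit code 5),
# retry once with a longer per-check timeout and reduced parallelism.
doctor_retry_on_timeout() {
  rc=0
  stella doctor "$@" || rc=$?
  if [ "$rc" -eq 5 ]; then
    echo "doctor timed out; retrying with --timeout 60s --parallel 2" >&2
    rc=0
    stella doctor "$@" --timeout 60s --parallel 2 || rc=$?
  fi
  return "$rc"
}

# Usage:
#   doctor_retry_on_timeout --category database
```

The wrapper returns the exit code of the last run, so callers can still distinguish warnings, failures, and a second timeout.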

Checks Not Found

If expected checks are not appearing:

  1. Verify that the plugin is registered in DI
  2. Check that CanRun() returns true for your environment
  3. Review the plugin initialization logs