# Stella Ops Doctor

> Self-service diagnostics for Stella Ops deployments

## Overview

The Doctor system provides comprehensive diagnostics for Stella Ops deployments, enabling operators, DevOps engineers, and developers to:

- **Diagnose** what is working and what is not
- **Understand** why failures occur with collected evidence
- **Remediate** issues with copy/paste commands
- **Verify** fixes with re-runnable checks

## Quick Start

### CLI

```bash
# Quick health check
stella doctor

# Full diagnostic with all checks
stella doctor --full

# Check specific category
stella doctor --category database

# Export report for support
stella doctor export --output diagnostic-bundle.zip

# Apply safe fixes from a report (dry-run by default)
stella doctor fix --from doctor-report.json --apply
```

### UI

Navigate to `/ops/doctor` in the Stella Ops console to access the interactive Doctor Dashboard. Fix actions are exposed in the UI and mirror CLI commands; destructive steps are never executed by Doctor.
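
When the CLI commands above are run from a script (for example a CI job) rather than interactively, the exit code is the easiest signal to act on. The sketch below maps the codes listed in the Exit Codes section of this document to short labels and exports a bundle when a run should block the pipeline. The function names and the policy of tolerating warnings are illustrative assumptions, not part of Stella Ops.

```bash
#!/usr/bin/env bash
# Sketch only: the helper names and the "warnings are tolerated" policy are
# assumptions made for illustration; only the exit codes come from this README.

# Translate a Doctor exit code into a short label.
doctor_status() {
  case "$1" in
    0) echo "pass" ;;
    1) echo "warn" ;;
    2) echo "fail" ;;
    3) echo "engine-error" ;;
    4) echo "invalid-arguments" ;;
    5) echo "timeout" ;;
    *) echo "unknown" ;;
  esac
}

# Return 0 if the pipeline may continue, 1 otherwise.
# Assumed policy: pass and warn continue, everything else blocks.
doctor_gate() {
  case "$(doctor_status "$1")" in
    pass|warn) return 0 ;;
    *) return 1 ;;
  esac
}

# Run the quick health check; export a support bundle when the gate blocks.
run_doctor_gate() {
  local rc=0
  stella doctor || rc=$?
  echo "doctor: $(doctor_status "$rc") (exit code $rc)"
  if ! doctor_gate "$rc"; then
    stella doctor export --output diagnostic-bundle.zip
    return 1
  fi
}
```

Calling `run_doctor_gate` as the last step of a deployment job would then fail the job on anything worse than a warning, while leaving a bundle behind for support.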
### API

```bash
# Run diagnostics
POST /api/v1/doctor/run

# Get available checks
GET /api/v1/doctor/checks

# Stream results
WebSocket /api/v1/doctor/stream
```

## Available Checks

The Doctor system includes 60+ diagnostic checks across 9 plugins:

| Plugin | Category | Checks | Description |
|--------|----------|--------|-------------|
| `stellaops.doctor.core` | Core | 9 | Configuration, runtime, disk, memory, time, crypto |
| `stellaops.doctor.database` | Database | 8 | Connectivity, migrations, schema, connection pool |
| `stellaops.doctor.servicegraph` | ServiceGraph | 6 | Gateway, routing, service health |
| `stellaops.doctor.security` | Security | 9 | OIDC, LDAP, TLS, Vault |
| `stellaops.doctor.attestation` | Security | 4 | Rekor connectivity, Cosign keys, clock skew, offline bundle |
| `stellaops.doctor.verification` | Security | 5 | Artifact pull, signatures, SBOM, VEX, policy engine |
| `stellaops.doctor.scm.*` | Integration.SCM | 8 | GitHub, GitLab connectivity/auth/permissions |
| `stellaops.doctor.registry.*` | Integration.Registry | 6 | Harbor, ECR connectivity/auth/pull |
| `stellaops.doctor.observability` | Observability | 4 | OTLP, logs, metrics |

### Setup Wizard Essential Checks

The following checks are mandatory for the setup wizard to validate a new installation:

1. **DB connectivity + schema version** (`stellaops.doctor.database`)
   - `check.db.connection` - Database is reachable
   - `check.db.schema.version` - Schema version matches expected

2. **Attestation store availability** (`stellaops.doctor.attestation`)
   - `check.attestation.rekor.connectivity` - Rekor transparency log reachable
   - `check.attestation.cosign.keymaterial` - Signing keys available (file/KMS/keyless)
   - `check.attestation.clock.skew` - System clock synchronized (<5s skew)

3. **Artifact verification pipeline** (`stellaops.doctor.verification`)
   - `check.verification.artifact.pull` - Test artifact accessible by digest
   - `check.verification.signature` - DSSE signatures verifiable
   - `check.verification.sbom.validation` - SBOM (CycloneDX/SPDX) valid
   - `check.verification.vex.validation` - VEX document valid
   - `check.verification.policy.engine` - Policy evaluation passes

### Check ID Convention

```
check.{category}.{subcategory}.{specific}
```

Examples:

- `check.config.required`
- `check.database.migrations.pending`
- `check.services.gateway.routing`
- `check.integration.scm.github.auth`
- `check.attestation.rekor.connectivity`
- `check.verification.sbom.validation`

## CLI Reference

See [CLI Reference](./cli-reference.md) for complete command documentation.

### Common Commands

```bash
# Quick health check (tagged 'quick' checks only)
stella doctor --quick

# Full diagnostic with all checks
stella doctor --full

# Filter by category
stella doctor --category database
stella doctor --category security

# Filter by plugin
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text

# Filter output by severity
stella doctor --severity fail,warn

# Export diagnostic bundle
stella doctor export --output diagnostic.zip
stella doctor export --include-logs --log-duration 4h
```

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All checks passed |
| 1 | One or more warnings |
| 2 | One or more failures |
| 3 | Doctor engine error |
| 4 | Invalid arguments |
| 5 | Timeout exceeded |

## Output Example

```
Stella Ops Doctor
=================

Running 47 checks across 8 plugins...

[PASS] check.config.required
  All required configuration values are present

[PASS] check.database.connectivity
  PostgreSQL connection successful (latency: 12ms)

[WARN] check.tls.certificates.expiry
  Diagnosis: TLS certificate expires in 14 days

  Evidence:
    Certificate: /etc/ssl/certs/stellaops.crt
    Subject: CN=stellaops.example.com
    Expires: 2026-01-26T00:00:00Z
    Days remaining: 14

  Likely Causes:
    1. Certificate renewal not scheduled
    2. ACME/Let's Encrypt automation not configured

  Fix Steps:
    # 1. Check current certificate
    openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates

    # 2. Renew certificate (if using certbot)
    sudo certbot renew --cert-name stellaops.example.com

    # 3. Restart services to pick up new certificate
    sudo systemctl restart stellaops-gateway

  Verification:
    stella doctor --check check.tls.certificates.expiry

[FAIL] check.database.migrations.pending
  Diagnosis: 3 pending release migrations detected in schema 'auth'

  Evidence:
    Schema: auth
    Current version: 099_add_dpop_thumbprints
    Pending migrations:
      - 100_add_tenant_quotas
      - 101_add_audit_retention
      - 102_add_session_revocation

  Likely Causes:
    1. Release migrations not applied before deployment
    2. Migration files added after last deployment

  Fix Steps:
    # 1. Backup database first (RECOMMENDED)
    pg_dump -h localhost -U stella_admin -d stellaops -F c \
      -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

    # 2. Apply pending release migrations
    stella system migrations-run --module Authority --category release

    # 3. Verify migrations applied
    stella system migrations-status --module Authority

  Verification:
    stella doctor --check check.database.migrations.pending

--------------------------------------------------------------------------------
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
--------------------------------------------------------------------------------
```

## Export Bundle

The Doctor export feature creates a diagnostic bundle for support escalation:

```bash
stella doctor export --output diagnostic-bundle.zip
```

The bundle contains:

- `doctor-report.json` - Full diagnostic report
- `doctor-report.md` - Human-readable report
- `environment.json` - Environment information
- `system-info.json` - System details (OS, runtime, memory)
- `config-sanitized.json` - Sanitized configuration (secrets redacted)
- `logs/` - Recent log files (optional)
- `README.md` - Bundle contents guide

### Export Options

```bash
# Include logs from last 4 hours
stella doctor export --include-logs --log-duration 4h

# Exclude configuration
stella doctor export --no-config

# Custom output path
stella doctor export --output /tmp/support-bundle.zip
```

## Security

### Secret Redaction

All evidence output is sanitized.
Sensitive values (passwords, tokens, connection strings) are replaced with `***REDACTED***` in:

- Console output
- JSON exports
- Diagnostic bundles
- Log files

### RBAC Permissions

| Scope | Description |
|-------|-------------|
| `doctor:run` | Execute doctor checks |
| `doctor:run:full` | Execute all checks, including sensitive ones |
| `doctor:export` | Export diagnostic reports |
| `admin:system` | Access system-level checks |

## Plugin Development

To create a custom Doctor plugin, implement `IDoctorPlugin`:

```csharp
public class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "stellaops.doctor.custom";
    public string DisplayName => "Custom Checks";
    public Version Version => new(1, 0, 0);
    public DoctorCategory Category => DoctorCategory.Integration;

    public bool IsAvailable(IServiceProvider services) => true;

    // Every check this plugin contributes to the registry.
    public IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context)
    {
        return new IDoctorCheck[]
        {
            new MyCustomCheck()
        };
    }

    // No setup needed for this plugin.
    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
        => Task.CompletedTask;
}
```

Implement checks using `IDoctorCheck`:

```csharp
public class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom configuration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => new[] { "custom", "quick" };
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(2);

    public bool CanRun(DoctorPluginContext context) => true;

    public async Task<DoctorCheckResult> RunAsync(
        DoctorPluginContext context,
        CancellationToken ct)
    {
        // Perform check logic. ValidateAsync stands in for your own
        // validation routine and is not shown here.
        var isValid = await ValidateAsync(ct);

        if (isValid)
        {
            return DoctorCheckResult.Pass(
                checkId: CheckId,
                diagnosis: "Custom configuration is valid",
                evidence: new Evidence
                {
                    Description = "Validation passed",
                    Data = new Dictionary<string, string>
                    {
                        ["validated_at"] = context.TimeProvider.GetUtcNow().ToString("O")
                    }
                });
        }

        // Failures carry evidence plus copy/paste remediation steps.
        return DoctorCheckResult.Fail(
            checkId: CheckId,
            diagnosis: "Custom configuration is invalid",
            evidence: new Evidence
            {
                Description = "Validation failed",
                Data = new Dictionary<string, string>
                {
                    ["error"] = "Configuration file missing"
                }
            },
            remediation: new Remediation
            {
                Steps = new[]
                {
                    new RemediationStep
                    {
                        Order = 1,
                        Description = "Create configuration file",
                        Command = "cp /etc/stellaops/custom.yaml.sample /etc/stellaops/custom.yaml",
                        CommandType = CommandType.Shell
                    }
                }
            });
    }
}
```

Register the plugin in DI:

```csharp
services.AddSingleton<IDoctorPlugin, MyCustomPlugin>();
```

## Architecture

```
+------------------+       +------------------+       +------------------+
|       CLI        |       |        UI        |       |     External     |
|  stella doctor   |       |   /ops/doctor    |       |    Monitoring    |
+--------+---------+       +--------+---------+       +--------+---------+
         |                          |                          |
         v                          v                          v
+------------------------------------------------------------------------+
|                            Doctor API Layer                            |
|  POST /api/v1/doctor/run         GET /api/v1/doctor/checks             |
|  GET /api/v1/doctor/report       WebSocket /api/v1/doctor/stream       |
+------------------------------------------------------------------------+
                                    |
                                    v
+------------------------------------------------------------------------+
|                          Doctor Engine (Core)                          |
|  +------------------+    +------------------+    +------------------+  |
|  | Check Registry   |    | Check Executor   |    | Report Generator |  |
|  | - Discovery      |    | - Parallel exec  |    | - JSON/MD/Text   |  |
|  | - Filtering      |    | - Timeout mgmt   |    | - Remediation    |  |
|  +------------------+    +------------------+    +------------------+  |
+------------------------------------------------------------------------+
                                    |
                                    v
+------------------------------------------------------------------------+
|                             Plugin System                              |
+--------+---------+---------+---------+---------+---------+-------------+
         |         |         |         |         |         |
         v         v         v         v         v         v
+--------+ +------+ +-------+ +------+ +------+ +-------+ +----------+
|  Core  | | DB & | |Service| | SCM  | |Regis-| |Observ-| | Security |
| Plugin | |Migra-| | Graph | |Plugin| | try  | |ability| |  Plugin  |
|        | |tions | |Plugin | |      | |Plugin| |Plugin | |          |
+--------+ +------+ +-------+ +------+ +------+ +-------+ +----------+
```

## Related Documentation

- [CLI Reference](./cli-reference.md) - Complete CLI command reference
- [Doctor Capabilities Specification](./doctor-capabilities.md) - Full technical specification
- [Plugin Development Guide](./plugin-development.md) - Creating custom plugins

## Troubleshooting

### Doctor Engine Error (Exit Code 3)

If `stella doctor` returns exit code 3:

1. Check the error message for details
2. Verify required services are running
3. Check connectivity to databases
4. Review logs at `/var/log/stellaops/doctor.log`

### Timeout Exceeded (Exit Code 5)

If checks are timing out:

```bash
# Increase per-check timeout
stella doctor --timeout 60s

# Run with reduced parallelism
stella doctor --parallel 2
```

### Checks Not Found

If expected checks are not appearing:

1. Verify the plugin is registered in DI
2. Check that `CanRun()` returns true for your environment
3. Review plugin initialization logs
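
If none of the steps above explains a missing check, it can help to confirm whether the engine lists it at all. The sketch below searches the output of `GET /api/v1/doctor/checks` for a check ID. The response format is not documented here, so it assumes only that each check ID appears somewhere in the body as a quoted string; `DOCTOR_BASE_URL` and both function names are hypothetical, and authentication is left out.

```bash
#!/usr/bin/env bash
# Sketch only. Assumptions not confirmed by this README: the checks listing
# contains each check ID as a quoted string, and DOCTOR_BASE_URL points at
# the Stella Ops API of your deployment.

# Read a checks listing on stdin and succeed if the given check ID appears
# in it as a quoted string. -F makes the match literal, so the dots in the
# ID are not treated as regex wildcards.
has_check() {
  grep -qF -- "\"$1\""
}

# Fetch the listing and report whether a check is registered.
# Add whatever authentication headers your deployment requires.
check_registered() {
  local id="$1"
  if curl -fsS "${DOCTOR_BASE_URL}/api/v1/doctor/checks" | has_check "$id"; then
    echo "registered: $id"
  else
    echo "not found: $id"
    return 1
  fi
}
```

For example, `check_registered check.custom.mycheck` would point to a registration problem if it reports "not found", and to `CanRun()` or filtering if the check is listed but never executed. A failed request also reports "not found", so check connectivity first.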