# Stella Ops Doctor Capability Specification

> **Status:** Planning / Capability Design
> **Version:** 1.0.0-draft
> **Last Updated:** 2026-01-12

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [Current State Analysis](#2-current-state-analysis)
3. [Doctor Architecture](#3-doctor-architecture)
4. [Plugin System Specification](#4-plugin-system-specification)
5. [CLI Surface](#5-cli-surface)
6. [UI Surface](#6-ui-surface)
7. [API Surface](#7-api-surface)
8. [Remediation Command Patterns](#8-remediation-command-patterns)
9. [Doctor Check Catalog](#9-doctor-check-catalog)
10. [Plugin Implementation Details](#10-plugin-implementation-details)

---

## 1. Executive Summary

### 1.1 Purpose

The Doctor capability provides comprehensive self-service diagnostics for Stella Ops deployments. It enables operators, DevOps engineers, and developers to:

- **Diagnose** what is working and what is not
- **Understand** why failures occur, backed by collected evidence
- **Remediate** issues with copy/paste commands
- **Verify** fixes with re-runnable checks

### 1.2 Target Users

| User Type | Primary Use Case |
|-----------|------------------|
| **Operators** | Pre-deployment validation, incident triage, routine health checks |
| **DevOps Engineers** | Integration setup, migration management, environment troubleshooting |
| **Developers** | Local development environment validation, API connectivity testing |
| **Support Engineers** | Remote diagnostics, evidence collection for escalation |

### 1.3 Key Principles

1. **Plugin-First Architecture** - All checks are implemented via extensible plugins
2. **Actionable Remediation** - Every failure includes copy/paste fix commands
3. **Zero Docs Familiarity** - Users can diagnose and fix issues without reading documentation
4. **Evidence-Based Diagnostics** - All checks collect and report evidence
5. **Multi-Surface Consistency** - The same check engine powers the CLI, UI, and API
6. **Non-Destructive Fixes** - Doctor never executes destructive actions; fix commands must be safe and idempotent

### 1.4 Surfaces

| Surface | Entry Point | Primary Use |
|---------|-------------|-------------|
| **CLI** | `stella doctor` | Automation, CI/CD gates, SSH troubleshooting |
| **UI** | `/ops/doctor` | Interactive diagnosis, team collaboration |
| **API** | `POST /api/v1/doctor/run` | Programmatic integration, monitoring systems |

---

## 2. Current State Analysis

### 2.1 CLI - Current State

**Location:** `src/Cli/StellaOps.Cli/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Entry Point | `src/Cli/StellaOps.Cli/Program.cs` | Main CLI bootstrap using System.CommandLine |
| Command Factory | `src/Cli/StellaOps.Cli/Commands/CommandFactory.cs` | Registers 88+ command groups |
| Config Bootstrap | `src/Cli/StellaOps.Cli/Configuration/CliBootstrapper.cs` | Environment + YAML/JSON config loading |
| Exit Codes | `src/Cli/StellaOps.Cli/CliExitCodes.cs` | Standardized exit codes (0-99) |
| Crypto Validator | `src/Cli/StellaOps.Cli/Services/CryptoProfileValidator.cs` | Startup validation for crypto profiles |
| Migration Commands | `src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs` | `migrations-run`, `migrations-status`, `migrations-verify` |

#### Existing Validation Patterns

```csharp
// CryptoProfileValidator.cs - Startup validation pattern
public sealed record ValidationResult
{
    public bool IsValid { get; init; }
    public bool HasWarnings { get; init; }
    public bool HasErrors { get; init; }
    public List<string> Errors { get; init; }
    public List<string> Warnings { get; init; }
    public string ActiveProfile { get; init; }
    public List<string> AvailableProviders { get; init; }
}
```

#### Gaps

- No unified `stella doctor` command
- Output formatting is ad hoc per command (no centralized formatter)
- No remediation command generation
- Validation covers only crypto profiles, not overall system state

#### Proposed Capability
```bash
# Quick system health check
stella doctor

# Full diagnostic with all checks
stella doctor --full

# Check specific category
stella doctor --category database
stella doctor --category integration

# Check specific plugin
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text

# Export report
stella doctor --export report.json
stella doctor --export report.md

# Filter by severity
stella doctor --severity fail,warn
```

---

### 2.2 Health Infrastructure - Current State

**Pattern:** Extensive health endpoints across 20+ services

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Health Status Enum | `src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthStatus.cs` | Unknown, Healthy, Degraded, Unhealthy |
| Health Check Result | `src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs` | Rich result with factory methods |
| Gateway Health | `src/Gateway/StellaOps.Gateway.WebService/Middleware/HealthCheckMiddleware.cs` | `/health/live`, `/health/ready`, `/health/startup` |
| Scanner Health | `src/Scanner/StellaOps.Scanner.WebService/Endpoints/HealthEndpoints.cs` | `/healthz`, `/readyz` |
| Orchestrator Health | `src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Endpoints/HealthEndpoints.cs` | `/health/details` |
| Platform Health | `src/Platform/__Libraries/StellaOps.Platform.Health/PlatformHealthService.cs` | Cross-service aggregation |
| Health Contract | `devops/docker/health-endpoints.md` | Formal endpoint specification |

#### Health Check Result Model

```csharp
// From src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs
public sealed record HealthCheckResult(
    HealthStatus Status,
    string? Message,
    IReadOnlyDictionary<string, object>? Details,
    DateTimeOffset CheckedAt,
    TimeSpan Duration)
{
    public static HealthCheckResult Healthy(string? message = null) => ...
    public static HealthCheckResult Degraded(string message) => ...
    public static HealthCheckResult Unhealthy(string message, Exception? ex = null) => ...
}
```

#### Gaps

- Health endpoints check liveness/readiness, not comprehensive diagnostics
- No remediation guidance in health responses
- No aggregated cross-service diagnostic view
- Health checks don't verify configuration validity

---

### 2.3 Doctor Service - Current State (ReleaseOrchestrator)

**Location:** `src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Doctor Service | `Doctor/DoctorService.cs` | Runs `IDoctorCheck` implementations |
| Doctor Report | `Doctor/DoctorReport.cs` | Aggregated results with counts |
| Check Result | `Doctor/CheckResult.cs` | Individual check outcome |
| IDoctorCheck | `Doctor/IDoctorCheck.cs` | Plugin interface for checks |

#### IDoctorCheck Interface

```csharp
// Existing interface (simplified)
public interface IDoctorCheck
{
    string Name { get; }
    string Category { get; }
    Task<CheckResult> RunAsync(CancellationToken ct);
}

public sealed record CheckResult(
    string Name,
    HealthStatus Status,
    string? Message,
    TimeSpan Duration);

public sealed record DoctorReport(
    int PassCount,
    int WarningCount,
    int FailCount,
    int SkippedCount,
    HealthStatus OverallStatus,
    TimeSpan TotalDuration,
    IReadOnlyList<CheckResult> Results);
```

#### Gaps

- Only available in ReleaseOrchestrator, not in the CLI or other modules
- No remediation commands in output
- No evidence collection
- Limited to integration checks only
- No plugin discovery mechanism

---

### 2.4 Integration Plugins - Current State

**Location:** `src/Integrations/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Plugin Contract | `__Libraries/StellaOps.Integrations.Contracts/IIntegrationConnectorPlugin.cs` | Core plugin interface |
| Integration Types | `__Libraries/StellaOps.Integrations.Contracts/IntegrationType.cs` | Registry, SCM, CI/CD, etc. |
| GitHub Plugin | `__Plugins/StellaOps.Integrations.Plugin.GitHubApp/GitHubAppConnectorPlugin.cs` | GitHub App integration |
| Harbor Plugin | `__Plugins/StellaOps.Integrations.Plugin.Harbor/HarborConnectorPlugin.cs` | Harbor registry |
| Plugin Loader | `StellaOps.Integrations.WebService/IntegrationPluginLoader.cs` | Assembly-based discovery |
| Vault Connectors | `src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/` | HashiCorp Vault, Azure Key Vault |

#### IIntegrationConnectorPlugin Interface

```csharp
public interface IIntegrationConnectorPlugin : IAvailabilityPlugin
{
    IntegrationType Type { get; }
    IntegrationProvider Provider { get; }
    string Name { get; }

    Task TestConnectionAsync(
        IntegrationConfig config,
        CancellationToken ct);

    Task CheckHealthAsync(
        IntegrationConfig config,
        CancellationToken ct);
}
```

#### Supported Integration Types

```csharp
public enum IntegrationType
{
    Registry = 1,     // Harbor, ECR, GCR, ACR, Docker Hub, Quay, Artifactory
    Scm = 2,          // GitHub, GitLab, Bitbucket, Gitea, Azure DevOps
    CiCd = 3,         // GitHub Actions, GitLab CI, Jenkins, CircleCI
    RepoSource = 4,   // npm, PyPI, Maven, NuGet, Crates.io
    RuntimeHost = 5,  // eBPF, ETW, dyld agents
    FeedMirror = 6    // NVD, OSV, StellaOps mirrors
}
```

#### Gaps

- `TestConnectionAsync` exists but is not surfaced via a CLI doctor command
- No standardized remediation output
- Health checks don't report required permissions/scopes
- No validation of webhook/event delivery configuration

---

### 2.5 Authority Plugins - Current State

**Location:** `src/Authority/StellaOps.Authority/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Plugin Abstractions | `StellaOps.Authority.Plugins.Abstractions/` | Plugin registration interface |
| LDAP Plugin | `StellaOps.Authority.Plugin.Ldap/` | LDAP/AD integration |
| OIDC Plugin | `StellaOps.Authority.Plugin.Oidc/` | OpenID Connect |
| SAML Plugin | `StellaOps.Authority.Plugin.Saml/` | SAML 2.0 |
| Plugin Registry | `StellaOps.Authority/AuthorityPluginRegistry.cs` | Manages named plugins |
| LDAP Config | `etc/authority.plugins/ldap.yaml` | Sample configuration |

#### LDAP Plugin Capabilities

```yaml
# From etc/authority.plugins/ldap.yaml
connection:
  host: "ldaps://ldap.example.internal"
  port: 636
  searchBase: "ou=people,dc=example,dc=internal"
  bindDn: "cn=bind-user,ou=service,dc=example,dc=internal"
  bindPasswordSecret: "file:/etc/secrets/ldap-bind.txt"
security:
  requireTls: true
claims:
  groupAttribute: "memberOf"
cache:
  enabled: true
  ttlSeconds: 600
```

#### Gaps

- No CLI command to validate LDAP configuration
- Health checks exist but don't provide remediation
- No validation of group mapping correctness
- TLS certificate validation not exposed as a diagnostic

---

### 2.6 Database & Migrations - Current State

**Location:** `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Migration Runner | `Migrations/MigrationRunner.cs` | Executes SQL migrations with advisory locks |
| Migration Category | `Migrations/MigrationCategory.cs` | Startup, Release, Seed, Data |
| Status Service | `Migrations/MigrationStatusService.cs` | Query migration state |
| CLI Commands | `src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs` | `migrations-run/status/verify` |
| Strategy Docs | `docs/db/MIGRATION_STRATEGY.md` | Migration process documentation |

#### Migration Categories

| Prefix | Category | Automatic | Breaking |
|--------|----------|-----------|----------|
| `001-099` | Startup | Yes | No |
| `100-199` | Release | No (CLI) | Yes |
| `S001-S999` | Seed | Yes | No |
| `DM001-DM999` | Data | Background | Varies |

#### Schema Tracking

```sql
CREATE TABLE {schema}.schema_migrations (
    migration_name TEXT PRIMARY KEY,
    category TEXT NOT NULL DEFAULT 'startup',
    checksum TEXT NOT NULL,
    applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    applied_by TEXT,
    duration_ms INT
);
```

#### Gaps

- Migration status not integrated with doctor
- No checksum mismatch diagnostics with remediation
- Lock contention not diagnosed
- No cross-schema migration state view

---

### 2.7 UI - Current State

**Location:** `src/Web/StellaOps.Web/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Routes | `src/app/app.routes.ts` | Angular Router configuration |
| Platform Health | `src/app/features/platform-health/` | Health dashboard at `/ops/health` |
| Health Client | `src/app/core/api/platform-health.client.ts` | API client for health endpoints |
| Console Status | `src/app/features/console/console-status.component.ts` | Queue/run status |

#### Platform Health Dashboard Features

- Real-time KPI strip (services, latency, error rate, incidents)
- Service health grid with grouping (healthy/degraded/unhealthy)
- Dependency graph visualization
- Incident timeline (last 24h)
- Auto-refresh every 10 seconds

#### Gaps

- No diagnostic check execution from the UI
- No remediation command display
- No evidence collection/export
- Health dashboard shows status, not actionable diagnostics

---

### 2.8 Service Connectivity - Current State

**Location:** `src/Gateway/`, `src/Router/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Gateway Routing | `src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs` | HTTP to microservice routing |
| Connection Manager | `src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs` | HELLO handshake, heartbeats |
| Routing State | `src/Router/__Libraries/StellaOps.Router.Common/Abstractions/IGlobalRoutingState.cs` | Live service connections |
| Claims Propagation | `src/Gateway/StellaOps.Gateway.WebService/Middleware/ClaimsPropagationMiddleware.cs` | OAuth claims forwarding |

#### Service Registration Flow

1. Service connects to Gateway via Router transport (TCP/TLS/Valkey)
2. HELLO handshake with endpoint/schema declarations
3. Periodic heartbeats with health/latency metrics
4. Gateway maintains `ConnectionState` for routing decisions

#### Gaps

- No CLI command to verify service graph health
- Routing failures not diagnosed with remediation
- No validation of claims propagation configuration
- Transport connectivity not exposed as a diagnostic

---

## 3. Doctor Architecture

### 3.1 High-Level Architecture

```
+------------------+        +------------------+        +------------------+
|       CLI        |        |        UI        |        |     External     |
|  stella doctor   |        |   /ops/doctor    |        |    Monitoring    |
+--------+---------+        +--------+---------+        +--------+---------+
         |                           |                           |
         v                           v                           v
+--------------------------------------------------------------------------+
|                             Doctor API Layer                             |
|  POST /api/v1/doctor/run               GET /api/v1/doctor/checks         |
|  GET  /api/v1/doctor/report            WebSocket /api/v1/doctor/stream   |
+--------------------------------------------------------------------------+
                                     |
                                     v
+--------------------------------------------------------------------------+
|                           Doctor Engine (Core)                           |
|  +------------------+   +------------------+   +------------------+      |
|  | Check Registry   |   | Check Executor   |   | Report Generator |      |
|  | - Discovery      |   | - Parallel exec  |   | - JSON/MD/Text   |      |
|  | - Filtering      |   | - Timeout mgmt   |   | - Remediation    |      |
|  +------------------+   +------------------+   +------------------+      |
+--------------------------------------------------------------------------+
                                     |
                                     v
+--------------------------------------------------------------------------+
|                              Plugin System                               |
+---+----------+----------+----------+----------+----------+----------+----+
    |          |          |          |          |          |          |
    v          v          v          v          v          v          v
+--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
|  Core  | |  DB &  | |Service | |  SCM   | |Registry| | Vault  | |Authori-|
| Plugin | | Migra- | | Graph  | | Plugin | | Plugin | | Plugin | |   ty   |
|        | | tions  | | Plugin | |        | |        | |        | | Plugin |
+--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
```

### 3.2 Core Components

#### Doctor Engine

**Proposed Location:** `src/__Libraries/StellaOps.Doctor/`

```
StellaOps.Doctor/
├── Engine/
│   ├── DoctorEngine.cs              # Main orchestrator
│   ├── CheckExecutor.cs             # Parallel check execution
│   └── CheckRegistry.cs             # Plugin discovery & filtering
├── Models/
│   ├── DoctorCheckResult.cs         # Extended check result with evidence
│   ├── DoctorReport.cs              # Full report model
│   ├── Remediation.cs               # Fix command model
│   └── Evidence.cs                  # Collected evidence model
├── Plugins/
│   ├── IDoctorPlugin.cs             # Plugin interface
│   ├── IDoctorCheck.cs              # Check interface (extended)
│   └── DoctorPluginContext.cs       # Plugin execution context
├── Output/
│   ├── JsonReportFormatter.cs       # JSON output
│   ├── MarkdownReportFormatter.cs   # Markdown output
│   └── TextReportFormatter.cs       # Console text output
└── DoctorServiceExtensions.cs       # DI registration
```

#### Check Execution Model

```csharp
public sealed class CheckExecutor
{
    private readonly IEnumerable<IDoctorPlugin> _plugins;
    private readonly TimeProvider _timeProvider;
    private readonly ILogger<CheckExecutor> _logger;

    public async Task<DoctorReport> RunAsync(
        DoctorRunOptions options,
        CancellationToken ct)
    {
        var checks = GetFilteredChecks(options);
        var results = new ConcurrentBag<DoctorCheckResult>();

        // Parallel execution with configurable concurrency
        await Parallel.ForEachAsync(
            checks,
            new ParallelOptions
            {
                MaxDegreeOfParallelism = options.Parallelism,
                CancellationToken = ct
            },
            async (check, token) =>
            {
                var result = await ExecuteCheckAsync(check, options, token);
                results.Add(result);
            });

        return GenerateReport(results, options);
    }
}
```

### 3.3 Result Model

```csharp
public sealed record DoctorCheckResult
{
    // Identity
    public required string CheckId { get; init; }
    public required string PluginId { get; init; }
    public required string Category { get; init; }

    // Outcome
    public required DoctorSeverity Severity { get; init; }  // Pass, Info, Warn, Fail, Skip
    public required string Diagnosis { get; init; }

    // Evidence
    public required Evidence Evidence { get; init; }

    // Remediation
    public IReadOnlyList<string>? LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }

    // Metadata
    public required TimeSpan Duration { get; init; }
    public required DateTimeOffset ExecutedAt { get; init; }
}

public enum DoctorSeverity
{
    Pass = 0,
    Info = 1,
    Warn = 2,
    Fail = 3,
    Skip = 4
}

public sealed record Evidence
{
    public required string Description { get; init; }
    public required IReadOnlyDictionary<string, string> Data { get; init; }
    public IReadOnlyList<string>? SensitiveKeys { get; init; }  // Keys to redact in output
}

public sealed record Remediation
{
    public required IReadOnlyList<RemediationStep> Steps { get; init; }
    public string? SafetyNote { get; init; }
    public bool RequiresBackup { get; init; }
}

public sealed record RemediationStep
{
    public required int Order { get; init; }
    public required string Description { get; init; }
    public required string Command { get; init; }
    public CommandType CommandType { get; init; }  // Shell, SQL, API, FileEdit, Manual
    public IReadOnlyDictionary<string, string>? Placeholders { get; init; }
}

public enum CommandType
{
    Shell,     // Bash/PowerShell command
    SQL,       // SQL statement
    API,       // API call (curl/stella CLI)
    FileEdit,  // File modification
    Manual     // Manual step (no command)
}
```

---

## 4. Plugin System Specification

### 4.1 Plugin Interface

```csharp
/// <summary>
/// Base interface for Doctor plugins.
/// Plugins group related checks and share configuration context.
/// </summary>
public interface IDoctorPlugin
{
    /// <summary>Unique plugin identifier (e.g., "stellaops.doctor.database").</summary>
    string PluginId { get; }

    /// <summary>Human-readable name.</summary>
    string DisplayName { get; }

    /// <summary>Plugin category for filtering.</summary>
    DoctorCategory Category { get; }

    /// <summary>Plugin version for compatibility.</summary>
    Version Version { get; }

    /// <summary>Minimum Doctor engine version required.</summary>
    Version MinEngineVersion { get; }

    /// <summary>Check whether the plugin is available in the current environment.</summary>
    bool IsAvailable(IServiceProvider services);

    /// <summary>Get all checks provided by this plugin.</summary>
    IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context);

    /// <summary>Initialize the plugin with configuration.</summary>
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public enum DoctorCategory
{
    Core,           // Platform, config, runtime
    Database,       // Schema, migrations, connectivity
    ServiceGraph,   // Inter-service communication
    Integration,    // External system integrations
    Security,       // Auth, TLS, secrets
    Observability   // Logs, metrics, traces
}
```

### 4.2 Check Interface

```csharp
/// <summary>
/// Individual diagnostic check.
/// </summary>
public interface IDoctorCheck
{
    /// <summary>Unique check identifier (e.g., "check.database.migrations.pending").</summary>
    string CheckId { get; }

    /// <summary>Human-readable name.</summary>
    string Name { get; }

    /// <summary>What this check verifies.</summary>
    string Description { get; }

    /// <summary>Default severity if the check fails.</summary>
    DoctorSeverity DefaultSeverity { get; }

    /// <summary>Tags for filtering (e.g., ["quick", "security", "migration"]).</summary>
    IReadOnlyList<string> Tags { get; }

    /// <summary>Estimated execution time.</summary>
    TimeSpan EstimatedDuration { get; }

    /// <summary>Check whether this check can run in the current context.</summary>
    bool CanRun(DoctorPluginContext context);

    /// <summary>Execute the check.</summary>
    Task<DoctorCheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}
```

### 4.3 Plugin Context

```csharp
public sealed class DoctorPluginContext
{
    public required IServiceProvider Services { get; init; }
    public required IConfiguration Configuration { get; init; }
    public required TimeProvider TimeProvider { get; init; }
    public required ILogger Logger { get; init; }

    // Runtime info
    public required string EnvironmentName { get; init; }  // Development, Staging, Production
    public required string? TenantId { get; init; }

    // Plugin configuration
    public required JsonElement PluginConfig { get; init; }

    // Evidence helpers
    public EvidenceBuilder CreateEvidence() => new();
    public RemediationBuilder CreateRemediation() => new();

    // Secret redaction
    public string Redact(string value) => "***REDACTED***";

    public string RedactConnectionString(string cs)
    {
        // Mask the password; DbConnectionStringBuilder (System.Data.Common) parses key=value pairs
        var builder = new DbConnectionStringBuilder { ConnectionString = cs };
        if (builder.ContainsKey("Password"))
        {
            builder["Password"] = "***REDACTED***";
        }
        return builder.ConnectionString;
    }
}
```

### 4.4 Plugin Discovery

#### Static Discovery (Build-time)

Plugins register via DI at startup:

```csharp
// In Program.cs or startup (plugin type names are illustrative placeholders)
services.AddDoctorPlugin<CoreDoctorPlugin>();
services.AddDoctorPlugin<DatabaseDoctorPlugin>();
services.AddDoctorPlugin<ServiceGraphDoctorPlugin>();
services.AddDoctorPlugin<AuthorityDoctorPlugin>();
// ...
```

#### Dynamic Discovery (Runtime)

Plugins can be loaded from assemblies:

```csharp
// In DoctorPluginLoader.cs
using System.Reflection;

public class DoctorPluginLoader
{
    public IEnumerable<IDoctorPlugin> LoadFromDirectory(string path)
    {
        foreach (var dll in Directory.GetFiles(path, "StellaOps.Doctor.Plugin.*.dll"))
        {
            var assembly = Assembly.LoadFrom(dll);

            // Instantiate every concrete IDoctorPlugin (requires a public parameterless constructor)
            foreach (var type in assembly.GetTypes()
                .Where(t => typeof(IDoctorPlugin).IsAssignableFrom(t) && !t.IsAbstract))
            {
                yield return (IDoctorPlugin)Activator.CreateInstance(type)!;
            }
        }
    }
}
```

### 4.5 Declarative Doctor Packs (YAML)

Doctor packs provide declarative checks that wrap CLI commands and parsing rules. They complement compiled plugins and are loaded from `plugins/doctor/*.yaml` (plus optional override directories).

Short example:

```yaml
apiVersion: stella.ops/doctor.v1
kind: DoctorPlugin
metadata:
  name: doctor-release-orchestrator-gitlab
spec:
  discovery:
    when:
      - env: GITLAB_URL
```

Full sample: `docs/benchmarks/doctor/doctor-plugin-release-orchestrator-gitlab.yaml`

Key fields:

- `spec.discovery.when`: environment-variable or file-existence gates.
- `checks[].run.exec`: command to execute (must be deterministic).
- `checks[].parse.expect` or `checks[].parse.expectJson`: pass/fail rules.
- `checks[].how_to_fix.commands[]`: exact fix commands, printed verbatim.
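The pass/fail rules can be illustrated with a small evaluator. This is a minimal sketch under stated assumptions, not the pack engine: it assumes `expect` is an ordinal substring that must appear in stdout, that `expectJson` maps top-level JSON property names to expected values, and that a non-zero exit code always fails. `PackParseRule` and `PackCheckEvaluator` are hypothetical names, and the sketch starts from a captured exit code and stdout rather than running `exec` itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical parse rule for one pack check; fields loosely mirror parse.expect / parse.expectJson.
public sealed record PackParseRule(string? Expect, IReadOnlyDictionary<string, string>? ExpectJson);

public static class PackCheckEvaluator
{
    // Returns true (pass) when the command exited with 0 and its stdout satisfies every configured rule.
    public static bool Evaluate(int exitCode, string stdout, PackParseRule rule)
    {
        if (exitCode != 0)
        {
            return false;
        }

        // Assumed `expect` semantics: an ordinal substring match on stdout.
        if (rule.Expect is not null && !stdout.Contains(rule.Expect, StringComparison.Ordinal))
        {
            return false;
        }

        if (rule.ExpectJson is null)
        {
            return true;
        }

        try
        {
            using var doc = JsonDocument.Parse(stdout);
            if (doc.RootElement.ValueKind != JsonValueKind.Object)
            {
                return false;
            }

            // Assumed `expectJson` semantics: each top-level property must exist with the expected value.
            foreach (var (key, expected) in rule.ExpectJson)
            {
                if (!doc.RootElement.TryGetProperty(key, out var actual))
                {
                    return false;
                }

                // Compare strings by value; numbers and booleans by raw JSON text ("16", "true").
                var actualText = actual.ValueKind == JsonValueKind.String
                    ? actual.GetString()
                    : actual.GetRawText();

                if (actualText != expected)
                {
                    return false;
                }
            }

            return true;
        }
        catch (JsonException)
        {
            // Output that is not valid JSON cannot satisfy expectJson.
            return false;
        }
    }
}
```

In this sketch, a check with neither `expect` nor `expectJson` is decided by its exit code alone. The actual pack schema may define richer matching, such as regular expressions or JSON paths.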
### 4.6 Plugin Directory Structure

```
src/
├── __Libraries/
│   └── StellaOps.Doctor/                     # Core doctor engine
│       └── Plugins/
│           └── Core/                         # Built-in core plugin
└── Doctor/
    └── __Plugins/
        ├── StellaOps.Doctor.Plugin.Database/
        ├── StellaOps.Doctor.Plugin.ServiceGraph/
        ├── StellaOps.Doctor.Plugin.Scm.GitHub/
        ├── StellaOps.Doctor.Plugin.Scm.GitLab/
        ├── StellaOps.Doctor.Plugin.Registry.Harbor/
        ├── StellaOps.Doctor.Plugin.Registry.ECR/
        ├── StellaOps.Doctor.Plugin.Vault/
        ├── StellaOps.Doctor.Plugin.Authority/
        └── StellaOps.Doctor.Plugin.Observability/
```

### 4.7 Plugin Configuration

Plugins read configuration from the standard config hierarchy:

```yaml
# In stellaops.yaml or environment-specific config
Doctor:
  Enabled: true
  DefaultTimeout: 30s
  Parallelism: 4
  Plugins:
    Database:
      Enabled: true
      ConnectionTimeout: 10s
    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s
    Scm:
      GitHub:
        Enabled: true
        RateLimitThreshold: 100
    Registry:
      Harbor:
        Enabled: true
        SkipTlsVerify: false
    Vault:
      Enabled: true
      SecretsToValidate:
        - "secret/data/stellaops/api-keys"
        - "secret/data/stellaops/certificates"
```

### 4.8 Security Model

#### Secret Redaction

All evidence output is sanitized:

```csharp
public sealed class EvidenceBuilder
{
    private readonly Dictionary<string, string> _data = new();
    private readonly List<string> _sensitiveKeys = new();

    public EvidenceBuilder Add(string key, string value)
    {
        _data[key] = value;
        return this;
    }

    public EvidenceBuilder AddSensitive(string key, string value)
    {
        // Value is kept, but the key is listed in SensitiveKeys so formatters redact it on output
        _data[key] = value;
        _sensitiveKeys.Add(key);
        return this;
    }

    public EvidenceBuilder AddConnectionString(string key, string connectionString)
    {
        // Parse and redact the password before storing
        var redacted = RedactConnectionStringPassword(connectionString);
        _data[key] = redacted;
        return this;
    }
}
```

#### RBAC Permissions

Doctor checks require specific scopes:

| Scope | Description |
|-------|-------------|
| `doctor:run` | Execute doctor checks |
| `doctor:run:full` | Execute all checks, including sensitive ones |
| `doctor:export` | Export diagnostic reports |
| `admin:system` | Access system-level checks |

### 4.9 Versioning Strategy

- **Engine version:** Semantic versioning (e.g., `1.0.0`)
- **Plugin version:** Independent semantic versioning
- **Compatibility:** Plugins declare `MinEngineVersion`
- **Check IDs:** Stable across versions (never renamed)

```csharp
// Version compatibility check (inside the plugin-loading loop)
if (plugin.MinEngineVersion > DoctorEngine.Version)
{
    _logger.LogWarning(
        "Plugin {PluginId} requires engine {Required}, current is {Current}. Skipping.",
        plugin.PluginId, plugin.MinEngineVersion, DoctorEngine.Version);
    continue;
}
```

---

## 5. CLI Surface

### 5.1 Command Structure

**Proposed Location:** `src/Cli/StellaOps.Cli/Commands/DoctorCommandGroup.cs`

```bash
stella doctor run [options]
stella doctor list [options]
stella doctor fix --from report.json [--apply]
```

Note: `stella doctor` remains shorthand for `stella doctor run` for compatibility.

`stella doctor fix` executes only non-destructive commands. Any destructive step must be presented as manual guidance and is not eligible for `--apply`.
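The `--apply` eligibility rule can be expressed as a small planning step. This is a minimal sketch, not the `stella doctor fix` implementation. It assumes each fix command in the JSON report carries an order, a type that mirrors `CommandType` from the result model in 3.3, and an explicit destructive flag; `FixCommand`, `FixPlan`, `FixPlanner`, and `IsDestructive` are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Mirrors the CommandType enum from the result model (3.3).
public enum CommandType { Shell, SQL, API, FileEdit, Manual }

// Hypothetical fix command as read from a JSON report; IsDestructive is an assumed flag.
public sealed record FixCommand(int Order, string Command, CommandType Type, bool IsDestructive);

// Commands that --apply may execute, and commands that are only printed for the operator.
public sealed record FixPlan(IReadOnlyList<FixCommand> Executable, IReadOnlyList<FixCommand> ManualOnly);

public static class FixPlanner
{
    public static FixPlan Plan(IEnumerable<FixCommand> commands)
    {
        // Keep report order so steps such as "backup" run before "migrate".
        var ordered = commands.OrderBy(c => c.Order).ToList();

        // Destructive steps and Manual steps are never eligible for --apply.
        static bool IsManual(FixCommand c) => c.IsDestructive || c.Type == CommandType.Manual;

        return new FixPlan(
            ordered.Where(c => !IsManual(c)).ToList(),
            ordered.Where(IsManual).ToList());
    }
}
```

Under this sketch, a dry run would print both lists, and `--apply` would execute only `Executable`, in order.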
### 5.2 Options and Flags

| Option | Short | Type | Default | Description |
|--------|-------|------|---------|-------------|
| `--format` | `-f` | enum | `text` | Output format: `text`, `table`, `json`, `markdown` |
| `--quick` | `-q` | flag | false | Run only quick checks (tagged `quick`) |
| `--full` | | flag | false | Run all checks, including slow/intensive ones |
| `--pack` | | string[] | all | Filter by pack name (manifest grouping) |
| `--category` | `-c` | string[] | all | Filter by category: `core`, `database`, `service-graph`, `integration`, `security`, `observability` |
| `--plugin` | `-p` | string[] | all | Filter by plugin ID (e.g., `scm.github`) |
| `--check` | | string | | Run a single check by ID |
| `--severity` | `-s` | enum[] | all | Filter output by severity: `pass`, `info`, `warn`, `fail` |
| `--export` | `-e` | path | | Export report to file |
| `--timeout` | `-t` | duration | 30s | Per-check timeout |
| `--parallel` | | int | 4 | Max parallel check execution |
| `--no-remediation` | | flag | false | Skip remediation command generation |
| `--verbose` | `-v` | flag | false | Include detailed evidence in output |
| `--tenant` | | string | | Tenant context for multi-tenant checks |

#### Fix Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--from` | path | required | Path to JSON report with `how_to_fix` commands |
| `--apply` | flag | false | Execute fixes (default is dry-run preview) |

Only commands marked safe and non-destructive are eligible for `--apply`. Destructive changes must be printed as manual steps and executed by the operator outside Doctor.
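The selection options `--quick`, `--category`, `--plugin`, and `--check` combine into a filter over the registered checks. The sketch below is one plausible reading of those flags, not the engine's `GetFilteredChecks`. It assumes several things:

- `--check` overrides the other filters.
- Repeated values within one option are OR-ed, while different options are AND-ed.
- Matching is case-insensitive.

`CheckDescriptor`, `CheckSelection`, and `CheckSelector` are hypothetical names. Mapping kebab-case categories such as `service-graph` to `DoctorCategory` is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal view of a registered check (a subset of IDoctorCheck plus its plugin/category).
public sealed record CheckDescriptor(string CheckId, string PluginId, string Category, IReadOnlyList<string> Tags);

// Hypothetical subset of the run options: --quick, --category, --plugin, --check.
public sealed record CheckSelection(
    bool Quick = false,
    IReadOnlyList<string>? Categories = null,
    IReadOnlyList<string>? Plugins = null,
    string? CheckId = null);

public static class CheckSelector
{
    public static IReadOnlyList<CheckDescriptor> Select(IEnumerable<CheckDescriptor> checks, CheckSelection selection)
    {
        // Assumption: --check selects exactly one check and overrides the other filters.
        if (selection.CheckId is not null)
        {
            return checks.Where(c => c.CheckId == selection.CheckId).ToList();
        }

        var categories = selection.Categories ?? Array.Empty<string>();
        var plugins = selection.Plugins ?? Array.Empty<string>();
        var query = checks;

        if (selection.Quick)
        {
            query = query.Where(c => c.Tags.Contains("quick"));
        }

        // Values within one option are OR-ed; different options are AND-ed.
        if (categories.Count > 0)
        {
            query = query.Where(c => categories.Contains(c.Category, StringComparer.OrdinalIgnoreCase));
        }

        if (plugins.Count > 0)
        {
            query = query.Where(c => plugins.Contains(c.PluginId, StringComparer.OrdinalIgnoreCase));
        }

        // Stable ordering keeps text/JSON output deterministic between runs.
        return query.OrderBy(c => c.CheckId, StringComparer.Ordinal).ToList();
    }
}
```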
### 5.3 Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All checks passed |
| 1 | One or more warnings |
| 2 | One or more failures |
| 3 | Doctor engine error |
| 4 | Invalid arguments |
| 5 | Timeout exceeded |

### 5.4 Usage Examples

```bash
# Quick health check (alias)
stella doctor

# Run all checks explicitly
stella doctor run

# Full diagnostic
stella doctor --full

# Check only database category
stella doctor --category database

# Check specific integration
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# JSON output for CI/CD
stella doctor --format json --severity fail,warn

# Run orchestrator pack with table output
stella doctor run --pack orchestrator --format table

# Apply fixes from a JSON report (dry-run unless --apply)
stella doctor fix --from out.json --apply

# Export markdown report
stella doctor --full --format markdown --export doctor-report.md

# Verbose with all evidence
stella doctor --verbose --full

# Quick check with 60s timeout
stella doctor --quick --timeout 60s
```

### 5.5 Text Output Format

```
Stella Ops Doctor
=================

Running 47 checks across 8 plugins...

[PASS] check.config.required
       All required configuration values are present

[PASS] check.database.connectivity
       PostgreSQL connection successful (latency: 12ms)

[WARN] check.tls.certificates.expiry
       Diagnosis: TLS certificate expires in 14 days

       Evidence:
         Certificate: /etc/ssl/certs/stellaops.crt
         Subject: CN=stellaops.example.com
         Expires: 2026-01-26T00:00:00Z
         Days remaining: 14

       Likely Causes:
         1. Certificate renewal not scheduled
         2. ACME/Let's Encrypt automation not configured

       Fix Steps:
         # 1. Check current certificate
         openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates

         # 2. Renew certificate (if using certbot)
         sudo certbot renew --cert-name stellaops.example.com

         # 3. Restart services to pick up new certificate
         sudo systemctl restart stellaops-gateway

       Verification:
         stella doctor --check check.tls.certificates.expiry

[FAIL] check.database.migrations.pending
       Diagnosis: 3 pending release migrations detected in schema 'auth'

       Evidence:
         Schema: auth
         Current version: 099_add_dpop_thumbprints
         Pending migrations:
           - 100_add_tenant_quotas
           - 101_add_audit_retention
           - 102_add_session_revocation
         Connection: postgres://localhost:5432/stellaops (user: stella_app)

       Likely Causes:
         1. Release migrations not applied before deployment
         2. Migration files added after last deployment

       Fix Steps:
         # 1. Backup database first (RECOMMENDED)
         pg_dump -h localhost -U stella_admin -d stellaops -F c \
           -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

         # 2. Apply pending release migrations
         stella system migrations-run --module Authority --category release

         # 3. Verify migrations applied
         stella system migrations-status --module Authority

       Verification:
         stella doctor --check check.database.migrations.pending

────────────────────────────────────────────────────────────────
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
────────────────────────────────────────────────────────────────
```

---

## 6. UI Surface

### 6.1 Route and Location

**Route:** `/ops/doctor`

**Location:** `src/Web/StellaOps.Web/src/app/features/doctor/`

### 6.2 Component Structure

```
src/app/features/doctor/
├── doctor.routes.ts
├── doctor-dashboard.component.ts             # Main page
├── doctor-dashboard.component.html
├── doctor-dashboard.component.scss
├── components/
│   ├── check-list/
│   │   ├── check-list.component.ts           # Filterable check list
│   │   └── check-list.component.html
│   ├── check-result/
│   │   ├── check-result.component.ts         # Single check display
│   │   └── check-result.component.html
│   ├── remediation-panel/
│   │   ├── remediation-panel.component.ts    # Fix commands display
│   │   └── remediation-panel.component.html
│   ├── evidence-viewer/
│   │   ├── evidence-viewer.component.ts      # Collected evidence
│   │   └── evidence-viewer.component.html
│   └── export-dialog/
│       ├── export-dialog.component.ts        # Export options
│       └── export-dialog.component.html
└── services/
    ├── doctor.client.ts                      # API client
    ├── doctor.service.ts                     # Business logic
    └── doctor.store.ts                       # Signal-based state
```

### 6.3 Dashboard Layout

```
+------------------------------------------------------------------+
| Doctor Diagnostics                        [Run Quick] [Run Full] |
+------------------------------------------------------------------+
| Filters: [Category v] [Plugin v] [Severity v]    [Export Report] |
+------------------------------------------------------------------+
|                                                                  |
| Summary Strip                                                    |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
| |    44    | |    2     | |    1     | |    0     | |   8.3s   | |
| |  Passed  | | Warnings | |  Failed  | | Skipped  | | Duration | |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
|                                                                  |
+------------------------------------------------------------------+
| Check Results                                                    |
| +--------------------------------------------------------------+ |
| | [FAIL] check.database.migrations.pending            [Expand] | |
| |   3 pending release migrations in schema 'auth'              | |
| +--------------------------------------------------------------+ |
| | [WARN] check.tls.certificates.expiry                [Expand] | |
| |   TLS certificate expires in 14 days                         | |
| +--------------------------------------------------------------+ |
| | [PASS] check.database.connectivity                  [Expand] | |
| |   PostgreSQL connection successful (12ms)                    | |
| +--------------------------------------------------------------+ |
| | ... more checks ...                                          | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
```

### 6.4 Expanded Check View

```
+------------------------------------------------------------------+
| [FAIL] check.database.migrations.pending                         |
+------------------------------------------------------------------+
| Diagnosis                                                        |
|   3 pending release migrations detected in schema 'auth'         |
+------------------------------------------------------------------+
| Evidence                                                         |
| +--------------------------------------------------------------+ |
| | Schema          | auth                                       | |
| | Current version | 099_add_dpop_thumbprints                   | |
| | Pending         | 100_add_tenant_quotas                      | |
| |                 | 101_add_audit_retention                    | |
| |                 | 102_add_session_revocation                 | |
| | Connection      | postgres://localhost:5432/stellaops        | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Likely Causes                                                    |
|   1. Release migrations not applied before deployment            |
|   2. Migration files added after last deployment                 |
+------------------------------------------------------------------+
| Fix Steps                                             [Copy All] |
| +--------------------------------------------------------------+ |
| | Step 1: Backup database first (RECOMMENDED)           [Copy] | |
| |   pg_dump -h localhost -U stella_admin -d stellaops -F c \   | |
| |     -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump          | |
| +--------------------------------------------------------------+ |
| | Step 2: Apply pending release migrations              [Copy] | |
| |   stella system migrations-run --module Authority \          | |
| |     --category release                                       | |
| +--------------------------------------------------------------+ |
| | Step 3: Verify migrations applied                     [Copy] | |
| |   stella system migrations-status --module Authority         | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Verification                                              [Copy] |
|   stella doctor --check check.database.migrations.pending        |
+------------------------------------------------------------------+
| [Re-run Check]  [Mark Resolved]                                  |
+------------------------------------------------------------------+
```

### 6.5 Pack Navigation and Fix Actions

- Navigation hierarchy: packs -> plugins -> checks.
- Each check shows its status, evidence, Copy Fix Commands, and Run Fix (disabled unless `doctor.fix.enabled=true`).
- Export actions: Download JSON and Download DSSE summary.

### 6.6 Real-Time Updates

- **Polling:** Auto-refresh option (every 30s/60s/5m)
- **SSE:** Live check progress during execution
- **WebSocket:** Optional for high-frequency updates

---

## 7.
API Surface ### 7.1 Endpoints **Base Path:** `/api/v1/doctor` | Method | Path | Description | |--------|------|-------------| | `GET` | `/checks` | List available checks with metadata | | `GET` | `/plugins` | List available plugins | | `POST` | `/run` | Execute doctor checks | | `GET` | `/run/{runId}` | Get run status/results | | `GET` | `/run/{runId}/stream` | SSE stream for live progress | | `GET` | `/reports` | List historical reports | | `GET` | `/reports/{reportId}` | Get specific report | | `DELETE` | `/reports/{reportId}` | Delete report | ### 7.2 Request/Response Models #### List Checks ```http GET /api/v1/doctor/checks?category=database&tags=quick ``` ```json { "checks": [ { "checkId": "check.database.connectivity", "name": "Database Connectivity", "description": "Verify PostgreSQL connection", "pluginId": "stellaops.doctor.database", "category": "database", "defaultSeverity": "fail", "tags": ["quick", "database"], "estimatedDurationMs": 500 } ], "total": 47 } ``` #### Run Checks ```http POST /api/v1/doctor/run Content-Type: application/json { "mode": "quick", "categories": ["database", "integration"], "plugins": [], "checkIds": [], "timeoutMs": 30000, "parallelism": 4, "includeRemediation": true } ``` ```json { "runId": "dr_20260112_143052_abc123", "status": "running", "startedAt": "2026-01-12T14:30:52Z", "checksTotal": 12, "checksCompleted": 0 } ``` #### Get Run Results ```http GET /api/v1/doctor/run/dr_20260112_143052_abc123 ``` ```json { "runId": "dr_20260112_143052_abc123", "status": "completed", "startedAt": "2026-01-12T14:30:52Z", "completedAt": "2026-01-12T14:31:00Z", "durationMs": 8300, "summary": { "passed": 44, "warnings": 2, "failed": 1, "skipped": 0, "total": 47 }, "overallSeverity": "fail", "results": [ { "checkId": "check.database.migrations.pending", "pluginId": "stellaops.doctor.database", "category": "database", "severity": "fail", "diagnosis": "3 pending release migrations detected in schema 'auth'", "evidence": { "description": 
"Migration state for auth schema", "data": { "schema": "auth", "currentVersion": "099_add_dpop_thumbprints", "pendingMigrations": "100_add_tenant_quotas, 101_add_audit_retention, 102_add_session_revocation", "connection": "postgres://localhost:5432/stellaops" } }, "likelyCauses": [ "Release migrations not applied before deployment", "Migration files added after last deployment" ], "remediation": { "requiresBackup": true, "safetyNote": "Always backup before running migrations", "steps": [ { "order": 1, "description": "Backup database first (RECOMMENDED)", "command": "pg_dump -h localhost -U stella_admin -d stellaops -F c -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump", "commandType": "shell", "placeholders": {} }, { "order": 2, "description": "Apply pending release migrations", "command": "stella system migrations-run --module Authority --category release", "commandType": "shell", "placeholders": {} }, { "order": 3, "description": "Verify migrations applied", "command": "stella system migrations-status --module Authority", "commandType": "shell", "placeholders": {} } ] }, "verificationCommand": "stella doctor --check check.database.migrations.pending", "durationMs": 234, "executedAt": "2026-01-12T14:30:54Z" } ] } ``` Results also expose a `how_to_fix` object for automation. It is a simplified alias of the richer `remediation` model and includes `commands[]` printed verbatim. 
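The relationship between `remediation` and `how_to_fix` can be sketched in TypeScript, the language of the UI client in section 6.2. This is a minimal illustration under stated assumptions. The interfaces mirror the JSON field names above but are hypothetical and are not a published StellaOps client API. The `summary` text is passed in by the caller because this specification does not say where it comes from.

```typescript
// Sketch only: field names mirror the run-result JSON above; these types are
// illustrative and are not a published StellaOps client API.
interface RemediationStep {
  order: number;
  description: string;
  command: string;
  commandType: string;
  placeholders: Record<string, string>;
}

interface Remediation {
  requiresBackup: boolean;
  safetyNote?: string;
  steps: RemediationStep[];
}

interface HowToFix {
  summary: string;
  commands: string[];
}

// Build the simplified `how_to_fix` alias from the richer `remediation` model.
// Steps are sorted by `order` on a copy (the input is left untouched), and each
// command string is copied verbatim. `summary` is supplied by the caller
// (assumption: the spec does not define its source).
function toHowToFix(remediation: Remediation, summary: string): HowToFix {
  const commands = remediation.steps
    .slice()
    .sort((a, b) => a.order - b.order)
    .map((step) => step.command);
  return { summary, commands };
}

// Steps from the migrations example, deliberately listed out of order.
const remediation: Remediation = {
  requiresBackup: true,
  safetyNote: "Always backup before running migrations",
  steps: [
    {
      order: 3,
      description: "Verify migrations applied",
      command: "stella system migrations-status --module Authority",
      commandType: "shell",
      placeholders: {},
    },
    {
      order: 1,
      description: "Backup database first (RECOMMENDED)",
      command:
        "pg_dump -h localhost -U stella_admin -d stellaops -F c -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump",
      commandType: "shell",
      placeholders: {},
    },
    {
      order: 2,
      description: "Apply pending release migrations",
      command: "stella system migrations-run --module Authority --category release",
      commandType: "shell",
      placeholders: {},
    },
  ],
};

const howToFix = toHowToFix(remediation, "Apply pending release migrations");
console.log(JSON.stringify(howToFix, null, 2));
```

The function sorts on `order` rather than trusting array position. This keeps the derived command list deterministic even if a plugin emits steps in a different sequence.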
### 7.3 SSE Stream

```http
GET /api/v1/doctor/run/dr_20260112_143052_abc123/stream
Accept: text/event-stream
```

```
event: check-started
data: {"checkId":"check.database.connectivity","startedAt":"2026-01-12T14:30:52Z"}

event: check-completed
data: {"checkId":"check.database.connectivity","severity":"pass","durationMs":45}

event: check-started
data: {"checkId":"check.database.migrations.pending","startedAt":"2026-01-12T14:30:52Z"}

event: check-completed
data: {"checkId":"check.database.migrations.pending","severity":"fail","durationMs":234}

event: run-completed
data: {"runId":"dr_20260112_143052_abc123","summary":{"passed":44,"warnings":2,"failed":1}}
```

### 7.4 Evidence Logs and Attestations

Doctor runs emit a JSONL evidence log and an optional DSSE summary for audit trails. By default, the JSONL log is local-only and deterministic; outbound telemetry is opt-in.

- JSONL path: `artifacts/doctor/doctor-run-.ndjson` (configurable).
- DSSE summary: `artifacts/doctor/doctor-run-.dsse.json` (optional).
- Evidence records include `doctor_command` to capture the operator-invoked command. DSSE summaries assume operator execution and must include the same command note.

Example JSONL line:

```json
{"runId":"dr_20260112_143052_abc123","doctor_command":"stella doctor run --format json","checkId":"check.database.connectivity","severity":"pass","executedAt":"2026-01-12T14:30:52Z","how_to_fix":{"commands":[]}}
```

---

## 8. Remediation Command Patterns

Remediation should favor the best operator experience: short, copy/paste-friendly commands, minimal steps, and clear verification guidance.

### 8.1 Standard Output Format

Every failed check produces remediation in this structure:

```
[{SEVERITY}] {check.id}

Diagnosis: {one-line summary}

Evidence:
  {key}: {value}
  {key}: {value}
  ...

Likely Causes:
  1. {most likely cause}
  2. {second most likely cause}
  ...

Fix Steps:
  # {step number}. {description}
  {command}

  # {step number}. {description}
  {command}
  ...

Verification:
  {command to re-run this specific check}
```

### 8.1.1 JSON Remediation Structure

The JSON output MUST include a `how_to_fix` object for agent consumption. It can be derived from `remediation.steps` and preserves the command order.

```json
"how_to_fix": {
  "summary": "Apply baseline branch policy",
  "commands": [
    "stella orchestrator scm apply-branch-policy --preset strict"
  ]
}
```

### 8.2 Placeholder Conventions

When commands require user-specific values, use these placeholders:

| Placeholder | Meaning | Example |
|-------------|---------|---------|
| `{HOSTNAME}` | Target hostname | `ldap.example.com` |
| `{PORT}` | Port number | `636` |
| `{USERNAME}` | Username | `admin` |
| `{PASSWORD}` | Password (never shown) | `***` |
| `{DATABASE}` | Database name | `stellaops` |
| `{SCHEMA}` | Schema name | `auth` |
| `{FILE_PATH}` | File path | `/etc/ssl/certs/ca.crt` |
| `{TOKEN}` | API token (never shown) | `***` |
| `{URL}` | Full URL | `https://api.github.com` |

### 8.3 Safety Notes

Doctor's fix action executes only non-destructive commands. If a fix requires a change that modifies data, Doctor must present it as manual guidance with explicit safety notes and never execute it.

```
Manual Steps (not executed by Doctor):

# SAFETY: This operation modifies the database. Create a backup first.

# 1. Backup database (REQUIRED before proceeding)
pg_dump -h {HOSTNAME} -U {USERNAME} -d {DATABASE} -F c \
  -f backup_$(date +%Y%m%d_%H%M%S).dump

# 2. Apply the fix
stella system migrations-run --module Authority --category release
```

### 8.4 Multi-Platform Commands

Where applicable, provide commands for each supported platform:

```
Fix Steps:
# 1. Restart the service

# Linux (systemd):
sudo systemctl restart stellaops-gateway

# Linux (Docker):
docker restart stellaops-gateway

# Docker Compose:
docker compose restart gateway

# Kubernetes:
kubectl rollout restart deployment/stellaops-gateway -n stellaops
```

---

## 9. Doctor Check Catalog

This section documents all diagnostic checks, organized by plugin and category.
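Every catalog entry below uses the same property table. The following is a minimal TypeScript sketch of that shape. It includes a selector that imitates the `category` and `tags` filters on `GET /api/v1/doctor/checks`, and a helper that expands templated IDs such as `check.database.schema.{schema}`.

Several parts are assumptions rather than documented behavior:

- The type and function names are hypothetical.
- Category matching is case-insensitive, because the tables write `Database` while the API uses `database`.
- A check must carry every requested tag to be selected.

```typescript
// Sketch only: a hypothetical in-memory model of one catalog entry, mirroring
// the property tables below and the GET /checks response fields.
type DefaultSeverity = "warn" | "fail";

interface CheckDescriptor {
  checkId: string;
  pluginId: string;
  category: string;
  defaultSeverity: DefaultSeverity;
  tags: string[];
}

// Filter like `?category=...&tags=...`. Assumptions: category comparison
// ignores case, and every requested tag must be present on the check.
function selectChecks(
  catalog: CheckDescriptor[],
  category?: string,
  tags: string[] = [],
): CheckDescriptor[] {
  return catalog.filter(
    (check) =>
      (category === undefined ||
        check.category.toLowerCase() === category.toLowerCase()) &&
      tags.every((tag) => check.tags.indexOf(tag) >= 0),
  );
}

// Expand templated IDs such as check.database.schema.{schema}; placeholders
// without a supplied value are left in place rather than silently dropped.
function expandCheckId(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

// A few entries taken from the tables in this section.
const catalog: CheckDescriptor[] = [
  { checkId: "check.config.required", pluginId: "stellaops.doctor.core", category: "Core", defaultSeverity: "fail", tags: ["quick", "config", "startup"] },
  { checkId: "check.runtime.memory", pluginId: "stellaops.doctor.core", category: "Core", defaultSeverity: "warn", tags: ["runtime", "resources"] },
  { checkId: "check.database.connectivity", pluginId: "stellaops.doctor.database", category: "Database", defaultSeverity: "fail", tags: ["quick", "database"] },
  { checkId: "check.database.migrations.pending", pluginId: "stellaops.doctor.database", category: "Database", defaultSeverity: "fail", tags: ["database", "migrations"] },
];

console.log(selectChecks(catalog, "database", ["quick"]).map((c) => c.checkId));
console.log(expandCheckId("check.database.schema.{schema}", { schema: "auth" }));
```

With this sample catalog, the `database` plus `quick` filter selects only `check.database.connectivity`, and the template expands to `check.database.schema.auth`. Checks whose severity depends on a threshold, such as `check.tls.certificates.expiry`, would need a richer model than a single `defaultSeverity`.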
### 9.1 Core Platform Plugin (`stellaops.doctor.core`)

#### check.config.required

| Property | Value |
|----------|-------|
| **CheckId** | `check.config.required` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `config`, `startup` |
| **What it verifies** | All required configuration values are present |
| **Evidence collected** | Missing keys, config sources checked, environment |
| **Failure modes** | Missing `STELLAOPS_BACKEND_URL`, missing database connection string, missing Authority URL |

**Remediation:**

```bash
# 1. Check which configuration values are missing
stella config list --show-missing

# 2. Set missing environment variables
export STELLAOPS_BACKEND_URL="https://api.stellaops.example.com"
export STELLAOPS_POSTGRES_CONNECTION="Host=localhost;Database=stellaops;Username=stella_app;Password={PASSWORD}"
export STELLAOPS_AUTHORITY_URL="https://auth.stellaops.example.com"

# 3. Or update the configuration file
# Edit: /etc/stellaops/stellaops.yaml
```

**Verification:** `stella doctor --check check.config.required`

---

#### check.config.syntax

| Property | Value |
|----------|-------|
| **CheckId** | `check.config.syntax` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `config` |
| **What it verifies** | Configuration files have valid YAML/JSON syntax |
| **Evidence collected** | File path, line number, parse error message |
| **Failure modes** | Invalid YAML indentation, JSON syntax error, encoding issues |

**Remediation:**

```bash
# 1. Validate YAML syntax
yamllint /etc/stellaops/stellaops.yaml

# 2. Check for encoding issues (should be UTF-8)
file /etc/stellaops/stellaops.yaml

# 3. Fix common YAML issues
# - Use spaces, not tabs
# - Check string quoting
# - Verify indentation (2 spaces per level)
```

**Verification:** `stella doctor --check check.config.syntax`

---

#### check.config.deprecated

| Property | Value |
|----------|-------|
| **CheckId** | `check.config.deprecated` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `config` |
| **What it verifies** | No deprecated configuration keys are in use |
| **Evidence collected** | Deprecated keys found, replacement keys |
| **Failure modes** | Using old key names, removed options |

**Remediation:**

```bash
# 1. Review deprecated keys and their replacements
stella config migrate --dry-run

# 2. Update the configuration file with new key names
stella config migrate --apply

# 3. Verify configuration after migration
stella config validate
```

**Verification:** `stella doctor --check check.config.deprecated`

---

#### check.runtime.dotnet

| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.dotnet` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `runtime` |
| **What it verifies** | .NET runtime version meets minimum requirements |
| **Evidence collected** | Installed version, required version, runtime path |
| **Failure modes** | Outdated .NET version, missing runtime |

**Remediation:**

```bash
# 1. Check current .NET version
dotnet --version

# 2. Install the required .NET version (Linux, via the dotnet-install script)
wget https://dot.net/v1/dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 10.0

# 3. Verify installation
dotnet --list-runtimes
```

**Verification:** `stella doctor --check check.runtime.dotnet`

---

#### check.runtime.memory

| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.memory` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `runtime`, `resources` |
| **What it verifies** | Sufficient memory available for operation |
| **Evidence collected** | Total memory, available memory, GC memory info |
| **Failure modes** | Low available memory (<1GB), high GC pressure |

**Remediation:**

```bash
# 1. Check current memory usage
free -h

# 2. Identify memory-heavy processes
ps aux --sort=-%mem | head -20

# 3. Adjust container memory limits if applicable
# Docker:
docker update --memory 4g stellaops-gateway
# Kubernetes:
kubectl patch deployment stellaops-gateway -p '{"spec":{"template":{"spec":{"containers":[{"name":"gateway","resources":{"limits":{"memory":"4Gi"}}}]}}}}'
```

**Verification:** `stella doctor --check check.runtime.memory`

---

#### check.runtime.disk.space

| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.disk.space` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `runtime`, `resources` |
| **What it verifies** | Sufficient disk space on required paths |
| **Evidence collected** | Path, total space, available space, usage percentage |
| **Failure modes** | Data directory >90% full, log directory full |

**Remediation:**

```bash
# 1. Check disk usage
df -h /var/lib/stellaops

# 2. Find large files
du -sh /var/lib/stellaops/* | sort -hr | head -20

# 3. Clean up old logs
find /var/log/stellaops -name "*.log" -mtime +30 -delete

# 4. Clean up old exports
stella export cleanup --older-than 30d
```

**Verification:** `stella doctor --check check.runtime.disk.space`

---

#### check.runtime.disk.permissions

| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.disk.permissions` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `runtime`, `security` |
| **What it verifies** | Write permissions on required directories |
| **Evidence collected** | Path, expected permissions, actual permissions, owner |
| **Failure modes** | Cannot write to data directory, log directory not writable |

**Remediation:**

```bash
# 1. Check current permissions
ls -la /var/lib/stellaops

# 2. Fix ownership of the data and log directories
sudo chown -R stellaops:stellaops /var/lib/stellaops /var/log/stellaops

# 3. Fix permissions
sudo chmod 755 /var/lib/stellaops
sudo chmod 755 /var/log/stellaops

# 4. Verify write access (create and remove the test file as the service user)
sudo -u stellaops touch /var/lib/stellaops/.write-test && sudo -u stellaops rm /var/lib/stellaops/.write-test
```

**Verification:** `stella doctor --check check.runtime.disk.permissions`

---

#### check.time.sync

| Property | Value |
|----------|-------|
| **CheckId** | `check.time.sync` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `quick`, `runtime` |
| **What it verifies** | System clock is synchronized (NTP) |
| **Evidence collected** | NTP status, clock offset, sync source |
| **Failure modes** | Clock drift >5s, NTP not running, no sync source |

**Remediation:**

```bash
# 1. Check NTP status
timedatectl status

# 2. Enable NTP synchronization
sudo timedatectl set-ntp true

# 3. Force immediate sync
sudo systemctl restart systemd-timesyncd

# 4. Verify sync status
timedatectl timesync-status
```

**Verification:** `stella doctor --check check.time.sync`

---

#### check.crypto.profiles

| Property | Value |
|----------|-------|
| **CheckId** | `check.crypto.profiles` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `security`, `crypto` |
| **What it verifies** | Crypto profile is valid and providers are available |
| **Evidence collected** | Active profile, available providers, missing providers |
| **Failure modes** | Invalid profile, required provider not available |

**Remediation:**

```bash
# 1. List available crypto profiles
stella crypto profiles list

# 2. Validate the current profile
stella crypto profiles validate

# 3. Switch to a different profile if needed
stella crypto profiles set --profile default

# 4. Install missing providers (if GOST is required)
# See docs/crypto/gost-setup.md
```

**Verification:** `stella doctor --check check.crypto.profiles`

---

### 9.2 Database Plugin (`stellaops.doctor.database`)

#### check.database.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.connectivity` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `quick`, `database` |
| **What it verifies** | PostgreSQL connection is successful |
| **Evidence collected** | Connection string (redacted), latency, server version |
| **Failure modes** | Connection refused, authentication failed, timeout |

**Remediation:**

```bash
# 1. Test the connection manually
psql "host=localhost dbname=stellaops user=stella_app" -c "SELECT 1"

# 2. Check that PostgreSQL is running
sudo systemctl status postgresql

# 3. Check connection settings
# Verify pg_hba.conf allows connections
sudo cat /etc/postgresql/16/main/pg_hba.conf | grep stellaops

# 4. Check firewall
sudo ufw status | grep 5432
```

**Verification:** `stella doctor --check check.database.connectivity`

---

#### check.database.version

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.version` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database` |
| **What it verifies** | PostgreSQL version meets minimum requirements (>=16) |
| **Evidence collected** | Current version, required version |
| **Failure modes** | PostgreSQL <16, unsupported version |

**Remediation:**

```bash
# 1. Check current version
psql -c "SELECT version();"

# 2. Upgrade PostgreSQL (Ubuntu)
sudo apt install postgresql-16

# 3. Migrate data to the new version (example assumes the old cluster is 14/main)
#    Installing postgresql-16 creates an empty 16/main cluster; drop that empty
#    cluster first, otherwise pg_upgradecluster cannot create the target
sudo pg_dropcluster 16 main --stop
sudo pg_upgradecluster -v 16 14 main

# 4. Remove the old version
sudo apt remove postgresql-14
```

**Verification:** `stella doctor --check check.database.version`

---

#### check.database.migrations.pending

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.pending` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database`, `migrations` |
| **What it verifies** | No pending release migrations exist |
| **Evidence collected** | Schema, current version, pending migrations list |
| **Failure modes** | Release migrations not applied before deployment |

**Remediation:**

```bash
# 1. Backup database first (RECOMMENDED)
pg_dump -h localhost -U stella_admin -d stellaops -F c \
  -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

# 2. Check migration status for all modules
stella system migrations-status

# 3. Apply pending release migrations
stella system migrations-run --category release

# 4. Verify all migrations applied
stella system migrations-status --verify
```

**Verification:** `stella doctor --check check.database.migrations.pending`

---

#### check.database.migrations.checksum

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.checksum` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database`, `migrations`, `security` |
| **What it verifies** | Applied migration checksums match source files |
| **Evidence collected** | Mismatched migrations, expected vs actual checksum |
| **Failure modes** | Migration file modified after application, corruption |

**Remediation:**

```bash
# CRITICAL: Checksum mismatch indicates a potential data integrity issue

# 1. Identify mismatched migrations
stella system migrations-verify --detailed

# 2. If migrations were legitimately modified (rare):
# WARNING: Only proceed if you understand the implications
stella system migrations-repair --migration {MIGRATION_NAME} --force

# 3. If data corruption is suspected:
# Restore from backup (dropping existing objects first) and reapply migrations
pg_restore --clean --if-exists -h localhost -U stella_admin -d stellaops stellaops_backup.dump
stella system migrations-run --all
```

**Verification:** `stella doctor --check check.database.migrations.checksum`

---

#### check.database.migrations.lock

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.lock` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database`, `migrations` |
| **What it verifies** | No stale migration locks exist |
| **Evidence collected** | Lock holder, lock duration, schema |
| **Failure modes** | Abandoned lock held by a hung or orphaned session |

**Remediation:**

```bash
# 1. Check for advisory locks
psql -d stellaops -c "SELECT * FROM pg_locks WHERE locktype = 'advisory';"

# 2. Identify the session holding the lock
psql -d stellaops -c "SELECT pid, query, state FROM pg_stat_activity WHERE pid IN (SELECT pid FROM pg_locks WHERE locktype = 'advisory');"

# 3. If that session is orphaned, terminate it ({PID} comes from step 2).
#    Advisory locks are released when the holding session ends; calling
#    pg_advisory_unlock_all() from a new psql session would not release them.
# WARNING: Only if you are certain no migration is running
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"

# 4. Retry migration
stella system migrations-run --category release
```

**Verification:** `stella doctor --check check.database.migrations.lock`

---

#### check.database.schema.{schema}

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.schema.{schema}` (e.g., `check.database.schema.auth`) |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database` |
| **What it verifies** | Schema exists and has expected tables |
| **Evidence collected** | Schema name, expected tables, missing tables |
| **Failure modes** | Schema not created, tables dropped |

**Remediation:**

```bash
# 1. Check if the schema exists
psql -d stellaops -c "SELECT schema_name FROM information_schema.schemata WHERE schema_name = '{SCHEMA}';"

# 2. If the schema is missing, run startup migrations
stella system migrations-run --module {MODULE} --category startup

# 3. Verify schema tables
psql -d stellaops -c "SELECT table_name FROM information_schema.tables WHERE table_schema = '{SCHEMA}';"
```

**Verification:** `stella doctor --check check.database.schema.{schema}`

---

#### check.database.connections.pool

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.connections.pool` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database`, `performance` |
| **What it verifies** | Connection pool is healthy and not exhausted |
| **Evidence collected** | Active connections, idle connections, max connections |
| **Failure modes** | Pool exhausted, connection leak |

**Remediation:**

```bash
# 1. Check current connections
psql -d stellaops -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'stellaops';"

# 2. Check max connections
psql -d stellaops -c "SHOW max_connections;"

# 3. Identify long-running queries
psql -d stellaops -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 10;"

# 4. Increase max connections if needed
# Edit postgresql.conf: max_connections = 200
# max_connections only takes effect after a full server restart (reload is not enough)
sudo systemctl restart postgresql
```

**Verification:** `stella doctor --check check.database.connections.pool`

---

### 9.3 Service Graph Plugin (`stellaops.doctor.servicegraph`)

#### check.services.gateway.running

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.gateway.running` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `quick`, `services` |
| **What it verifies** | Gateway service is running and accepting connections |
| **Evidence collected** | Service status, PID, uptime, port binding |
| **Failure modes** | Service not running, port already in use |

**Remediation:**

```bash
# 1. Check service status
sudo systemctl status stellaops-gateway

# 2. Check logs for errors
sudo journalctl -u stellaops-gateway -n 50

# 3. Check port binding
sudo ss -tlnp | grep 443

# 4. Start/restart service
sudo systemctl restart stellaops-gateway
```

**Verification:** `stella doctor --check check.services.gateway.running`

---

#### check.services.gateway.routing

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.gateway.routing` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services`, `routing` |
| **What it verifies** | Gateway can route requests to backend services |
| **Evidence collected** | Registered services, routing table, disconnected services |
| **Failure modes** | No services registered, all services disconnected |

**Remediation:**

```bash
# 1. Check registered services
curl -s http://localhost:8080/health/routing | jq

# 2. Verify backend services are running
stella services status

# 3. Check Router transport connectivity
stella services connectivity-test

# 4. Restart disconnected services
sudo systemctl restart stellaops-concelier
sudo systemctl restart stellaops-scanner
```

**Verification:** `stella doctor --check check.services.gateway.routing`

---

#### check.services.{service}.health

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.{service}.health` (e.g., `check.services.concelier.health`) |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services` |
| **What it verifies** | Service health endpoint reports healthy |
| **Evidence collected** | Health status, dependencies, latency |
| **Failure modes** | Service unhealthy, degraded dependencies |

**Remediation:**

```bash
# 1. Check service health directly
curl -s http://localhost:{PORT}/healthz | jq

# 2. Check detailed health
curl -s http://localhost:{PORT}/health/details | jq

# 3. Check service logs
sudo journalctl -u stellaops-{SERVICE} -n 100

# 4. Restart service if needed
sudo systemctl restart stellaops-{SERVICE}
```

**Verification:** `stella doctor --check check.services.{service}.health`

---

#### check.services.{service}.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.{service}.connectivity` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services`, `routing` |
| **What it verifies** | Service is reachable from Gateway via Router |
| **Evidence collected** | Transport type, connection state, last heartbeat |
| **Failure modes** | Connection refused, heartbeat timeout |

**Remediation:**

```bash
# 1. Check Router connection status
stella services connection-status --service {SERVICE}

# 2. Test network connectivity
nc -zv {SERVICE_HOST} {SERVICE_PORT}

# 3. Check firewall rules
sudo ufw status | grep {SERVICE_PORT}

# 4. Verify Router configuration in the service
# Check stellaops.yaml for correct Router endpoints
```

**Verification:** `stella doctor --check check.services.{service}.connectivity`

---

#### check.services.authority.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.authority.connectivity` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `quick`, `services`, `auth` |
| **What it verifies** | Authority service is reachable |
| **Evidence collected** | Authority URL, response status, latency |
| **Failure modes** | Authority unreachable, OIDC discovery failed |

**Remediation:**

```bash
# 1. Check Authority URL configuration
echo $STELLAOPS_AUTHORITY_URL

# 2. Test OIDC discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq

# 3. Check Authority service status
sudo systemctl status stellaops-authority

# 4. Verify network connectivity
curl -v ${STELLAOPS_AUTHORITY_URL}/healthz
```

**Verification:** `stella doctor --check check.services.authority.connectivity`

---

### 9.4 Security Plugin (`stellaops.doctor.security`)

#### check.auth.oidc.discovery

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.oidc.discovery` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `quick`, `auth`, `security` |
| **What it verifies** | OIDC well-known endpoint is accessible |
| **Evidence collected** | Discovery URL, issuer, supported flows |
| **Failure modes** | Discovery endpoint unavailable, invalid response |

**Remediation:**

```bash
# 1. Test discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq

# 2. Verify the issuer matches configuration
# The issuer in the response should match STELLAOPS_AUTHORITY_URL
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq -r .issuer

# 3. Check Authority service logs
sudo journalctl -u stellaops-authority -n 50

# 4. Verify TLS certificate
openssl s_client -connect auth.stellaops.example.com:443 -servername auth.stellaops.example.com < /dev/null
```

**Verification:** `stella doctor --check check.auth.oidc.discovery`

---

#### check.auth.oidc.jwks

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.oidc.jwks` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `security` |
| **What it verifies** | JWKS endpoint returns valid signing keys |
| **Evidence collected** | JWKS URL, key count, key algorithms |
| **Failure modes** | JWKS unavailable, no keys, unsupported algorithms |

**Remediation:**

```bash
# 1. Fetch JWKS directly (its URL is advertised as jwks_uri in the discovery document)
curl -s "$(curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq -r .jwks_uri)" | jq

# 2. Verify keys are present
# The response should contain at least one key in the "keys" array

# 3. If JWKS is empty, regenerate signing keys
stella authority keys rotate

# 4. Restart Authority service
sudo systemctl restart stellaops-authority
```

**Verification:** `stella doctor --check check.auth.oidc.jwks`

---

#### check.auth.ldap.bind

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.bind` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `security`, `ldap` |
| **What it verifies** | LDAP bind credentials are valid |
| **Evidence collected** | LDAP host, bind DN (redacted), TLS status |
| **Failure modes** | Invalid credentials, connection refused, TLS failure |

**Remediation:**

```bash
# 1. Test LDAP connection with ldapsearch
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "cn=bind-user,ou=service,dc=example,dc=internal" \
  -w "{PASSWORD}" \
  -b "ou=people,dc=example,dc=internal" "(uid=*)" dn | head -10

# 2. Check TLS certificate
openssl s_client -connect {LDAP_HOST}:636 -showcerts < /dev/null

# 3. Verify bind DN and password in configuration
# Check etc/authority.plugins/ldap.yaml

# 4. Test with Authority's ldap-test command
stella authority ldap-test --bind-only
```

**Verification:** `stella doctor --check check.auth.ldap.bind`

---

#### check.auth.ldap.search

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.search` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `ldap` |
| **What it verifies** | LDAP search base is accessible and returns users |
| **Evidence collected** | Search base, user count, search time |
| **Failure modes** | Search base not found, no users returned, timeout |

**Remediation:**

```bash
# 1. Test LDAP search (counts returned entries)
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "{SEARCH_BASE}" "(objectClass=person)" dn | grep -c '^dn:'

# 2. Verify search base in configuration
# Check etc/authority.plugins/ldap.yaml: connection.searchBase

# 3. Check if the search base entry exists
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "{SEARCH_BASE}" -s base "(objectClass=*)" dn

# 4. Verify the bind user has read permissions
# Check LDAP ACLs
```

**Verification:** `stella doctor --check check.auth.ldap.search`

---

#### check.auth.ldap.groups

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.groups` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Warn |
| **Tags** | `auth`, `ldap` |
| **What it verifies** | LDAP group mapping is configured and working |
| **Evidence collected** | Group attribute, mapped groups, sample user groups |
| **Failure modes** | Group attribute not found, no groups mapped |

**Remediation:**

```bash
# 1. Check group attribute configuration
# etc/authority.plugins/ldap.yaml: claims.groupAttribute

# 2. Test group lookup for a sample user
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "{SEARCH_BASE}" "(uid={TEST_USER})" memberOf

# 3. Verify group mapping in Authority
stella authority ldap-test --user {TEST_USER} --show-groups

# 4. Update the group attribute if needed
# Common attributes: memberOf, member, groupMembership
```

**Verification:** `stella doctor --check check.auth.ldap.groups`

---

#### check.tls.certificates.expiry

| Property | Value |
|----------|-------|
| **CheckId** | `check.tls.certificates.expiry` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Warn (expires within 30 days), Fail (within 7 days) |
| **Tags** | `quick`, `security`, `tls` |
| **What it verifies** | TLS certificates are not expiring soon |
| **Evidence collected** | Certificate path, subject, expiry date, days remaining |
| **Failure modes** | Certificate expired, expiring within threshold |

**Remediation:**

```bash
# 1. Check certificate expiry
openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -enddate

# 2. Renew with certbot (if using Let's Encrypt)
sudo certbot renew --cert-name stellaops.example.com

# 3. Renew manually (if self-signed or enterprise CA)
# Generate a new CSR (include a subjectAltName; clients ignore the CN)
openssl req -new -key /etc/ssl/private/stellaops.key \
  -out /tmp/stellaops.csr -subj "/CN=stellaops.example.com" \
  -addext "subjectAltName=DNS:stellaops.example.com"
# Submit the CSR to the CA and install the new certificate

# 4. Restart services to pick up the new certificate
sudo systemctl restart stellaops-gateway
```

**Verification:** `stella doctor --check check.tls.certificates.expiry`

---

#### check.tls.certificates.chain

| Property | Value |
|----------|-------|
| **CheckId** | `check.tls.certificates.chain` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `tls` |
| **What it verifies** | TLS certificate chain is complete and valid |
| **Evidence collected** | Certificate chain, validation errors |
| **Failure modes** | Missing intermediate, self-signed not trusted, chain broken |

**Remediation:**

```bash
# 1. Verify certificate chain
openssl verify -CAfile /etc/ssl/certs/ca-certificates.crt \
  /etc/ssl/certs/stellaops.crt

# 2. Inspect the chain the server actually presents
openssl s_client -connect stellaops.example.com:443 \
  -servername stellaops.example.com -showcerts < /dev/null

# 3. Download missing intermediate certificates
# From your CA's website

# 4. Concatenate certificates in the correct order (leaf first, then intermediates)
cat stellaops.crt intermediate.crt > stellaops-fullchain.crt
```

**Verification:** `stella doctor --check check.tls.certificates.chain`

---

#### check.secrets.vault.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.secrets.vault.connectivity` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `vault` |
| **What it verifies** | Vault service is reachable |
| **Evidence collected** | Vault address, seal status, version |
| **Failure modes** | Vault unreachable, sealed, version mismatch |

**Remediation:**

```bash
# 1.
Check Vault status vault status # 2. If sealed, unseal Vault vault operator unseal {UNSEAL_KEY_1} vault operator unseal {UNSEAL_KEY_2} vault operator unseal {UNSEAL_KEY_3} # 3. Check network connectivity curl -s ${VAULT_ADDR}/v1/sys/health | jq # 4. Verify VAULT_ADDR environment variable echo $VAULT_ADDR ``` **Verification:** `stella doctor --check check.secrets.vault.connectivity` --- #### check.secrets.vault.auth | Property | Value | |----------|-------| | **CheckId** | `check.secrets.vault.auth` | | **Plugin** | `stellaops.doctor.security` | | **Category** | Security | | **Severity** | Fail | | **Tags** | `security`, `vault` | | **What it verifies** | Vault authentication is successful | | **Evidence collected** | Auth method, token TTL, policies | | **Failure modes** | Invalid token, expired token, wrong auth method | **Remediation:** ```bash # 1. Check current token vault token lookup # 2. If token expired, authenticate again # Token auth: vault login {TOKEN} # AppRole auth: vault write auth/approle/login role_id={ROLE_ID} secret_id={SECRET_ID} # Kubernetes auth: vault write auth/kubernetes/login role=stellaops jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token # 3. Verify authentication worked vault token lookup ``` **Verification:** `stella doctor --check check.secrets.vault.auth` --- #### check.secrets.vault.paths | Property | Value | |----------|-------| | **CheckId** | `check.secrets.vault.paths` | | **Plugin** | `stellaops.doctor.security` | | **Category** | Security | | **Severity** | Fail | | **Tags** | `security`, `vault` | | **What it verifies** | Required secret paths are accessible | | **Evidence collected** | Checked paths, accessible paths, denied paths | | **Failure modes** | Permission denied, path not found | **Remediation:** ```bash # 1. Test reading required secrets vault kv get secret/data/stellaops/api-keys # 2. Check policy permissions vault token lookup -format=json | jq '.data.policies' # 3. 
Review policy rules vault policy read stellaops # 4. Update policy if needed vault policy write stellaops - < /etc/logrotate.d/stellaops << 'EOF' /var/log/stellaops/*.log { daily rotate 14 compress delaycompress missingok notifempty create 640 stellaops stellaops postrotate systemctl reload stellaops-gateway > /dev/null 2>&1 || true endscript } EOF # 3. Test logrotate configuration sudo logrotate -d /etc/logrotate.d/stellaops ``` **Verification:** `stella doctor --check check.logs.rotation.configured` --- #### check.metrics.prometheus.scrape | Property | Value | |----------|-------| | **CheckId** | `check.metrics.prometheus.scrape` | | **Plugin** | `stellaops.doctor.observability` | | **Category** | Observability | | **Severity** | Warn | | **Tags** | `observability`, `metrics` | | **What it verifies** | Prometheus metrics endpoint is accessible | | **Evidence collected** | Metrics endpoint, sample metrics count | | **Failure modes** | Endpoint not exposed, auth required | **Remediation:** ```bash # 1. Check metrics endpoint curl -s http://localhost:{PORT}/metrics | head -20 # 2. Verify metrics are being scraped curl -s http://{PROMETHEUS_HOST}:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job == "stellaops")' # 3. Add Prometheus scrape config # In prometheus.yml: scrape_configs: - job_name: 'stellaops' static_configs: - targets: ['stellaops-gateway:8080', 'stellaops-concelier:8081'] # 4. 
Reload Prometheus curl -X POST http://{PROMETHEUS_HOST}:9090/-/reload ``` **Verification:** `stella doctor --check check.metrics.prometheus.scrape` --- ### 9.8 Release Orchestrator Plugin (`stellaops.doctor.releaseorch`) #### check.releaseorch.environments.configured | Property | Value | |----------|-------| | **CheckId** | `check.releaseorch.environments.configured` | | **Plugin** | `stellaops.doctor.releaseorch` | | **Category** | Integration | | **Severity** | Fail | | **Tags** | `release`, `environments` | | **What it verifies** | At least one environment is configured | | **Evidence collected** | Environment count, environment names | | **Failure modes** | No environments configured | **Remediation:** ```bash # 1. List current environments stella environments list # 2. Create development environment stella environments create \ --name development \ --type development \ --promotion-target staging # 3. Create staging environment stella environments create \ --name staging \ --type staging \ --promotion-target production \ --requires-approval # 4. Create production environment stella environments create \ --name production \ --type production \ --requires-approval ``` **Verification:** `stella doctor --check check.releaseorch.environments.configured` --- #### check.releaseorch.deployments.targets | Property | Value | |----------|-------| | **CheckId** | `check.releaseorch.deployments.targets` | | **Plugin** | `stellaops.doctor.releaseorch` | | **Category** | Integration | | **Severity** | Fail | | **Tags** | `release`, `deployments` | | **What it verifies** | Deployment targets are reachable | | **Evidence collected** | Target type, connectivity status, last heartbeat | | **Failure modes** | Agent offline, target unreachable | **Remediation:** ```bash # 1. List deployment targets stella deployments targets list # 2. Check agent status stella deployments targets health --target {TARGET_ID} # 3. 
Restart agent if needed # On target host: sudo systemctl restart stellaops-agent # 4. Re-register target if agent was reinstalled stella deployments targets register \ --name {TARGET_NAME} \ --type docker-compose \ --endpoint ssh://user@host ``` **Verification:** `stella doctor --check check.releaseorch.deployments.targets` --- ## 10. Plugin Implementation Details ### 10.1 Core Platform Plugin **Location:** `src/__Libraries/StellaOps.Doctor/Plugins/Core/` Provides foundational checks for configuration, runtime, and platform health. **Checks Provided:** - `check.config.required` - `check.config.syntax` - `check.config.deprecated` - `check.runtime.dotnet` - `check.runtime.memory` - `check.runtime.disk.space` - `check.runtime.disk.permissions` - `check.time.sync` - `check.crypto.profiles` **Dependencies:** None (core plugin) --- ### 10.2 Database & Migrations Plugin **Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Database/` Provides database connectivity and migration state checks. **References:** - `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationRunner.cs` - `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationStatusService.cs` **Checks Provided:** - `check.database.connectivity` - `check.database.version` - `check.database.migrations.pending` - `check.database.migrations.checksum` - `check.database.migrations.lock` - `check.database.schema.{schema}` (dynamic per schema) - `check.database.connections.pool` **Configuration:** ```yaml Doctor: Plugins: Database: Enabled: true ConnectionTimeout: 10s Schemas: - auth - vuln - scanner - orchestrator ``` --- ### 10.3 Service Graph Plugin **Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ServiceGraph/` Validates inter-service connectivity via Gateway and Router. 
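
To make the dynamic `check.services.{service}.health` checks concrete, here is a minimal shell sketch of a per-service probe that honours the plugin's 5-second `HealthEndpointTimeout`. It assumes, without confirmation from this spec, that every service listens on `localhost` and exposes `GET /healthz` (the path used for Authority in the remediation steps above). The functions `check_service_health` and `run_service_checks` and the hard-coded service list are hypothetical illustrations, not part of the Doctor plugin or the `stella` CLI.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a dynamic check.services.{service}.health probe.
# Assumptions: services listen on localhost and expose GET /healthz.

# Mirrors Doctor:Plugins:ServiceGraph:HealthEndpointTimeout (5s).
HEALTH_TIMEOUT_SECONDS=5

# Illustrative name:port pairs, taken from the example ServiceGraph configuration.
SERVICES=("concelier:8081" "scanner:8082" "attestor:8083")

# check_service_health NAME PORT [HOST]
# Prints one PASS/FAIL line; returns 0 for an HTTP 2xx response, 1 otherwise.
check_service_health() {
  local name=$1 port=$2 host=${3:-localhost}
  local url="http://${host}:${port}/healthz"
  local status
  # -o discards the body and -w prints only the status code; curl reports
  # 000 when no HTTP response arrives (e.g. connection refused or timeout).
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    --max-time "$HEALTH_TIMEOUT_SECONDS" "$url") || true
  if [[ $status == 2?? ]]; then
    echo "PASS check.services.${name}.health (${url} -> ${status})"
    return 0
  fi
  echo "FAIL check.services.${name}.health (${url} -> ${status:-no response})"
  return 1
}

# run_service_checks: probes every entry in SERVICES; returns the failure count.
run_service_checks() {
  local entry failures=0
  for entry in "${SERVICES[@]}"; do
    check_service_health "${entry%%:*}" "${entry##*:}" || failures=$((failures + 1))
  done
  return "$failures"
}
```

Calling `run_service_checks` prints one PASS/FAIL line per service and returns the number of failures, so it can serve as a quick shell gate. A real plugin would read the service list from configuration rather than a hard-coded array, and would also distinguish connection failures from non-2xx responses in its evidence.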
**References:**
- `src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs`
- `src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs`

**Checks Provided:**
- `check.services.gateway.running`
- `check.services.gateway.routing`
- `check.services.{service}.health` (dynamic per service)
- `check.services.{service}.connectivity` (dynamic per service)
- `check.services.authority.connectivity`

**Configuration:**

```yaml
Doctor:
  Plugins:
    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s
      Services:
        - name: concelier
          port: 8081
        - name: scanner
          port: 8082
        - name: attestor
          port: 8083
```

---

### 10.4 Security Plugin

**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Security/`

Validates authentication, authorization, TLS, and secrets management.

**References:**
- `src/Authority/StellaOps.Authority/StellaOps.Authority.Plugin.Ldap/`
- `src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/HashiCorpVaultConnector.cs`

**Checks Provided:**
- `check.auth.oidc.discovery`
- `check.auth.oidc.jwks`
- `check.auth.ldap.bind`
- `check.auth.ldap.search`
- `check.auth.ldap.groups`
- `check.tls.certificates.expiry`
- `check.tls.certificates.chain`
- `check.secrets.vault.connectivity`
- `check.secrets.vault.auth`
- `check.secrets.vault.paths`

---

### 10.5 SCM Integration Plugins

**GitHub Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitHub/`

**GitLab Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitLab/`

**References:**
- `src/Integrations/__Plugins/StellaOps.Integrations.Plugin.GitHubApp/`
- `etc/scm-connectors/github.yaml`

**GitHub Checks:**
- `check.integration.scm.github.connectivity`
- `check.integration.scm.github.auth`
- `check.integration.scm.github.permissions`
- `check.integration.scm.github.ratelimit`

**GitLab Checks:**
- `check.integration.scm.gitlab.connectivity`
- `check.integration.scm.gitlab.auth`
- `check.integration.scm.gitlab.permissions`

---

### 10.6 Registry Integration Plugins

**Harbor Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.Harbor/`

**ECR Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.ECR/`

**References:**
- `src/Integrations/__Plugins/StellaOps.Integrations.Plugin.Harbor/`

**Harbor Checks:**
- `check.integration.registry.harbor.connectivity`
- `check.integration.registry.harbor.auth`
- `check.integration.registry.harbor.pull`

**ECR Checks:**
- `check.integration.registry.ecr.connectivity`
- `check.integration.registry.ecr.pull`

---

### 10.7 Observability Plugin

**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Observability/`

**References:**
- `devops/telemetry/otel-collector.yaml`

**Checks Provided:**
- `check.telemetry.otlp.endpoint`
- `check.logs.directory.writable`
- `check.logs.rotation.configured`
- `check.metrics.prometheus.scrape`

---

### 10.8 Release Orchestrator Plugin

**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ReleaseOrch/`

**References:**
- `src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/`

**Checks Provided:**
- `check.releaseorch.environments.configured`
- `check.releaseorch.deployments.targets`

---

## Appendix A: Complete Check ID Reference

| CheckId | Plugin | Category | Default Severity |
|---------|--------|----------|------------------|
| `check.config.required` | core | Core | Fail |
| `check.config.syntax` | core | Core | Fail |
| `check.config.deprecated` | core | Core | Warn |
| `check.runtime.dotnet` | core | Core | Fail |
| `check.runtime.memory` | core | Core | Warn |
| `check.runtime.disk.space` | core | Core | Warn |
| `check.runtime.disk.permissions` | core | Core | Fail |
| `check.time.sync` | core | Core | Warn |
| `check.crypto.profiles` | core | Core | Fail |
| `check.database.connectivity` | database | Database | Fail |
| `check.database.version` | database | Database | Warn |
| `check.database.migrations.pending` | database | Database | Fail |
| `check.database.migrations.checksum` | database | Database | Fail |
| `check.database.migrations.lock` | database | Database | Warn |
| `check.database.schema.{schema}` | database | Database | Fail |
| `check.database.connections.pool` | database | Database | Warn |
| `check.services.gateway.running` | servicegraph | ServiceGraph | Fail |
| `check.services.gateway.routing` | servicegraph | ServiceGraph | Fail |
| `check.services.{service}.health` | servicegraph | ServiceGraph | Fail |
| `check.services.{service}.connectivity` | servicegraph | ServiceGraph | Fail |
| `check.services.authority.connectivity` | servicegraph | ServiceGraph | Fail |
| `check.auth.oidc.discovery` | security | Security | Fail |
| `check.auth.oidc.jwks` | security | Security | Fail |
| `check.auth.ldap.bind` | security | Security | Fail |
| `check.auth.ldap.search` | security | Security | Fail |
| `check.auth.ldap.groups` | security | Security | Warn |
| `check.tls.certificates.expiry` | security | Security | Warn/Fail |
| `check.tls.certificates.chain` | security | Security | Fail |
| `check.secrets.vault.connectivity` | security | Security | Fail |
| `check.secrets.vault.auth` | security | Security | Fail |
| `check.secrets.vault.paths` | security | Security | Fail |
| `check.integration.scm.github.connectivity` | scm.github | Integration | Fail |
| `check.integration.scm.github.auth` | scm.github | Integration | Fail |
| `check.integration.scm.github.permissions` | scm.github | Integration | Fail |
| `check.integration.scm.github.ratelimit` | scm.github | Integration | Warn |
| `check.integration.scm.gitlab.connectivity` | scm.gitlab | Integration | Fail |
| `check.integration.scm.gitlab.auth` | scm.gitlab | Integration | Fail |
| `check.integration.registry.harbor.connectivity` | registry.harbor | Integration | Fail |
| `check.integration.registry.harbor.auth` | registry.harbor | Integration | Fail |
| `check.integration.registry.harbor.pull` | registry.harbor | Integration | Fail |
| `check.integration.registry.ecr.connectivity` | registry.ecr | Integration | Fail |
| `check.integration.registry.ecr.pull` | registry.ecr | Integration | Fail |
| `check.telemetry.otlp.endpoint` | observability | Observability | Warn |
| `check.logs.directory.writable` | observability | Observability | Fail |
| `check.logs.rotation.configured` | observability | Observability | Warn |
| `check.metrics.prometheus.scrape` | observability | Observability | Warn |
| `check.releaseorch.environments.configured` | releaseorch | Integration | Fail |
| `check.releaseorch.deployments.targets` | releaseorch | Integration | Fail |

---

## Appendix B: Quick Reference - Common Issues

### Database Issues

```bash
# Connection refused
sudo systemctl start postgresql
stella doctor --check check.database.connectivity

# Pending migrations
stella system migrations-run --category release
stella doctor --check check.database.migrations.pending

# Migration lock stuck: advisory locks belong to the session that acquired them,
# so pg_advisory_unlock_all() in a new psql session releases nothing.
# Find the backend holding the lock, confirm it is stale, then terminate it.
psql -d stellaops -c "SELECT l.pid, a.application_name, a.state, a.query_start FROM pg_locks l JOIN pg_stat_activity a ON a.pid = l.pid WHERE l.locktype = 'advisory' AND l.granted;"
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"
stella doctor --check check.database.migrations.lock
```

### Authentication Issues

```bash
# OIDC discovery fails
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration
sudo systemctl restart stellaops-authority

# LDAP bind fails
ldapsearch -x -H ldaps://{HOST}:636 -D "{BIND_DN}" -w "{PASSWORD}" -b "" -s base
```

### Integration Issues

```bash
# GitHub rate limit
curl -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit

# Harbor connectivity
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq .
```

---

*Document generated: 2026-01-12*

*Stella Ops Doctor Capability Specification v1.0.0-draft*