Stella Ops Doctor Capability Specification

Status: Planning / Capability Design
Version: 1.0.0-draft
Last Updated: 2026-01-12


Table of Contents

  1. Executive Summary
  2. Current State Analysis
  3. Doctor Architecture
  4. Plugin System Specification
  5. CLI Surface
  6. UI Surface
  7. API Surface
  8. Remediation Command Patterns
  9. Doctor Check Catalog
  10. Plugin Implementation Details

1. Executive Summary

1.1 Purpose

The Doctor capability provides comprehensive self-service diagnostics for Stella Ops deployments. It enables operators, DevOps engineers, and developers to:

  • Diagnose what is working and what is not
  • Understand why failures occur with collected evidence
  • Remediate issues with copy/paste commands
  • Verify fixes with re-runnable checks

1.2 Target Users

| User Type | Primary Use Case |
| --- | --- |
| Operators | Pre-deployment validation, incident triage, routine health checks |
| DevOps Engineers | Integration setup, migration management, environment troubleshooting |
| Developers | Local development environment validation, API connectivity testing |
| Support Engineers | Remote diagnostics, evidence collection for escalation |

1.3 Key Principles

  1. Plugin-First Architecture - All checks implemented via extensible plugins
  2. Actionable Remediation - Every failure includes copy/paste fix commands
  3. Zero Docs Familiarity - Users can diagnose and fix without reading documentation
  4. Evidence-Based Diagnostics - All checks collect and report evidence
  5. Multi-Surface Consistency - Same check engine powers CLI, UI, and API
  6. Non-Destructive Fixes - Doctor never executes destructive actions; fix commands must be safe and idempotent

1.4 Surfaces

| Surface | Entry Point | Primary Use |
| --- | --- | --- |
| CLI | stella doctor | Automation, CI/CD gates, SSH troubleshooting |
| UI | /ops/doctor | Interactive diagnosis, team collaboration |
| API | POST /api/v1/doctor/run | Programmatic integration, monitoring systems |

2. Current State Analysis

2.1 CLI - Current State

Location: src/Cli/StellaOps.Cli/

What Exists Today

| Component | File Path | Description |
| --- | --- | --- |
| Entry Point | src/Cli/StellaOps.Cli/Program.cs | Main CLI bootstrap using System.CommandLine |
| Command Factory | src/Cli/StellaOps.Cli/Commands/CommandFactory.cs | Registers 88+ command groups |
| Config Bootstrap | src/Cli/StellaOps.Cli/Configuration/CliBootstrapper.cs | Environment + YAML/JSON config loading |
| Exit Codes | src/Cli/StellaOps.Cli/CliExitCodes.cs | Standardized exit codes (0-99) |
| Crypto Validator | src/Cli/StellaOps.Cli/Services/CryptoProfileValidator.cs | Startup validation for crypto profiles |
| Migration Commands | src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs | migrations-run, migrations-status, migrations-verify |

Existing Validation Patterns

// CryptoProfileValidator.cs - Startup validation pattern
public sealed record ValidationResult
{
    public bool IsValid { get; init; }
    public bool HasWarnings { get; init; }
    public bool HasErrors { get; init; }
    public List<string> Errors { get; init; }
    public List<string> Warnings { get; init; }
    public string ActiveProfile { get; init; }
    public List<string> AvailableProviders { get; init; }
}

Gaps

  • No unified stella doctor command
  • Output formatting is ad-hoc per command (no centralized formatter)
  • No remediation command generation
  • Validation only for crypto profiles, not comprehensive system state

Proposed Capability

# Quick system health check
stella doctor

# Full diagnostic with all checks
stella doctor --full

# Check specific category
stella doctor --category database
stella doctor --category integrations

# Check specific plugin
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text

# Export report
stella doctor --export report.json
stella doctor --export report.md

# Filter by severity
stella doctor --severity fail,warn

2.2 Health Infrastructure - Current State

Pattern: Extensive health endpoints across 20+ services

What Exists Today

| Component | File Path | Description |
| --- | --- | --- |
| Health Status Enum | src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthStatus.cs | Unknown, Healthy, Degraded, Unhealthy |
| Health Check Result | src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs | Rich result with factory methods |
| Gateway Health | src/Gateway/StellaOps.Gateway.WebService/Middleware/HealthCheckMiddleware.cs | /health/live, /health/ready, /health/startup |
| Scanner Health | src/Scanner/StellaOps.Scanner.WebService/Endpoints/HealthEndpoints.cs | /healthz, /readyz |
| Orchestrator Health | src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Endpoints/HealthEndpoints.cs | /health/details |
| Platform Health | src/Platform/__Libraries/StellaOps.Platform.Health/PlatformHealthService.cs | Cross-service aggregation |
| Health Contract | devops/docker/health-endpoints.md | Formal endpoint specification |

Health Check Result Model

// From src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs
public sealed record HealthCheckResult(
    HealthStatus Status,
    string? Message,
    IReadOnlyDictionary<string, string>? Details,
    DateTimeOffset CheckedAt,
    TimeSpan Duration)
{
    public static HealthCheckResult Healthy(string? message = null) => ...
    public static HealthCheckResult Degraded(string message) => ...
    public static HealthCheckResult Unhealthy(string message, Exception? ex = null) => ...
}

Gaps

  • Health endpoints check liveness/readiness, not comprehensive diagnostics
  • No remediation guidance in health responses
  • No aggregated cross-service diagnostic view
  • Health checks don't verify configuration validity

2.3 Doctor Service - Current State (ReleaseOrchestrator)

Location: src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/

What Exists Today

| Component | File Path | Description |
| --- | --- | --- |
| Doctor Service | Doctor/DoctorService.cs | Runs IDoctorCheck implementations |
| Doctor Report | Doctor/DoctorReport.cs | Aggregated results with counts |
| Check Result | Doctor/CheckResult.cs | Individual check outcome |
| IDoctorCheck | Doctor/IDoctorCheck.cs | Plugin interface for checks |

IDoctorCheck Interface

// Existing interface (simplified)
public interface IDoctorCheck
{
    string Name { get; }
    string Category { get; }
    Task<CheckResult> RunAsync(CancellationToken ct);
}

public sealed record CheckResult(
    string Name,
    HealthStatus Status,
    string? Message,
    TimeSpan Duration);

public sealed record DoctorReport(
    int PassCount,
    int WarningCount,
    int FailCount,
    int SkippedCount,
    HealthStatus OverallStatus,
    TimeSpan TotalDuration,
    IReadOnlyList<CheckResult> Results);

Gaps

  • Only available in ReleaseOrchestrator, not CLI or other modules
  • No remediation commands in output
  • No evidence collection
  • Limited to integration checks only
  • No plugin discovery mechanism

2.4 Integration Plugins - Current State

Location: src/Integrations/

What Exists Today

| Component | File Path | Description |
| --- | --- | --- |
| Plugin Contract | __Libraries/StellaOps.Integrations.Contracts/IIntegrationConnectorPlugin.cs | Core plugin interface |
| Integration Types | __Libraries/StellaOps.Integrations.Contracts/IntegrationType.cs | Registry, SCM, CI/CD, etc. |
| GitHub Plugin | __Plugins/StellaOps.Integrations.Plugin.GitHubApp/GitHubAppConnectorPlugin.cs | GitHub App integration |
| Harbor Plugin | __Plugins/StellaOps.Integrations.Plugin.Harbor/HarborConnectorPlugin.cs | Harbor registry |
| Plugin Loader | StellaOps.Integrations.WebService/IntegrationPluginLoader.cs | Assembly-based discovery |
| Vault Connectors | src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/ | HashiCorp Vault, Azure Key Vault |

IIntegrationConnectorPlugin Interface

public interface IIntegrationConnectorPlugin : IAvailabilityPlugin
{
    IntegrationType Type { get; }
    IntegrationProvider Provider { get; }
    string Name { get; }

    Task<TestConnectionResult> TestConnectionAsync(
        IntegrationConfig config,
        CancellationToken ct);

    Task<HealthCheckResult> CheckHealthAsync(
        IntegrationConfig config,
        CancellationToken ct);
}

Supported Integration Types

public enum IntegrationType
{
    Registry = 1,      // Harbor, ECR, GCR, ACR, Docker Hub, Quay, Artifactory
    Scm = 2,           // GitHub, GitLab, Bitbucket, Gitea, Azure DevOps
    CiCd = 3,          // GitHub Actions, GitLab CI, Jenkins, CircleCI
    RepoSource = 4,    // npm, PyPI, Maven, NuGet, Crates.io
    RuntimeHost = 5,   // eBPF, ETW, dyld agents
    FeedMirror = 6     // NVD, OSV, StellaOps mirrors
}

Gaps

  • TestConnectionAsync exists but not surfaced via CLI doctor
  • No standardized remediation output
  • Health checks don't report required permissions/scopes
  • No validation of webhook/event delivery configuration

2.5 Authority Plugins - Current State

Location: src/Authority/StellaOps.Authority/

What Exists Today

| Component | File Path | Description |
| --- | --- | --- |
| Plugin Abstractions | StellaOps.Authority.Plugins.Abstractions/ | Plugin registration interface |
| LDAP Plugin | StellaOps.Authority.Plugin.Ldap/ | LDAP/AD integration |
| OIDC Plugin | StellaOps.Authority.Plugin.Oidc/ | OpenID Connect |
| SAML Plugin | StellaOps.Authority.Plugin.Saml/ | SAML 2.0 |
| Plugin Registry | StellaOps.Authority/AuthorityPluginRegistry.cs | Manages named plugins |
| LDAP Config | etc/authority.plugins/ldap.yaml | Sample configuration |

LDAP Plugin Capabilities

# From etc/authority.plugins/ldap.yaml
connection:
  host: "ldaps://ldap.example.internal"
  port: 636
  searchBase: "ou=people,dc=example,dc=internal"
  bindDn: "cn=bind-user,ou=service,dc=example,dc=internal"
  bindPasswordSecret: "file:/etc/secrets/ldap-bind.txt"
security:
  requireTls: true
claims:
  groupAttribute: "memberOf"
  cache:
    enabled: true
    ttlSeconds: 600

Gaps

  • No CLI command to validate LDAP configuration
  • Health checks exist but don't provide remediation
  • No validation of group mapping correctness
  • TLS certificate validation not exposed as diagnostic

2.6 Database & Migrations - Current State

Location: src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/

What Exists Today

| Component | File Path | Description |
| --- | --- | --- |
| Migration Runner | Migrations/MigrationRunner.cs | Executes SQL migrations with advisory locks |
| Migration Category | Migrations/MigrationCategory.cs | Startup, Release, Seed, Data |
| Status Service | Migrations/MigrationStatusService.cs | Query migration state |
| CLI Commands | src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs | migrations-run/status/verify |
| Strategy Docs | docs/db/MIGRATION_STRATEGY.md | Migration process documentation |

Migration Categories

| Prefix | Category | Automatic | Breaking |
| --- | --- | --- | --- |
| 001-099 | Startup | Yes | No |
| 100-199 | Release | No (CLI) | Yes |
| S001-S999 | Seed | Yes | No |
| DM001-DM999 | Data | Background | Varies |
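
The prefix convention in the table above can be expressed as a small classifier, which is roughly what a doctor check for pending migrations would need in order to tell release migrations apart from startup ones. The following is a minimal sketch, not the actual MigrationRunner logic: the `MigrationNaming` class, the local `MigrationCategory` stand-in, and the assumed name shape `<prefix>_<description>` (as in `100_add_tenant_quotas`) are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Local stand-in for the categories listed in the table above (hypothetical).
public enum MigrationCategory { Startup, Release, Seed, Data }

public static class MigrationNaming
{
    // Assumed name shape: "<prefix>_<description>", e.g. "100_add_tenant_quotas".
    private static readonly Regex Numeric = new(@"^(\d{3})_", RegexOptions.Compiled);
    private static readonly Regex Seed = new(@"^S\d{3}_", RegexOptions.Compiled);
    private static readonly Regex Data = new(@"^DM\d{3}_", RegexOptions.Compiled);

    // Returns null when the name matches none of the documented prefix ranges.
    public static MigrationCategory? Classify(string migrationName)
    {
        if (Seed.IsMatch(migrationName)) return MigrationCategory.Seed;
        if (Data.IsMatch(migrationName)) return MigrationCategory.Data;

        var match = Numeric.Match(migrationName);
        if (!match.Success) return null;

        var number = int.Parse(match.Groups[1].Value);
        return number switch
        {
            >= 1 and <= 99 => MigrationCategory.Startup,
            >= 100 and <= 199 => MigrationCategory.Release,
            _ => (MigrationCategory?)null
        };
    }
}
```

With this sketch, `MigrationNaming.Classify("100_add_tenant_quotas")` yields `MigrationCategory.Release`, while a name outside the documented ranges yields `null` so the caller can report it rather than guess.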

Schema Tracking

CREATE TABLE {schema}.schema_migrations (
    migration_name TEXT PRIMARY KEY,
    category TEXT NOT NULL DEFAULT 'startup',
    checksum TEXT NOT NULL,
    applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    applied_by TEXT,
    duration_ms INT
);

Gaps

  • Migration status not integrated with doctor
  • No checksum mismatch diagnostics with remediation
  • Lock contention not diagnosed
  • No cross-schema migration state view

2.7 UI - Current State

Location: src/Web/StellaOps.Web/

What Exists Today

| Component | File Path | Description |
| --- | --- | --- |
| Routes | src/app/app.routes.ts | Angular Router configuration |
| Platform Health | src/app/features/platform-health/ | Health dashboard at /ops/health |
| Health Client | src/app/core/api/platform-health.client.ts | API client for health endpoints |
| Console Status | src/app/features/console/console-status.component.ts | Queue/run status |

Platform Health Dashboard Features

  • Real-time KPI strip (services, latency, error rate, incidents)
  • Service health grid with grouping (healthy/degraded/unhealthy)
  • Dependency graph visualization
  • Incident timeline (last 24h)
  • Auto-refresh every 10 seconds

Gaps

  • No diagnostic check execution from UI
  • No remediation command display
  • No evidence collection/export
  • Health dashboard shows status, not actionable diagnostics

2.8 Service Connectivity - Current State

Location: src/Gateway/, src/Router/

What Exists Today

| Component | File Path | Description |
| --- | --- | --- |
| Gateway Routing | src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs | HTTP to microservice routing |
| Connection Manager | src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs | HELLO handshake, heartbeats |
| Routing State | src/Router/__Libraries/StellaOps.Router.Common/Abstractions/IGlobalRoutingState.cs | Live service connections |
| Claims Propagation | src/Gateway/StellaOps.Gateway.WebService/Middleware/ClaimsPropagationMiddleware.cs | OAuth claims forwarding |

Service Registration Flow

  1. Service connects to Gateway via Router transport (TCP/TLS/Valkey)
  2. HELLO handshake with endpoint/schema declarations
  3. Periodic heartbeats with health/latency metrics
  4. Gateway maintains ConnectionState for routing decisions

Gaps

  • No CLI command to verify service graph health
  • Routing failures not diagnosed with remediation
  • No validation of claims propagation configuration
  • Transport connectivity not exposed as diagnostic

3. Doctor Architecture

3.1 High-Level Architecture

+------------------+     +------------------+     +------------------+
|       CLI        |     |        UI        |     |    External      |
|  stella doctor   |     |   /ops/doctor    |     |   Monitoring     |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         v                        v                        v
+------------------------------------------------------------------------+
|                         Doctor API Layer                                |
|  POST /api/v1/doctor/run    GET /api/v1/doctor/checks                  |
|  GET /api/v1/doctor/report  WebSocket /api/v1/doctor/stream            |
+------------------------------------------------------------------------+
         |
         v
+------------------------------------------------------------------------+
|                      Doctor Engine (Core)                               |
|  +------------------+  +------------------+  +------------------+       |
|  | Check Registry   |  | Check Executor   |  | Report Generator |       |
|  | - Discovery      |  | - Parallel exec  |  | - JSON/MD/Text   |       |
|  | - Filtering      |  | - Timeout mgmt   |  | - Remediation    |       |
|  +------------------+  +------------------+  +------------------+       |
+------------------------------------------------------------------------+
         |
         v
+------------------------------------------------------------------------+
|                        Plugin System                                    |
+--------+---------+---------+---------+---------+---------+-------------+
         |         |         |         |         |         |
         v         v         v         v         v         v
+--------+  +------+  +------+  +------+  +------+  +------+  +----------+
| Core   |  | DB & |  |Service|  | SCM  |  |Regis-|  | Vault|  | Authority|
| Plugin |  |Migra-|  | Graph |  |Plugin|  | try  |  |Plugin|  | Plugin   |
|        |  | tions|  |Plugin |  |      |  |Plugin|  |      |  |          |
+--------+  +------+  +------+  +------+  +------+  +------+  +----------+

3.2 Core Components

Doctor Engine

Proposed Location: src/__Libraries/StellaOps.Doctor/

StellaOps.Doctor/
├── Engine/
│   ├── DoctorEngine.cs              # Main orchestrator
│   ├── CheckExecutor.cs             # Parallel check execution
│   └── CheckRegistry.cs             # Plugin discovery & filtering
├── Models/
│   ├── DoctorCheckResult.cs         # Extended check result with evidence
│   ├── DoctorReport.cs              # Full report model
│   ├── Remediation.cs               # Fix command model
│   └── Evidence.cs                  # Collected evidence model
├── Plugins/
│   ├── IDoctorPlugin.cs             # Plugin interface
│   ├── IDoctorCheck.cs              # Check interface (extended)
│   └── DoctorPluginContext.cs       # Plugin execution context
├── Output/
│   ├── JsonReportFormatter.cs       # JSON output
│   ├── MarkdownReportFormatter.cs   # Markdown output
│   └── TextReportFormatter.cs       # Console text output
└── DoctorServiceExtensions.cs       # DI registration

Check Execution Model

public sealed class CheckExecutor
{
    private readonly IEnumerable<IDoctorPlugin> _plugins;
    private readonly TimeProvider _timeProvider;
    private readonly ILogger<CheckExecutor> _logger;

    public async Task<DoctorReport> RunAsync(
        DoctorRunOptions options,
        CancellationToken ct)
    {
        var checks = GetFilteredChecks(options);
        var results = new ConcurrentBag<DoctorCheckResult>();

        // Parallel execution with configurable concurrency
        await Parallel.ForEachAsync(
            checks,
            new ParallelOptions
            {
                MaxDegreeOfParallelism = options.Parallelism,
                CancellationToken = ct
            },
            async (check, token) =>
            {
                var result = await ExecuteCheckAsync(check, options, token);
                results.Add(result);
            });

        return GenerateReport(results, options);
    }
}
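
The executor above delegates to `ExecuteCheckAsync`, which is not shown. One plausible way to enforce the per-check timeout is a linked cancellation token, as in the minimal sketch below. The `CheckRunner` class and the simplified `CheckOutcome` record are hypothetical stand-ins for the real executor and `DoctorCheckResult`, and reporting a timeout as a failure (rather than a skip) is an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-in for DoctorCheckResult; only the fields needed here.
public sealed record CheckOutcome(string CheckId, string Severity, string Diagnosis);

public static class CheckRunner
{
    // Runs one check body and converts a timeout or an unexpected exception
    // into a failed outcome, so a single bad check cannot abort the whole run.
    public static async Task<CheckOutcome> RunWithTimeoutAsync(
        string checkId,
        Func<CancellationToken, Task<CheckOutcome>> check,
        TimeSpan timeout,
        CancellationToken ct)
    {
        // Linked token: cancelled by the caller or when the timeout elapses.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);

        try
        {
            return await check(cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the timeout fired; the caller did not cancel the run.
            return new CheckOutcome(checkId, "Fail",
                $"Check timed out after {timeout.TotalSeconds:0.#}s");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return new CheckOutcome(checkId, "Fail",
                $"Check threw {ex.GetType().Name}: {ex.Message}");
        }
    }
}
```

If the caller cancels the whole run, the `OperationCanceledException` propagates so that `Parallel.ForEachAsync` can stop. A check that ignores its token is not interrupted by this approach, so checks are expected to pass the token to every awaited call.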

3.3 Result Model

public sealed record DoctorCheckResult
{
    // Identity
    public required string CheckId { get; init; }
    public required string PluginId { get; init; }
    public required string Category { get; init; }

    // Outcome
    public required DoctorSeverity Severity { get; init; }  // Pass, Info, Warn, Fail, Skip
    public required string Diagnosis { get; init; }

    // Evidence
    public required Evidence Evidence { get; init; }

    // Remediation
    public IReadOnlyList<string>? LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }

    // Metadata
    public required TimeSpan Duration { get; init; }
    public required DateTimeOffset ExecutedAt { get; init; }
}

public enum DoctorSeverity
{
    Pass = 0,
    Info = 1,
    Warn = 2,
    Fail = 3,
    Skip = 4
}

public sealed record Evidence
{
    public required string Description { get; init; }
    public required IReadOnlyDictionary<string, string> Data { get; init; }
    public IReadOnlyList<string>? SensitiveKeys { get; init; }  // Keys to redact in output
}

public sealed record Remediation
{
    public required IReadOnlyList<RemediationStep> Steps { get; init; }
    public string? SafetyNote { get; init; }
    public bool RequiresBackup { get; init; }
}

public sealed record RemediationStep
{
    public required int Order { get; init; }
    public required string Description { get; init; }
    public required string Command { get; init; }
    public CommandType CommandType { get; init; }  // Shell, SQL, API, FileEdit, Manual
    public IReadOnlyDictionary<string, string>? Placeholders { get; init; }
}

public enum CommandType
{
    Shell,      // Bash/PowerShell command
    SQL,        // SQL statement
    API,        // API call (curl/stella CLI)
    FileEdit,   // File modification
    Manual      // Manual step (no command)
}

4. Plugin System Specification

4.1 Plugin Interface

/// <summary>
/// Base interface for Doctor plugins.
/// Plugins group related checks and share configuration context.
/// </summary>
public interface IDoctorPlugin
{
    /// <summary>Unique plugin identifier (e.g., "stellaops.doctor.database")</summary>
    string PluginId { get; }

    /// <summary>Human-readable name</summary>
    string DisplayName { get; }

    /// <summary>Plugin category for filtering</summary>
    DoctorCategory Category { get; }

    /// <summary>Plugin version for compatibility</summary>
    Version Version { get; }

    /// <summary>Minimum Doctor engine version required</summary>
    Version MinEngineVersion { get; }

    /// <summary>Check if plugin is available in current environment</summary>
    bool IsAvailable(IServiceProvider services);

    /// <summary>Get all checks provided by this plugin</summary>
    IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context);

    /// <summary>Initialize plugin with configuration</summary>
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public enum DoctorCategory
{
    Core,           // Platform, config, runtime
    Database,       // Schema, migrations, connectivity
    ServiceGraph,   // Inter-service communication
    Integration,    // External system integrations
    Security,       // Auth, TLS, secrets
    Observability   // Logs, metrics, traces
}

4.2 Check Interface

/// <summary>
/// Individual diagnostic check.
/// </summary>
public interface IDoctorCheck
{
    /// <summary>Unique check identifier (e.g., "check.database.migrations.pending")</summary>
    string CheckId { get; }

    /// <summary>Human-readable name</summary>
    string Name { get; }

    /// <summary>What this check verifies</summary>
    string Description { get; }

    /// <summary>Default severity if check fails</summary>
    DoctorSeverity DefaultSeverity { get; }

    /// <summary>Tags for filtering (e.g., ["quick", "security", "migration"])</summary>
    IReadOnlyList<string> Tags { get; }

    /// <summary>Estimated execution time</summary>
    TimeSpan EstimatedDuration { get; }

    /// <summary>Check if this check can run in current context</summary>
    bool CanRun(DoctorPluginContext context);

    /// <summary>Execute the check</summary>
    Task<DoctorCheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}

4.3 Plugin Context

public sealed class DoctorPluginContext
{
    public required IServiceProvider Services { get; init; }
    public required IConfiguration Configuration { get; init; }
    public required TimeProvider TimeProvider { get; init; }
    public required ILogger Logger { get; init; }

    // Runtime info
    public required string EnvironmentName { get; init; }  // Development, Staging, Production
    public required string? TenantId { get; init; }

    // Plugin configuration
    public required JsonElement PluginConfig { get; init; }

    // Evidence helpers
    public EvidenceBuilder CreateEvidence() => new();
    public RemediationBuilder CreateRemediation() => new();

    // Secret redaction
    public string Redact(string value) => "***REDACTED***";
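    // Masks the Password/Pwd value of a key=value connection string (URI-style strings need separate handling)
    public string RedactConnectionString(string cs) =>
        System.Text.RegularExpressions.Regex.Replace(
            cs, @"(?i)\b(password|pwd)\s*=\s*[^;]*", "$1=***REDACTED***");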
}

4.4 Plugin Discovery

Static Discovery (Build-time)

Plugins register via DI at startup:

// In Program.cs or startup
services.AddDoctorPlugin<CoreDoctorPlugin>();
services.AddDoctorPlugin<DatabaseDoctorPlugin>();
services.AddDoctorPlugin<ServiceGraphDoctorPlugin>();
services.AddDoctorPlugin<ScmGitHubDoctorPlugin>();
// ...

Dynamic Discovery (Runtime)

Plugins can be loaded from assemblies:

// In DoctorPluginLoader.cs
public class DoctorPluginLoader
{
    public IEnumerable<IDoctorPlugin> LoadFromDirectory(string path)
    {
        foreach (var dll in Directory.GetFiles(path, "StellaOps.Doctor.Plugin.*.dll"))
        {
            var assembly = Assembly.LoadFrom(dll);
            foreach (var type in assembly.GetTypes()
                .Where(t => typeof(IDoctorPlugin).IsAssignableFrom(t) && !t.IsAbstract))
            {
                yield return (IDoctorPlugin)Activator.CreateInstance(type)!;
            }
        }
    }
}

4.5 Declarative Doctor Packs (YAML)

Doctor packs provide declarative checks that wrap CLI commands and parsing rules. They complement compiled plugins and are loaded from plugins/doctor/*.yaml (plus optional override directories).

Short example:

apiVersion: stella.ops/doctor.v1
kind: DoctorPlugin
metadata:
  name: doctor-release-orchestrator-gitlab
spec:
  discovery:
    when:
      - env: GITLAB_URL

Full sample: docs/benchmarks/doctor/doctor-plugin-release-orchestrator-gitlab.yaml

Key fields:

  • spec.discovery.when: env/file existence gates.
  • checks[].run.exec: command to execute (must be deterministic).
  • checks[].parse.expect or checks[].parse.expectJson: pass/fail rules.
  • checks[].how_to_fix.commands[]: exact fix commands printed verbatim.

4.6 Plugin Directory Structure

src/
├── __Libraries/
│   └── StellaOps.Doctor/                    # Core doctor engine
│       └── Plugins/
│           └── Core/                         # Built-in core plugin
├── Doctor/
│   └── __Plugins/
│       ├── StellaOps.Doctor.Plugin.Database/
│       ├── StellaOps.Doctor.Plugin.ServiceGraph/
│       ├── StellaOps.Doctor.Plugin.Scm.GitHub/
│       ├── StellaOps.Doctor.Plugin.Scm.GitLab/
│       ├── StellaOps.Doctor.Plugin.Registry.Harbor/
│       ├── StellaOps.Doctor.Plugin.Registry.ECR/
│       ├── StellaOps.Doctor.Plugin.Vault/
│       ├── StellaOps.Doctor.Plugin.Authority/
│       └── StellaOps.Doctor.Plugin.Observability/

4.7 Plugin Configuration

Plugins read configuration from the standard config hierarchy:

# In stellaops.yaml or environment-specific config
Doctor:
  Enabled: true
  DefaultTimeout: 30s
  Parallelism: 4

  Plugins:
    Database:
      Enabled: true
      ConnectionTimeout: 10s

    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s

    Scm:
      GitHub:
        Enabled: true
        RateLimitThreshold: 100

    Registry:
      Harbor:
        Enabled: true
        SkipTlsVerify: false

    Vault:
      Enabled: true
      SecretsToValidate:
        - "secret/data/stellaops/api-keys"
        - "secret/data/stellaops/certificates"

4.8 Security Model

Secret Redaction

All evidence output is sanitized:

public sealed class EvidenceBuilder
{
    private readonly Dictionary<string, string> _data = new();
    private readonly List<string> _sensitiveKeys = new();

    public EvidenceBuilder Add(string key, string value)
    {
        _data[key] = value;
        return this;
    }

    public EvidenceBuilder AddSensitive(string key, string value)
    {
        _data[key] = value;
        _sensitiveKeys.Add(key);
        return this;
    }

    public EvidenceBuilder AddConnectionString(string key, string connectionString)
    {
        // Parse and redact password
        var redacted = RedactConnectionStringPassword(connectionString);
        _data[key] = redacted;
        return this;
    }
}
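
The builder above records which keys are sensitive but stores the raw values, so the redaction itself has to happen when a report is formatted or exported. The sketch below shows one way a formatter could apply it. The `EvidenceRedactor` class is hypothetical, and matching keys case-insensitively is an assumption; the mask string is the one used by `DoctorPluginContext.Redact`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvidenceRedactor
{
    private const string Mask = "***REDACTED***";

    // Returns a copy of the evidence data that is safe to print or export:
    // every value whose key is listed as sensitive is replaced by the mask.
    public static IReadOnlyDictionary<string, string> ForOutput(
        IReadOnlyDictionary<string, string> data,
        IReadOnlyList<string>? sensitiveKeys)
    {
        var sensitive = new HashSet<string>(
            sensitiveKeys ?? Array.Empty<string>(),
            StringComparer.OrdinalIgnoreCase);

        return data.ToDictionary(
            pair => pair.Key,
            pair => sensitive.Contains(pair.Key) ? Mask : pair.Value);
    }
}
```

Redacting on a copy keeps the original evidence intact in memory, which matters if a later design lets holders of the `doctor:run:full` scope see more detail than other callers.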

RBAC Permissions

Doctor checks require specific scopes:

| Scope | Description |
| --- | --- |
| doctor:run | Execute doctor checks |
| doctor:run:full | Execute all checks including sensitive |
| doctor:export | Export diagnostic reports |
| admin:system | Access system-level checks |

4.9 Versioning Strategy

  • Engine version: Semantic versioning (e.g., 1.0.0)
  • Plugin version: Independent semantic versioning
  • Compatibility: Plugins declare MinEngineVersion
  • Check IDs: Stable across versions (never renamed)

// Version compatibility check
if (plugin.MinEngineVersion > DoctorEngine.Version)
{
    _logger.LogWarning(
        "Plugin {PluginId} requires engine {Required}, current is {Current}. Skipping.",
        plugin.PluginId, plugin.MinEngineVersion, DoctorEngine.Version);
    continue;
}

5. CLI Surface

5.1 Command Structure

Proposed Location: src/Cli/StellaOps.Cli/Commands/DoctorCommandGroup.cs

stella doctor run [options]
stella doctor list [options]
stella doctor fix --from report.json [--apply]

Note: stella doctor remains shorthand for stella doctor run for compatibility.

stella doctor fix executes only non-destructive commands. Any destructive step must be presented as manual guidance and is not eligible for --apply.

5.2 Options and Flags

| Option | Short | Type | Default | Description |
| --- | --- | --- | --- | --- |
| --format | -f | enum | text | Output format: text, table, json, markdown |
| --quick | -q | flag | false | Run only quick checks (tagged quick) |
| --full | | flag | false | Run all checks including slow/intensive |
| --pack | | string[] | all | Filter by pack name (manifest grouping) |
| --category | -c | string[] | all | Filter by category: core, database, service-graph, integration, security, observability |
| --plugin | -p | string[] | all | Filter by plugin ID (e.g., scm.github) |
| --check | | string | | Run single check by ID |
| --severity | -s | enum[] | all | Filter output by severity: pass, info, warn, fail |
| --export | -e | path | | Export report to file |
| --timeout | -t | duration | 30s | Per-check timeout |
| --parallel | | int | 4 | Max parallel check execution |
| --no-remediation | | flag | false | Skip remediation command generation |
| --verbose | -v | flag | false | Include detailed evidence in output |
| --tenant | | string | | Tenant context for multi-tenant checks |
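
Multi-value options such as `--severity fail,warn` need to be turned into typed values before the engine can filter on them. The following is a minimal sketch of that parsing step, assuming comma-separated, case-insensitive names; the `SeverityFilter` class is hypothetical and the enum simply mirrors `DoctorSeverity` from section 3.3.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the DoctorSeverity enum from section 3.3.
public enum DoctorSeverity { Pass = 0, Info = 1, Warn = 2, Fail = 3, Skip = 4 }

public static class SeverityFilter
{
    // Parses a value such as "fail,warn" into a set of severities.
    // Throws ArgumentException on an unknown name so the CLI can map it
    // to the "invalid arguments" exit code.
    public static IReadOnlySet<DoctorSeverity> Parse(string value)
    {
        var result = new HashSet<DoctorSeverity>();
        foreach (var token in value.Split(',',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
        {
            // Reject numeric input such as "3": Enum.TryParse would accept it.
            if (char.IsDigit(token[0]) ||
                !Enum.TryParse<DoctorSeverity>(token, ignoreCase: true, out var severity) ||
                !Enum.IsDefined(severity))
            {
                throw new ArgumentException($"Unknown severity '{token}'.", nameof(value));
            }
            result.Add(severity);
        }
        return result;
    }
}
```

Returning a set makes repeated names harmless, so `fail,warn,fail` behaves the same as `fail,warn`.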

Fix Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --from | path | required | Path to JSON report with how_to_fix commands |
| --apply | flag | false | Execute fixes (default is dry-run preview) |

Only commands marked safe and non-destructive are eligible for --apply. Destructive changes must be printed as manual steps and executed by the operator outside Doctor.

5.3 Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | All checks passed |
| 1 | One or more warnings |
| 2 | One or more failures |
| 3 | Doctor engine error |
| 4 | Invalid arguments |
| 5 | Timeout exceeded |
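
For CI/CD gates, the first three codes have to be derived from the check results. The sketch below shows one way to do that mapping. The `DoctorExitCodes` class is hypothetical, and two points are assumptions rather than stated rules: failures outrank warnings when both occur, and Info and Skip results leave the exit code at 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the DoctorSeverity enum from section 3.3.
public enum DoctorSeverity { Pass = 0, Info = 1, Warn = 2, Fail = 3, Skip = 4 }

public static class DoctorExitCodes
{
    public const int Success = 0;
    public const int Warnings = 1;
    public const int Failures = 2;

    // Assumption: failures outrank warnings, and Info/Skip results do not
    // change the exit code. Codes 3-5 are raised elsewhere (engine error,
    // invalid arguments, timeout) and are not derived from check results.
    public static int FromResults(IEnumerable<DoctorSeverity> severities)
    {
        var list = severities.ToList();
        if (list.Contains(DoctorSeverity.Fail)) return Failures;
        if (list.Contains(DoctorSeverity.Warn)) return Warnings;
        return Success;
    }
}
```

The explicit checks are deliberate: taking the maximum of the enum values would not work, because `Skip` has the highest numeric value without being the most severe outcome.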

5.4 Usage Examples

# Default run (shorthand for "stella doctor run")
stella doctor

# Same run with the subcommand spelled out
stella doctor run

# Full diagnostic
stella doctor --full

# Check only database category
stella doctor --category database

# Check specific integration
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# JSON output for CI/CD
stella doctor --format json --severity fail,warn

# Run orchestrator pack with table output
stella doctor run --pack orchestrator --format table

# Preview fixes from a JSON report (dry-run is the default)
stella doctor fix --from out.json

# Apply the non-destructive fixes from the same report
stella doctor fix --from out.json --apply

# Export markdown report
stella doctor --full --format markdown --export doctor-report.md

# Verbose with all evidence
stella doctor --verbose --full

# Quick check with 60s timeout
stella doctor --quick --timeout 60s

5.5 Text Output Format

Stella Ops Doctor
=================

Running 47 checks across 8 plugins...

[PASS] check.config.required
  All required configuration values are present

[PASS] check.database.connectivity
  PostgreSQL connection successful (latency: 12ms)

[WARN] check.tls.certificates.expiry
  Diagnosis: TLS certificate expires in 14 days

  Evidence:
    Certificate: /etc/ssl/certs/stellaops.crt
    Subject: CN=stellaops.example.com
    Expires: 2026-01-26T00:00:00Z
    Days remaining: 14

  Likely Causes:
    1. Certificate renewal not scheduled
    2. ACME/Let's Encrypt automation not configured

  Fix Steps:
    # 1. Check current certificate
    openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates

    # 2. Renew certificate (if using certbot)
    sudo certbot renew --cert-name stellaops.example.com

    # 3. Restart services to pick up new certificate
    sudo systemctl restart stellaops-gateway

  Verification:
    stella doctor --check check.tls.certificates.expiry

[FAIL] check.database.migrations.pending
  Diagnosis: 3 pending release migrations detected in schema 'auth'

  Evidence:
    Schema: auth
    Current version: 099_add_dpop_thumbprints
    Pending migrations:
      - 100_add_tenant_quotas
      - 101_add_audit_retention
      - 102_add_session_revocation
    Connection: postgres://localhost:5432/stellaops (user: stella_app)

  Likely Causes:
    1. Release migrations not applied before deployment
    2. Migration files added after last deployment

  Fix Steps:
    # 1. Backup database first (RECOMMENDED)
    pg_dump -h localhost -U stella_admin -d stellaops -F c \
      -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

    # 2. Apply pending release migrations
    stella system migrations-run --module Authority --category release

    # 3. Verify migrations applied
    stella system migrations-status --module Authority

  Verification:
    stella doctor --check check.database.migrations.pending

────────────────────────────────────────────────────────────────
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
────────────────────────────────────────────────────────────────
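
The "Days remaining" evidence value in the TLS warning above can be derived from the certificate's expiry timestamp. The following is a minimal sketch, not the Doctor implementation: `daysRemaining`, `expirySeverity`, and the 30-day warning threshold are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical helpers; names and thresholds are assumptions, not part of the spec.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between "now" and the certificate expiry (both ISO 8601 strings).
function daysRemaining(expiresIso: string, nowIso: string): number {
  const expires = Date.parse(expiresIso);
  const now = Date.parse(nowIso);
  // Round down so a certificate with 13.9 days left reports 13, not 14.
  return Math.floor((expires - now) / MS_PER_DAY);
}

type ExpirySeverity = "pass" | "warn" | "fail";

// Assumed policy: fail once expired, warn below the threshold, pass otherwise.
function expirySeverity(days: number, warnBelowDays = 30): ExpirySeverity {
  if (days < 0) return "fail";
  return days < warnBelowDays ? "warn" : "pass";
}
```

With the evidence shown above (expiry 2026-01-26, run date 2026-01-12), `daysRemaining` yields 14, which falls under the assumed warning threshold.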

6. UI Surface

6.1 Route and Location

Route: /ops/doctor

Location: src/Web/StellaOps.Web/src/app/features/doctor/

6.2 Component Structure

src/app/features/doctor/
├── doctor.routes.ts
├── doctor-dashboard.component.ts        # Main page
├── doctor-dashboard.component.html
├── doctor-dashboard.component.scss
├── components/
│   ├── check-list/
│   │   ├── check-list.component.ts      # Filterable check list
│   │   └── check-list.component.html
│   ├── check-result/
│   │   ├── check-result.component.ts    # Single check display
│   │   └── check-result.component.html
│   ├── remediation-panel/
│   │   ├── remediation-panel.component.ts  # Fix commands display
│   │   └── remediation-panel.component.html
│   ├── evidence-viewer/
│   │   ├── evidence-viewer.component.ts # Collected evidence
│   │   └── evidence-viewer.component.html
│   └── export-dialog/
│       ├── export-dialog.component.ts   # Export options
│       └── export-dialog.component.html
└── services/
    ├── doctor.client.ts                 # API client
    ├── doctor.service.ts                # Business logic
    └── doctor.store.ts                  # Signal-based state

6.3 Dashboard Layout

+------------------------------------------------------------------+
| Doctor Diagnostics                              [Run Quick] [Run Full] |
+------------------------------------------------------------------+
| Filters: [Category v] [Plugin v] [Severity v]     [Export Report] |
+------------------------------------------------------------------+
|                                                                    |
| Summary Strip                                                      |
| +----------+ +----------+ +----------+ +----------+ +----------+  |
| | 44       | | 2        | | 1        | | 0        | | 8.3s     |  |
| | Passed   | | Warnings | | Failed   | | Skipped  | | Duration |  |
| +----------+ +----------+ +----------+ +----------+ +----------+  |
|                                                                    |
+------------------------------------------------------------------+
| Check Results                                                      |
| +----------------------------------------------------------------+ |
| | [FAIL] check.database.migrations.pending              [Expand] | |
| |   3 pending release migrations in schema 'auth'                | |
| +----------------------------------------------------------------+ |
| | [WARN] check.tls.certificates.expiry                  [Expand] | |
| |   TLS certificate expires in 14 days                           | |
| +----------------------------------------------------------------+ |
| | [PASS] check.database.connectivity                    [Expand] | |
| |   PostgreSQL connection successful (12ms)                      | |
| +----------------------------------------------------------------+ |
| | ... more checks ...                                            | |
+------------------------------------------------------------------+
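
The summary strip and the failure-first ordering of the result list can both be computed from the raw results. This is a framework-free sketch under assumptions: the `skip` severity value and the function names `summarize` and `sortForDisplay` are illustrative and do not come from the specification.

```typescript
// Assumption: "skip" is the severity value behind the "Skipped" counter.
type Severity = "pass" | "warn" | "fail" | "skip";

interface CheckResult {
  checkId: string;
  severity: Severity;
}

interface Summary {
  passed: number;
  warnings: number;
  failed: number;
  skipped: number;
  total: number;
}

// Count results per severity for the summary strip.
function summarize(results: CheckResult[]): Summary {
  const summary: Summary = { passed: 0, warnings: 0, failed: 0, skipped: 0, total: results.length };
  for (const r of results) {
    if (r.severity === "pass") summary.passed++;
    else if (r.severity === "warn") summary.warnings++;
    else if (r.severity === "fail") summary.failed++;
    else summary.skipped++;
  }
  return summary;
}

// Lower rank sorts first: failures on top, then warnings, passes, skips.
const RANK: Record<Severity, number> = { fail: 0, warn: 1, pass: 2, skip: 3 };

function sortForDisplay(results: CheckResult[]): CheckResult[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...results].sort(
    (a, b) => RANK[a.severity] - RANK[b.severity] || a.checkId.localeCompare(b.checkId),
  );
}
```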

6.4 Expanded Check View

+------------------------------------------------------------------+
| [FAIL] check.database.migrations.pending                          |
+------------------------------------------------------------------+
| Diagnosis                                                         |
| 3 pending release migrations detected in schema 'auth'            |
+------------------------------------------------------------------+
| Evidence                                                          |
| +--------------------------------------------------------------+ |
| | Schema           | auth                                      | |
| | Current version  | 099_add_dpop_thumbprints                  | |
| | Pending          | 100_add_tenant_quotas                     | |
| |                  | 101_add_audit_retention                   | |
| |                  | 102_add_session_revocation                | |
| | Connection       | postgres://localhost:5432/stellaops       | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Likely Causes                                                     |
| 1. Release migrations not applied before deployment               |
| 2. Migration files added after last deployment                    |
+------------------------------------------------------------------+
| Fix Steps                                             [Copy All]  |
| +--------------------------------------------------------------+ |
| | Step 1: Backup database first (RECOMMENDED)         [Copy]   | |
| | pg_dump -h localhost -U stella_admin -d stellaops -F c \     | |
| |   -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump            | |
| +--------------------------------------------------------------+ |
| | Step 2: Apply pending release migrations            [Copy]   | |
| | stella system migrations-run --module Authority \            | |
| |   --category release                                         | |
| +--------------------------------------------------------------+ |
| | Step 3: Verify migrations applied                   [Copy]   | |
| | stella system migrations-status --module Authority           | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Verification                                          [Copy]      |
| stella doctor --check check.database.migrations.pending           |
+------------------------------------------------------------------+
| [Re-run Check]                                    [Mark Resolved] |
+------------------------------------------------------------------+

6.5 Pack Navigation and Fix Actions

  • Navigation hierarchy: packs -> plugins -> checks.
  • Each check shows status, evidence, Copy Fix Commands, and Run Fix (disabled unless doctor.fix.enabled=true).
  • Export actions: Download JSON and Download DSSE summary.

6.6 Real-Time Updates

  • Polling: Auto-refresh option (every 30s/60s/5m)
  • SSE: Live check progress during execution
  • WebSocket: Optional for high-frequency updates

7. API Surface

7.1 Endpoints

Base Path: /api/v1/doctor

| Method | Path | Description |
|--------|------|-------------|
| GET | /checks | List available checks with metadata |
| GET | /plugins | List available plugins |
| POST | /run | Execute doctor checks |
| GET | /run/{runId} | Get run status/results |
| GET | /run/{runId}/stream | SSE stream for live progress |
| GET | /reports | List historical reports |
| GET | /reports/{reportId} | Get specific report |
| DELETE | /reports/{reportId} | Delete report |

7.2 Request/Response Models

List Checks

GET /api/v1/doctor/checks?category=database&tags=quick

Response:

{
  "checks": [
    {
      "checkId": "check.database.connectivity",
      "name": "Database Connectivity",
      "description": "Verify PostgreSQL connection",
      "pluginId": "stellaops.doctor.database",
      "category": "database",
      "defaultSeverity": "fail",
      "tags": ["quick", "database"],
      "estimatedDurationMs": 500
    }
  ],
  "total": 47
}
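
A client might build the List Checks URL from a filter object. The sketch below is illustrative only: `CheckFilter` and `buildChecksUrl` are hypothetical names, and joining multiple tags with commas is an assumption, since the example above shows a single tag.

```typescript
// Hypothetical filter shape mirroring the query parameters shown above.
interface CheckFilter {
  category?: string;
  tags?: string[];
}

function buildChecksUrl(filter: CheckFilter, basePath = "/api/v1/doctor"): string {
  const params: string[] = [];
  if (filter.category) {
    params.push(`category=${encodeURIComponent(filter.category)}`);
  }
  if (filter.tags && filter.tags.length > 0) {
    // Assumption: multiple tags are sent comma-separated in one parameter.
    params.push(`tags=${filter.tags.map((t) => encodeURIComponent(t)).join(",")}`);
  }
  return params.length > 0 ? `${basePath}/checks?${params.join("&")}` : `${basePath}/checks`;
}
```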

Run Checks

POST /api/v1/doctor/run
Content-Type: application/json

{
  "mode": "quick",
  "categories": ["database", "integration"],
  "plugins": [],
  "checkIds": [],
  "timeoutMs": 30000,
  "parallelism": 4,
  "includeRemediation": true
}

Response:

{
  "runId": "dr_20260112_143052_abc123",
  "status": "running",
  "startedAt": "2026-01-12T14:30:52Z",
  "checksTotal": 12,
  "checksCompleted": 0
}

Get Run Results

GET /api/v1/doctor/run/dr_20260112_143052_abc123

Response:

{
  "runId": "dr_20260112_143052_abc123",
  "status": "completed",
  "startedAt": "2026-01-12T14:30:52Z",
  "completedAt": "2026-01-12T14:31:00Z",
  "durationMs": 8300,
  "summary": {
    "passed": 44,
    "warnings": 2,
    "failed": 1,
    "skipped": 0,
    "total": 47
  },
  "overallSeverity": "fail",
  "results": [
    {
      "checkId": "check.database.migrations.pending",
      "pluginId": "stellaops.doctor.database",
      "category": "database",
      "severity": "fail",
      "diagnosis": "3 pending release migrations detected in schema 'auth'",
      "evidence": {
        "description": "Migration state for auth schema",
        "data": {
          "schema": "auth",
          "currentVersion": "099_add_dpop_thumbprints",
          "pendingMigrations": "100_add_tenant_quotas, 101_add_audit_retention, 102_add_session_revocation",
          "connection": "postgres://localhost:5432/stellaops"
        }
      },
      "likelyCauses": [
        "Release migrations not applied before deployment",
        "Migration files added after last deployment"
      ],
      "remediation": {
        "requiresBackup": true,
        "safetyNote": "Always backup before running migrations",
        "steps": [
          {
            "order": 1,
            "description": "Backup database first (RECOMMENDED)",
            "command": "pg_dump -h localhost -U stella_admin -d stellaops -F c -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump",
            "commandType": "shell",
            "placeholders": {}
          },
          {
            "order": 2,
            "description": "Apply pending release migrations",
            "command": "stella system migrations-run --module Authority --category release",
            "commandType": "shell",
            "placeholders": {}
          },
          {
            "order": 3,
            "description": "Verify migrations applied",
            "command": "stella system migrations-status --module Authority",
            "commandType": "shell",
            "placeholders": {}
          }
        ]
      },
      "verificationCommand": "stella doctor --check check.database.migrations.pending",
      "durationMs": 234,
      "executedAt": "2026-01-12T14:30:54Z"
    }
  ]
}

Each result also exposes a how_to_fix object for automation. It is a simplified view of the richer remediation model, and its commands[] array lists the fix commands verbatim, in step order.

7.3 SSE Stream

GET /api/v1/doctor/run/dr_20260112_143052_abc123/stream
Accept: text/event-stream

Response:

event: check-started
data: {"checkId":"check.database.connectivity","startedAt":"2026-01-12T14:30:52Z"}

event: check-completed
data: {"checkId":"check.database.connectivity","severity":"pass","durationMs":45}

event: check-started
data: {"checkId":"check.database.migrations.pending","startedAt":"2026-01-12T14:30:54Z"}

event: check-completed
data: {"checkId":"check.database.migrations.pending","severity":"fail","durationMs":234}

event: run-completed
data: {"runId":"dr_20260112_143052_abc123","summary":{"passed":44,"warnings":2,"failed":1}}
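
A consumer that receives the stream as text (rather than through a browser `EventSource`) needs to split it into events. The following is a simplified sketch of that parsing, assuming each `data:` payload is JSON as in the example above; `parseSse` is a hypothetical helper and it ignores SSE fields such as `id:` and `retry:`.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Simplified SSE parser: handles "event:" and "data:" fields only.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length === 0) continue; // skip empty blocks and comment lines
    events.push({ event, data: JSON.parse(dataLines.join("\n")) });
  }
  return events;
}
```

In a browser UI, `EventSource` performs this parsing itself; a sketch like this is mainly useful for CLI tools or tests that read the raw stream.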

7.4 Evidence Logs and Attestations

Doctor runs emit a JSONL evidence log and optional DSSE summary for audit trails. By default, JSONL is local only and deterministic; outbound telemetry is opt-in.

  • JSONL path: artifacts/doctor/doctor-run-<runId>.ndjson (configurable).
  • DSSE summary: artifacts/doctor/doctor-run-<runId>.dsse.json (optional).
  • Evidence records include doctor_command to capture the operator-invoked command. DSSE summaries assume operator execution and must include the same command note.

Example JSONL line:

{"runId":"dr_20260112_143052_abc123","doctor_command":"stella doctor run --format json","checkId":"check.database.connectivity","severity":"pass","executedAt":"2026-01-12T14:30:52Z","how_to_fix":{"commands":[]}}
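
Because each line of the evidence log is a standalone JSON object, audit tooling can process it line by line. This sketch assumes only the fields visible in the example line; `EvidenceRecord`, `parseEvidenceLog`, and `nonPassing` are hypothetical names.

```typescript
// Only the fields used below are typed; other fields pass through untouched.
interface EvidenceRecord {
  runId: string;
  checkId: string;
  severity: string;
  [key: string]: unknown;
}

// Parse a JSONL/NDJSON document: one JSON object per non-empty line.
function parseEvidenceLog(ndjson: string): EvidenceRecord[] {
  return ndjson
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as EvidenceRecord);
}

// Check IDs whose recorded severity is anything other than "pass".
function nonPassing(records: EvidenceRecord[]): string[] {
  return records.filter((r) => r.severity !== "pass").map((r) => r.checkId);
}
```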

8. Remediation Command Patterns

Remediation should favor the best operator experience: short, copy/paste friendly commands with minimal steps and clear verification guidance.

8.1 Standard Output Format

Every check that does not pass (warning or failure) produces remediation in this structure:

[{SEVERITY}] {check.id}
  Diagnosis: {one-line summary}

  Evidence:
    {key}: {value}
    {key}: {value}
    ...

  Likely Causes:
    1. {most likely cause}
    2. {second most likely cause}
    ...

  Fix Steps:
    # {step number}. {description}
    {command}

    # {step number}. {description}
    {command}
    ...

  Verification:
    {command to re-run this specific check}
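
The template above can be produced by a small renderer. This is a sketch under assumptions, not the Doctor formatter: `RenderableResult` flattens the API model (evidence as plain key/value pairs, steps at the top level), and `renderResult` is a hypothetical name.

```typescript
interface RemediationStep {
  order: number;
  description: string;
  command: string;
}

// Simplified, flattened result shape assumed for this sketch.
interface RenderableResult {
  checkId: string;
  severity: string;
  diagnosis: string;
  evidence: Record<string, string>;
  likelyCauses: string[];
  steps: RemediationStep[];
  verificationCommand: string;
}

function renderResult(r: RenderableResult): string {
  const lines: string[] = [];
  lines.push(`[${r.severity.toUpperCase()}] ${r.checkId}`);
  lines.push(`  Diagnosis: ${r.diagnosis}`, "");
  lines.push("  Evidence:");
  for (const [key, value] of Object.entries(r.evidence)) {
    lines.push(`    ${key}: ${value}`);
  }
  lines.push("", "  Likely Causes:");
  r.likelyCauses.forEach((cause, i) => lines.push(`    ${i + 1}. ${cause}`));
  lines.push("", "  Fix Steps:");
  // Sort a copy by step order so commands print in the intended sequence.
  for (const step of [...r.steps].sort((a, b) => a.order - b.order)) {
    lines.push(`    # ${step.order}. ${step.description}`, `    ${step.command}`, "");
  }
  lines.push("  Verification:", `    ${r.verificationCommand}`);
  return lines.join("\n");
}
```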

8.1.1 JSON Remediation Structure

The JSON output MUST include a how_to_fix object for agent consumption. It can be derived from remediation.steps and preserves command order.

"how_to_fix": {
  "summary": "Apply baseline branch policy",
  "commands": [
    "stella orchestrator scm apply-branch-policy --preset strict"
  ]
}
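
Deriving how_to_fix from remediation.steps amounts to sorting the steps and keeping their commands. The sketch below illustrates that derivation; `toHowToFix` is a hypothetical helper, and where the summary text comes from is left to the caller because the specification does not say.

```typescript
interface RemediationStep {
  order: number;
  description: string;
  command: string;
}

interface Remediation {
  steps: RemediationStep[];
}

interface HowToFix {
  summary: string;
  commands: string[];
}

// Keep only the commands, in ascending step order, without altering them.
function toHowToFix(summary: string, remediation: Remediation): HowToFix {
  const ordered = [...remediation.steps].sort((a, b) => a.order - b.order);
  return { summary, commands: ordered.map((step) => step.command) };
}
```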

8.2 Placeholder Conventions

When commands require user-specific values:

| Placeholder | Meaning | Example |
|-------------|---------|---------|
| {HOSTNAME} | Target hostname | ldap.example.com |
| {PORT} | Port number | 636 |
| {USERNAME} | Username | admin |
| {PASSWORD} | Password (never shown) | *** |
| {DATABASE} | Database name | stellaops |
| {SCHEMA} | Schema name | auth |
| {FILE_PATH} | File path | /etc/ssl/certs/ca.crt |
| {TOKEN} | API token (never shown) | *** |
| {URL} | Full URL | https://api.github.com |
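
Placeholder substitution can respect the "never shown" rule by refusing to fill secret placeholders. This is a minimal sketch: `fillPlaceholders` and the `SECRET_PLACEHOLDERS` set are hypothetical, and leaving unknown placeholders untouched is an assumed policy.

```typescript
// Placeholders marked "never shown" in the table above.
const SECRET_PLACEHOLDERS = new Set(["PASSWORD", "TOKEN"]);

// Replace {NAME} tokens with known values; secrets and unknown names stay as-is.
function fillPlaceholders(command: string, values: Record<string, string>): string {
  return command.replace(/\{([A-Z_]+)\}/g, (match: string, name: string) => {
    if (SECRET_PLACEHOLDERS.has(name)) return match; // never substitute secrets
    const value = values[name];
    return value === undefined ? match : value;
  });
}
```

For example, filling `{HOSTNAME}`, `{USERNAME}`, and `{DATABASE}` in the pg_dump command produces a ready-to-copy line, while `Password={PASSWORD}` is left for the operator to complete.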

8.3 Safety Notes

Doctor fix executes only non-destructive commands. If a fix requires a change that modifies data, Doctor must present it as manual guidance with explicit safety notes and never execute it.

  Manual Steps (not executed by Doctor):
    # SAFETY: This operation modifies the database. Create a backup first.

    # 1. Backup database (REQUIRED before proceeding)
    pg_dump -h {HOSTNAME} -U {USERNAME} -d {DATABASE} -F c \
      -f backup_$(date +%Y%m%d_%H%M%S).dump

    # 2. Apply the fix
    stella system migrations-run --module Authority --category release
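
The rule above could be enforced by a gate that decides how a remediation is offered. The sketch is an assumption-heavy illustration: it treats the `requiresBackup` flag from the API model as the marker of a data-modifying fix, and `fixMode` and its return values are hypothetical.

```typescript
interface Remediation {
  requiresBackup: boolean;
  steps: { command: string }[];
}

// "run": Doctor may execute; "manual": guidance only; "disabled": fix execution is off.
type FixMode = "run" | "manual" | "disabled";

function fixMode(remediation: Remediation, fixEnabled: boolean): FixMode {
  // Assumption: requiresBackup marks a fix that modifies data.
  // Such fixes are never executed, regardless of configuration.
  if (remediation.requiresBackup) return "manual";
  return fixEnabled ? "run" : "disabled";
}
```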

8.4 Multi-Platform Commands

Where applicable, provide commands for different platforms:

  Fix Steps:
    # 1. Restart the service

    # Linux (systemd):
    sudo systemctl restart stellaops-gateway

    # Linux (Docker):
    docker restart stellaops-gateway

    # Docker Compose:
    docker compose restart gateway

    # Kubernetes:
    kubectl rollout restart deployment/stellaops-gateway -n stellaops
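
A plugin that knows the deployment platform could emit only the matching variant. The sketch below reproduces the four commands shown above for an arbitrary service; the `Platform` values and `restartCommand` are hypothetical names, and the default namespace is taken from the example.

```typescript
type Platform = "systemd" | "docker" | "compose" | "kubernetes";

// Build the restart command for one platform, following the naming used above.
function restartCommand(platform: Platform, service: string, namespace = "stellaops"): string {
  switch (platform) {
    case "systemd":
      return `sudo systemctl restart stellaops-${service}`;
    case "docker":
      return `docker restart stellaops-${service}`;
    case "compose":
      return `docker compose restart ${service}`;
    case "kubernetes":
      return `kubectl rollout restart deployment/stellaops-${service} -n ${namespace}`;
  }
}
```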

9. Doctor Check Catalog

This section documents all diagnostic checks organized by plugin/category.

9.1 Core Platform Plugin (stellaops.doctor.core)

check.config.required

Property Value
CheckId check.config.required
Plugin stellaops.doctor.core
Category Core
Severity Fail
Tags quick, config, startup
What it verifies All required configuration values are present
Evidence collected Missing keys, config sources checked, environment
Failure modes Missing STELLAOPS_BACKEND_URL, missing database connection string, missing Authority URL

Remediation:

# 1. Check which configuration values are missing
stella config list --show-missing

# 2. Set missing environment variables
export STELLAOPS_BACKEND_URL="https://api.stellaops.example.com"
export STELLAOPS_POSTGRES_CONNECTION="Host=localhost;Database=stellaops;Username=stella_app;Password={PASSWORD}"
export STELLAOPS_AUTHORITY_URL="https://auth.stellaops.example.com"

# 3. Or update configuration file
# Edit: /etc/stellaops/stellaops.yaml

Verification: stella doctor --check check.config.required


check.config.syntax

Property Value
CheckId check.config.syntax
Plugin stellaops.doctor.core
Category Core
Severity Fail
Tags quick, config
What it verifies Configuration files have valid YAML/JSON syntax
Evidence collected File path, line number, parse error message
Failure modes Invalid YAML indentation, JSON syntax error, encoding issues

Remediation:

# 1. Validate YAML syntax
yamllint /etc/stellaops/stellaops.yaml

# 2. Check for encoding issues (should be UTF-8)
file /etc/stellaops/stellaops.yaml

# 3. Fix common YAML issues
# - Use spaces, not tabs
# - Check string quoting
# - Verify indentation (2 spaces per level)

Verification: stella doctor --check check.config.syntax


check.config.deprecated

Property Value
CheckId check.config.deprecated
Plugin stellaops.doctor.core
Category Core
Severity Warn
Tags config
What it verifies No deprecated configuration keys are in use
Evidence collected Deprecated keys found, replacement keys
Failure modes Using old key names, removed options

Remediation:

# 1. Review deprecated keys and their replacements
stella config migrate --dry-run

# 2. Update configuration file with new key names
stella config migrate --apply

# 3. Verify configuration after migration
stella config validate

Verification: stella doctor --check check.config.deprecated


check.runtime.dotnet

Property Value
CheckId check.runtime.dotnet
Plugin stellaops.doctor.core
Category Core
Severity Fail
Tags quick, runtime
What it verifies .NET runtime version meets minimum requirements
Evidence collected Installed version, required version, runtime path
Failure modes Outdated .NET version, missing runtime

Remediation:

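# 1. Check installed .NET runtimes
# (dotnet --version reports the SDK version and fails on runtime-only hosts)
dotnet --list-runtimes

# 2. Install required .NET version (dotnet-install script, Linux/macOS)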
wget https://dot.net/v1/dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 10.0

# 3. Verify installation
dotnet --list-runtimes

Verification: stella doctor --check check.runtime.dotnet


check.runtime.memory

Property Value
CheckId check.runtime.memory
Plugin stellaops.doctor.core
Category Core
Severity Warn
Tags runtime, resources
What it verifies Sufficient memory available for operation
Evidence collected Total memory, available memory, GC memory info
Failure modes Low available memory (<1GB), high GC pressure

Remediation:

# 1. Check current memory usage
free -h

# 2. Identify memory-heavy processes
ps aux --sort=-%mem | head -20

# 3. Adjust container memory limits if applicable
# Docker:
docker update --memory 4g stellaops-gateway

# Kubernetes:
kubectl patch deployment stellaops-gateway -p '{"spec":{"template":{"spec":{"containers":[{"name":"gateway","resources":{"limits":{"memory":"4Gi"}}}]}}}}'

Verification: stella doctor --check check.runtime.memory


check.runtime.disk.space

Property Value
CheckId check.runtime.disk.space
Plugin stellaops.doctor.core
Category Core
Severity Warn
Tags runtime, resources
What it verifies Sufficient disk space on required paths
Evidence collected Path, total space, available space, usage percentage
Failure modes Data directory >90% full, log directory full

Remediation:

# 1. Check disk usage
df -h /var/lib/stellaops

# 2. Find large files
du -sh /var/lib/stellaops/* | sort -hr | head -20

# 3. Clean up old logs
find /var/log/stellaops -name "*.log" -mtime +30 -delete

# 4. Clean up old exports
stella export cleanup --older-than 30d

Verification: stella doctor --check check.runtime.disk.space


check.runtime.disk.permissions

Property Value
CheckId check.runtime.disk.permissions
Plugin stellaops.doctor.core
Category Core
Severity Fail
Tags quick, runtime, security
What it verifies Write permissions on required directories
Evidence collected Path, expected permissions, actual permissions, owner
Failure modes Cannot write to data directory, log directory not writable

Remediation:

# 1. Check current permissions
ls -la /var/lib/stellaops

# 2. Fix ownership
sudo chown -R stellaops:stellaops /var/lib/stellaops

# 3. Fix permissions
sudo chmod 755 /var/lib/stellaops
sudo chmod 755 /var/log/stellaops

# 4. Verify write access
sudo -u stellaops sh -c 'touch /var/lib/stellaops/.write-test && rm /var/lib/stellaops/.write-test'

Verification: stella doctor --check check.runtime.disk.permissions


check.time.sync

Property Value
CheckId check.time.sync
Plugin stellaops.doctor.core
Category Core
Severity Warn
Tags quick, runtime
What it verifies System clock is synchronized (NTP)
Evidence collected NTP status, clock offset, sync source
Failure modes Clock drift >5s, NTP not running, no sync source

Remediation:

# 1. Check NTP status
timedatectl status

# 2. Enable NTP synchronization
sudo timedatectl set-ntp true

# 3. Force immediate sync
sudo systemctl restart systemd-timesyncd

# 4. Verify sync status
timedatectl timesync-status

Verification: stella doctor --check check.time.sync


check.crypto.profiles

Property Value
CheckId check.crypto.profiles
Plugin stellaops.doctor.core
Category Core
Severity Fail
Tags quick, security, crypto
What it verifies Crypto profile is valid and providers are available
Evidence collected Active profile, available providers, missing providers
Failure modes Invalid profile, required provider not available

Remediation:

# 1. List available crypto profiles
stella crypto profiles list

# 2. Validate current profile
stella crypto profiles validate

# 3. Switch to a different profile if needed
stella crypto profiles set --profile default

# 4. Install missing providers (if GOST required)
# See docs/crypto/gost-setup.md

Verification: stella doctor --check check.crypto.profiles


9.2 Database Plugin (stellaops.doctor.database)

check.database.connectivity

Property Value
CheckId check.database.connectivity
Plugin stellaops.doctor.database
Category Database
Severity Fail
Tags quick, database
What it verifies PostgreSQL connection is successful
Evidence collected Connection string (redacted), latency, server version
Failure modes Connection refused, authentication failed, timeout

Remediation:

# 1. Test connection manually
psql "host=localhost dbname=stellaops user=stella_app" -c "SELECT 1"

# 2. Check PostgreSQL is running
sudo systemctl status postgresql

# 3. Check connection settings
# Verify pg_hba.conf allows connections
sudo grep stellaops /etc/postgresql/16/main/pg_hba.conf

# 4. Check firewall
sudo ufw status | grep 5432

Verification: stella doctor --check check.database.connectivity


check.database.version

Property Value
CheckId check.database.version
Plugin stellaops.doctor.database
Category Database
Severity Warn
Tags database
What it verifies PostgreSQL version meets minimum requirements (>=16)
Evidence collected Current version, required version
Failure modes PostgreSQL <16, unsupported version

Remediation:

# 1. Check current version
psql -c "SELECT version();"

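# 2. Install the new PostgreSQL version (Ubuntu)
sudo apt install postgresql-16

# 3. Migrate data to the new version (example assumes the existing cluster is 14/main)
# Note: pg_upgradecluster refuses to run if a 16/main cluster already exists;
# if apt created an empty one, drop it first (pg_dropcluster 16 main --stop)
sudo pg_upgradecluster 14 main

# 4. Remove the old version once the upgraded cluster is verified
sudo apt remove postgresql-14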

Verification: stella doctor --check check.database.version
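
The version comparison behind this check can be sketched by extracting the major version from the string that `SELECT version();` returns. This is an illustration only: `postgresMajor` and `meetsMinimum` are hypothetical helpers, not the plugin's actual code.

```typescript
// Extract the major version from a string such as "PostgreSQL 16.1 on x86_64-pc-linux-gnu, ...".
function postgresMajor(versionString: string): number | null {
  const match = /PostgreSQL (\d+)/.exec(versionString);
  return match ? Number(match[1]) : null;
}

// True when the major version is at least the required minimum (16 per the table above).
function meetsMinimum(versionString: string, minMajor = 16): boolean {
  const major = postgresMajor(versionString);
  return major !== null && major >= minMajor;
}
```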


check.database.migrations.pending

Property Value
CheckId check.database.migrations.pending
Plugin stellaops.doctor.database
Category Database
Severity Fail
Tags database, migrations
What it verifies No pending release migrations exist
Evidence collected Schema, current version, pending migrations list
Failure modes Release migrations not applied before deployment

Remediation:

# 1. Backup database first (RECOMMENDED)
pg_dump -h localhost -U stella_admin -d stellaops -F c \
  -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

# 2. Check migration status for all modules
stella system migrations-status

# 3. Apply pending release migrations
stella system migrations-run --category release

# 4. Verify all migrations applied
stella system migrations-status --verify

Verification: stella doctor --check check.database.migrations.pending


check.database.migrations.checksum

Property Value
CheckId check.database.migrations.checksum
Plugin stellaops.doctor.database
Category Database
Severity Fail
Tags database, migrations, security
What it verifies Applied migration checksums match source files
Evidence collected Mismatched migrations, expected vs actual checksum
Failure modes Migration file modified after application, corruption

Remediation:

# CRITICAL: Checksum mismatch indicates potential data integrity issue

# 1. Identify mismatched migrations
stella system migrations-verify --detailed

# 2. If migrations were legitimately modified (rare):
# WARNING: Only proceed if you understand the implications
stella system migrations-repair --migration {MIGRATION_NAME} --force

# 3. If data corruption suspected:
# Restore from backup and reapply migrations
pg_restore -h localhost -U stella_admin -d stellaops stellaops_backup.dump
stella system migrations-run --all

Verification: stella doctor --check check.database.migrations.checksum


check.database.migrations.lock

Property Value
CheckId check.database.migrations.lock
Plugin stellaops.doctor.database
Category Database
Severity Warn
Tags database, migrations
What it verifies No stale migration locks exist
Evidence collected Lock holder, lock duration, schema
Failure modes Abandoned lock from crashed process

Remediation:

# 1. Check for active locks
psql -d stellaops -c "SELECT * FROM pg_locks WHERE locktype = 'advisory';"

# 2. Identify lock holder process
psql -d stellaops -c "SELECT pid, query, state FROM pg_stat_activity WHERE pid IN (SELECT pid FROM pg_locks WHERE locktype = 'advisory');"

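# 3. If the lock holder is a stale session, terminate that backend to release its locks
# WARNING: Only if you are certain no migration is running
# (pg_advisory_unlock_all() releases only locks held by the calling session,
# so it cannot clear a lock owned by another session; use the pid from step 2)
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"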

# 4. Retry migration
stella system migrations-run --category release

Verification: stella doctor --check check.database.migrations.lock


check.database.schema.{schema}

Property Value
CheckId check.database.schema.{schema} (e.g., check.database.schema.auth)
Plugin stellaops.doctor.database
Category Database
Severity Fail
Tags database
What it verifies Schema exists and has expected tables
Evidence collected Schema name, expected tables, missing tables
Failure modes Schema not created, tables dropped

Remediation:

# 1. Check if schema exists
psql -d stellaops -c "SELECT schema_name FROM information_schema.schemata WHERE schema_name = '{SCHEMA}';"

# 2. If schema missing, run startup migrations
stella system migrations-run --module {MODULE} --category startup

# 3. Verify schema tables
psql -d stellaops -c "SELECT table_name FROM information_schema.tables WHERE table_schema = '{SCHEMA}';"

Verification: stella doctor --check check.database.schema.{schema}


check.database.connections.pool

Property Value
CheckId check.database.connections.pool
Plugin stellaops.doctor.database
Category Database
Severity Warn
Tags database, performance
What it verifies Connection pool is healthy, not exhausted
Evidence collected Active connections, idle connections, max connections
Failure modes Pool exhausted, connection leak

Remediation:

# 1. Check current connections
psql -d stellaops -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'stellaops';"

# 2. Check max connections
psql -d stellaops -c "SHOW max_connections;"

# 3. Identify long-running queries
psql -d stellaops -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 10;"

# 4. Increase max connections if needed (takes effect only after a restart, not a reload)
# Edit postgresql.conf: max_connections = 200
sudo systemctl restart postgresql

Verification: stella doctor --check check.database.connections.pool


9.3 Service Graph Plugin (stellaops.doctor.servicegraph)

check.services.gateway.running

Property Value
CheckId check.services.gateway.running
Plugin stellaops.doctor.servicegraph
Category ServiceGraph
Severity Fail
Tags quick, services
What it verifies Gateway service is running and accepting connections
Evidence collected Service status, PID, uptime, port binding
Failure modes Service not running, port already in use

Remediation:

# 1. Check service status
sudo systemctl status stellaops-gateway

# 2. Check logs for errors
sudo journalctl -u stellaops-gateway -n 50

# 3. Check port binding
sudo ss -tlnp | grep 443

# 4. Start/restart service
sudo systemctl restart stellaops-gateway

Verification: stella doctor --check check.services.gateway.running


check.services.gateway.routing

Property Value
CheckId check.services.gateway.routing
Plugin stellaops.doctor.servicegraph
Category ServiceGraph
Severity Fail
Tags services, routing
What it verifies Gateway can route requests to backend services
Evidence collected Registered services, routing table, disconnected services
Failure modes No services registered, all services disconnected

Remediation:

# 1. Check registered services
curl -s http://localhost:8080/health/routing | jq

# 2. Verify backend services are running
stella services status

# 3. Check Router transport connectivity
stella services connectivity-test

# 4. Restart disconnected services
sudo systemctl restart stellaops-concelier
sudo systemctl restart stellaops-scanner

Verification: stella doctor --check check.services.gateway.routing


check.services.{service}.health

Property Value
CheckId check.services.{service}.health (e.g., check.services.concelier.health)
Plugin stellaops.doctor.servicegraph
Category ServiceGraph
Severity Fail
Tags services
What it verifies Service health endpoint returns healthy
Evidence collected Health status, dependencies, latency
Failure modes Service unhealthy, degraded dependencies

Remediation:

# 1. Check service health directly
curl -s http://localhost:{PORT}/healthz | jq

# 2. Check detailed health
curl -s http://localhost:{PORT}/health/details | jq

# 3. Check service logs
sudo journalctl -u stellaops-{SERVICE} -n 100

# 4. Restart service if needed
sudo systemctl restart stellaops-{SERVICE}

Verification: stella doctor --check check.services.{service}.health


check.services.{service}.connectivity

Property Value
CheckId check.services.{service}.connectivity
Plugin stellaops.doctor.servicegraph
Category ServiceGraph
Severity Fail
Tags services, routing
What it verifies Service is reachable from Gateway via Router
Evidence collected Transport type, connection state, last heartbeat
Failure modes Connection refused, heartbeat timeout

Remediation:

# 1. Check Router connection status
stella services connection-status --service {SERVICE}

# 2. Test network connectivity
nc -zv {SERVICE_HOST} {SERVICE_PORT}

# 3. Check firewall rules
sudo ufw status | grep {SERVICE_PORT}

# 4. Verify Router configuration in service
# Check stellaops.yaml for correct Router endpoints

Verification: stella doctor --check check.services.{service}.connectivity


check.services.authority.connectivity

Property Value
CheckId check.services.authority.connectivity
Plugin stellaops.doctor.servicegraph
Category ServiceGraph
Severity Fail
Tags quick, services, auth
What it verifies Authority service is reachable
Evidence collected Authority URL, response status, latency
Failure modes Authority unreachable, OIDC discovery failed

Remediation:

# 1. Check Authority URL configuration
echo $STELLAOPS_AUTHORITY_URL

# 2. Test OIDC discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq

# 3. Check Authority service status
sudo systemctl status stellaops-authority

# 4. Verify network connectivity
curl -v ${STELLAOPS_AUTHORITY_URL}/healthz

Verification: stella doctor --check check.services.authority.connectivity


9.4 Security Plugin (stellaops.doctor.security)

check.auth.oidc.discovery

Property Value
CheckId check.auth.oidc.discovery
Plugin stellaops.doctor.security
Category Security
Severity Fail
Tags quick, auth, security
What it verifies OIDC well-known endpoint is accessible
Evidence collected Discovery URL, issuer, supported flows
Failure modes Discovery endpoint unavailable, invalid response

Remediation:

# 1. Test discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq

# 2. Verify issuer matches configuration
# The issuer in the response should match STELLAOPS_AUTHORITY_URL

# 3. Check Authority service logs
sudo journalctl -u stellaops-authority -n 50

# 4. Verify TLS certificate
openssl s_client -connect auth.stellaops.example.com:443 -servername auth.stellaops.example.com </dev/null

Verification: stella doctor --check check.auth.oidc.discovery


check.auth.oidc.jwks

Property Value
CheckId check.auth.oidc.jwks
Plugin stellaops.doctor.security
Category Security
Severity Fail
Tags auth, security
What it verifies JWKS endpoint returns valid signing keys
Evidence collected JWKS URL, key count, key algorithms
Failure modes JWKS unavailable, no keys, unsupported algorithms

Remediation:

# 1. Fetch JWKS directly
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/jwks.json | jq

# 2. Verify keys are present
# Response should contain at least one key in "keys" array

# 3. If JWKS is empty, regenerate signing keys
stella authority keys rotate

# 4. Restart Authority service
sudo systemctl restart stellaops-authority

Verification: stella doctor --check check.auth.oidc.jwks


check.auth.ldap.bind

| Property | Value |
| --- | --- |
| CheckId | check.auth.ldap.bind |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | auth, security, ldap |
| What it verifies | LDAP bind credentials are valid |
| Evidence collected | LDAP host, bind DN (redacted), TLS status |
| Failure modes | Invalid credentials, connection refused, TLS failure |

Remediation:

# 1. Test LDAP connection with ldapsearch
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "cn=bind-user,ou=service,dc=example,dc=internal" \
  -w "{PASSWORD}" \
  -b "ou=people,dc=example,dc=internal" "(uid=*)" dn | head -10

# 2. Check TLS certificate
openssl s_client -connect {LDAP_HOST}:636 -showcerts

# 3. Verify bind DN and password in configuration
# Check etc/authority.plugins/ldap.yaml

# 4. Test with Authority's ldap-test command
stella authority ldap-test --bind-only

Verification: stella doctor --check check.auth.ldap.bind


check.auth.ldap.search

| Property | Value |
| --- | --- |
| CheckId | check.auth.ldap.search |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | auth, ldap |
| What it verifies | LDAP search base is accessible and returns users |
| Evidence collected | Search base, user count, search time |
| Failure modes | Search base not found, no users returned, timeout |

Remediation:

# 1. Test LDAP search
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "{SEARCH_BASE}" "(objectClass=person)" dn | wc -l

# 2. Verify search base in configuration
# Check etc/authority.plugins/ldap.yaml: connection.searchBase

# 3. Check if search base exists
# A base-scope search on the search base itself returns one entry if it exists
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "{SEARCH_BASE}" -s base "(objectClass=*)" dn

# 4. Verify bind user has read permissions
# Check LDAP ACLs

Verification: stella doctor --check check.auth.ldap.search


check.auth.ldap.groups

| Property | Value |
| --- | --- |
| CheckId | check.auth.ldap.groups |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Warn |
| Tags | auth, ldap |
| What it verifies | LDAP group mapping is configured and working |
| Evidence collected | Group attribute, mapped groups, sample user groups |
| Failure modes | Group attribute not found, no groups mapped |

Remediation:

# 1. Check group attribute configuration
# etc/authority.plugins/ldap.yaml: claims.groupAttribute

# 2. Test group lookup for a sample user
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "{SEARCH_BASE}" "(uid={TEST_USER})" memberOf

# 3. Verify group mapping in Authority
stella authority ldap-test --user {TEST_USER} --show-groups

# 4. Update group attribute if needed
# Common attributes: memberOf, member, groupMembership

Verification: stella doctor --check check.auth.ldap.groups


check.tls.certificates.expiry

| Property | Value |
| --- | --- |
| CheckId | check.tls.certificates.expiry |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Warn (30d), Fail (7d) |
| Tags | quick, security, tls |
| What it verifies | TLS certificates are not expiring soon |
| Evidence collected | Certificate path, subject, expiry date, days remaining |
| Failure modes | Certificate expired, expiring within threshold |

Remediation:

# 1. Check certificate expiry
openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -enddate

# 2. Renew with certbot (if using Let's Encrypt)
sudo certbot renew --cert-name stellaops.example.com

# 3. Renew manually (if self-signed or enterprise CA)
# Generate new CSR
openssl req -new -key /etc/ssl/private/stellaops.key \
  -out /tmp/stellaops.csr -subj "/CN=stellaops.example.com"

# Submit CSR to CA and install new certificate

# 4. Restart services to pick up new certificate
sudo systemctl restart stellaops-gateway

Verification: stella doctor --check check.tls.certificates.expiry
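The Warn (30d) and Fail (7d) thresholds above can be sketched as a small classifier. This is an illustration only, not the Doctor implementation: the function names, the `Pass` label, and the inclusive boundaries (exactly 30 days is Warn, exactly 7 days is Fail) are assumptions, and `cert_days_remaining` assumes `openssl` and GNU `date` are available.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, thresholds taken from the table above.

# Whole days between two Unix timestamps, rounded down (negative once expired).
days_between() {
  local now_epoch="$1" expiry_epoch="$2" diff
  diff=$(( expiry_epoch - now_epoch ))
  # Shell division truncates toward zero; adjust negative remainders so an
  # already expired certificate never reports 0 days remaining.
  if [ "$diff" -lt 0 ] && [ $(( diff % 86400 )) -ne 0 ]; then
    echo $(( diff / 86400 - 1 ))
  else
    echo $(( diff / 86400 ))
  fi
}

# Map days remaining to a severity (boundaries are assumed to be inclusive).
classify_expiry() {
  local days="$1"
  if [ "$days" -le 7 ]; then
    echo "Fail"
  elif [ "$days" -le 30 ]; then
    echo "Warn"
  else
    echo "Pass"
  fi
}

# Days remaining for a PEM certificate file (requires openssl and GNU date).
cert_days_remaining() {
  local cert_path="$1" end_date
  end_date=$(openssl x509 -in "$cert_path" -noout -enddate) || return 1
  end_date=${end_date#notAfter=}
  days_between "$(date +%s)" "$(date -d "$end_date" +%s)"
}
```

For example, `classify_expiry "$(cert_days_remaining /etc/ssl/certs/stellaops.crt)"` prints the severity for the certificate path used in the remediation steps.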


check.tls.certificates.chain

| Property | Value |
| --- | --- |
| CheckId | check.tls.certificates.chain |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, tls |
| What it verifies | TLS certificate chain is complete and valid |
| Evidence collected | Certificate chain, validation errors |
| Failure modes | Missing intermediate, self-signed not trusted, chain broken |

Remediation:

# 1. Verify certificate chain
openssl verify -CAfile /etc/ssl/certs/ca-certificates.crt \
  /etc/ssl/certs/stellaops.crt

# 2. Check chain with openssl
openssl s_client -connect stellaops.example.com:443 \
  -servername stellaops.example.com -showcerts

# 3. Download missing intermediate certificates
# From your CA's website

# 4. Concatenate certificates in correct order
cat stellaops.crt intermediate.crt > stellaops-fullchain.crt

Verification: stella doctor --check check.tls.certificates.chain


check.secrets.vault.connectivity

| Property | Value |
| --- | --- |
| CheckId | check.secrets.vault.connectivity |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, vault |
| What it verifies | Vault service is reachable |
| Evidence collected | Vault address, seal status, version |
| Failure modes | Vault unreachable, sealed, version mismatch |

Remediation:

# 1. Check Vault status
vault status

# 2. If sealed, unseal Vault
vault operator unseal {UNSEAL_KEY_1}
vault operator unseal {UNSEAL_KEY_2}
vault operator unseal {UNSEAL_KEY_3}

# 3. Check network connectivity
curl -s ${VAULT_ADDR}/v1/sys/health | jq

# 4. Verify VAULT_ADDR environment variable
echo $VAULT_ADDR

Verification: stella doctor --check check.secrets.vault.connectivity
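The `/v1/sys/health` endpoint used in step 3 encodes Vault's state in its HTTP status code, so the seal status can be read without parsing the body. The sketch below maps Vault's default status codes to a short description; the function names are hypothetical, and the defaults can be overridden by query parameters on the health endpoint, so treat the mapping as a starting point.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names; codes are Vault's documented defaults.

# Translate a /v1/sys/health status code into a human-readable state.
vault_health_meaning() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    472) echo "disaster recovery secondary" ;;
    473) echo "performance standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    000) echo "unreachable" ;;   # curl prints 000 when no HTTP response arrived
    *)   echo "unexpected status $1" ;;
  esac
}

# Fetch only the status code from the health endpoint (needs curl and VAULT_ADDR).
vault_health_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${VAULT_ADDR}/v1/sys/health"
}
```

Running `vault_health_meaning "$(vault_health_code)"` distinguishes a sealed Vault (503) from an unreachable one (000), which are separate failure modes in the table above.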


check.secrets.vault.auth

| Property | Value |
| --- | --- |
| CheckId | check.secrets.vault.auth |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, vault |
| What it verifies | Vault authentication is successful |
| Evidence collected | Auth method, token TTL, policies |
| Failure modes | Invalid token, expired token, wrong auth method |

Remediation:

# 1. Check current token
vault token lookup

# 2. If token expired, authenticate again
# Token auth:
vault login {TOKEN}

# AppRole auth:
vault write auth/approle/login role_id={ROLE_ID} secret_id={SECRET_ID}

# Kubernetes auth:
vault write auth/kubernetes/login role=stellaops jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token

# 3. Verify authentication worked
vault token lookup

Verification: stella doctor --check check.secrets.vault.auth


check.secrets.vault.paths

| Property | Value |
| --- | --- |
| CheckId | check.secrets.vault.paths |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, vault |
| What it verifies | Required secret paths are accessible |
| Evidence collected | Checked paths, accessible paths, denied paths |
| Failure modes | Permission denied, path not found |

Remediation:

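# 1. Test reading required secrets
# With a KV v2 mount, "vault kv" adds the data/ segment itself; the data/ prefix is only used in policy paths
vault kv get secret/stellaops/api-keys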

# 2. Check policy permissions
vault token lookup -format=json | jq '.data.policies'

# 3. Review policy rules
vault policy read stellaops

# 4. Update policy if needed
vault policy write stellaops - <<EOF
path "secret/data/stellaops/*" {
  capabilities = ["read", "list"]
}
EOF

Verification: stella doctor --check check.secrets.vault.paths


check.security.evidence.integrity

| Property | Value |
| --- | --- |
| CheckId | check.security.evidence.integrity |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, evidence, integrity, dsse, rekor, offline |
| What it verifies | Evidence files have valid DSSE signatures, Rekor inclusion proofs, and consistent hashes |
| Evidence collected | Evidence locker path, total files, valid/invalid/skipped counts, specific issues |
| Failure modes | Empty DSSE payload, missing signatures, invalid base64, missing Rekor UUID, missing inclusion proof hashes, digest mismatch |

What it checks:

  1. DSSE Envelope Structure: Validates payloadType, payload (base64), and signatures array
  2. Signature Completeness: Each signature has keyid and valid base64 sig
  3. Payload Digest Consistency: If payloadDigest field present, recomputes and compares SHA-256
  4. Evidence Bundle Structure: Validates bundleId, manifest.version, and optional contentDigest
  5. Rekor Receipt Validity: If present, validates uuid, logIndex, and inclusionProof.hashes
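Items 1 to 3 of the list above can be approximated from the command line when triaging a single envelope by hand. The sketch below is not the check's implementation: the function names are hypothetical, it assumes `jq` and GNU coreutils (`base64`, `sha256sum`) are installed, and it assumes `payloadDigest` holds a hex SHA-256 value with an optional `sha256:` prefix, which the specification does not state.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers; payloadDigest format is an assumption.

# Items 1-2: required fields exist and every signature has keyid and sig.
dsse_structure_ok() {
  jq -e '
    (.payloadType | type == "string" and length > 0)
    and (.payload | type == "string" and length > 0)
    and (.signatures | type == "array" and length > 0)
    and all(.signatures[];
            (.keyid | type == "string" and length > 0)
            and (.sig | type == "string" and length > 0))
  ' "$1" > /dev/null 2>&1
}

# Payload and every signature must decode as base64.
dsse_base64_ok() {
  local value
  jq -r '.payload, .signatures[].sig' "$1" | while IFS= read -r value; do
    printf '%s' "$value" | base64 -d > /dev/null 2>&1 || exit 1
  done
}

# Item 3: recompute SHA-256 only when a payloadDigest field is present.
dsse_digest_ok() {
  local expected actual
  expected=$(jq -r '.payloadDigest // empty' "$1")
  [ -n "$expected" ] || return 0
  actual=$(jq -r '.payload' "$1" | base64 -d | sha256sum | cut -d' ' -f1)
  [ "${expected#sha256:}" = "$actual" ]
}

# Run all three checks against one envelope file.
dsse_envelope_ok() {
  dsse_structure_ok "$1" && dsse_base64_ok "$1" && dsse_digest_ok "$1"
}
```

`dsse_envelope_ok path/to/envelope.json` exits 0 only when all three checks pass. It does not verify the signatures cryptographically or validate Rekor receipts.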

Remediation:

# 1. List evidence files with issues
stella doctor --check check.security.evidence.integrity --output json \
  | jq '.evidence.issues[]'

# 2. Re-sign affected evidence bundles
stella evidence resign --bundle-id {BUNDLE_ID}

# 3. Verify Rekor inclusion manually (if online)
rekor-cli get --uuid {REKOR_UUID} --format json | jq

# 4. For offline environments, verify against local ledger
stella evidence verify --offline --bundle-id {BUNDLE_ID}

# 5. Re-generate evidence pack from source
stella export evidence-pack --artifact {ARTIFACT_DIGEST} --force

Configuration:

# etc/appsettings.yaml
EvidenceLocker:
  LocalPath: /var/lib/stellaops/evidence
  # Or use Evidence:BasePath for alternate key

Verification: stella doctor --check check.security.evidence.integrity


9.5 Integration Plugins - SCM (stellaops.doctor.integration.scm.*)

check.integration.scm.github.connectivity

| Property | Value |
| --- | --- |
| CheckId | check.integration.scm.github.connectivity |
| Plugin | stellaops.doctor.integration.scm.github |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, github |
| What it verifies | GitHub API is reachable |
| Evidence collected | API endpoint, response status, latency |
| Failure modes | API unreachable, DNS resolution failed, TLS error |

Remediation:

# 1. Test GitHub API connectivity
curl -s https://api.github.com/zen

# 2. Check DNS resolution
nslookup api.github.com

# 3. Test with authentication
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/user

# 4. Check proxy settings if behind firewall
echo $HTTPS_PROXY

Verification: stella doctor --check check.integration.scm.github.connectivity


check.integration.scm.github.auth

| Property | Value |
| --- | --- |
| CheckId | check.integration.scm.github.auth |
| Plugin | stellaops.doctor.integration.scm.github |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, github, auth |
| What it verifies | GitHub authentication is successful |
| Evidence collected | Auth type (PAT/App/OAuth), user/app info |
| Failure modes | Invalid token, expired token, wrong app credentials |

Remediation:

# For Personal Access Token:
# 1. Verify token is valid
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/user | jq '.login'

# 2. Generate new token if expired
# Visit: https://github.com/settings/tokens

# For GitHub App:
# 1. Check app installation
curl -s -H "Authorization: Bearer {JWT}" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/app

# 2. Verify app is installed on repository
curl -s -H "Authorization: Bearer {INSTALLATION_TOKEN}" \
  https://api.github.com/installation/repositories

Verification: stella doctor --check check.integration.scm.github.auth


check.integration.scm.github.permissions

| Property | Value |
| --- | --- |
| CheckId | check.integration.scm.github.permissions |
| Plugin | stellaops.doctor.integration.scm.github |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, github |
| What it verifies | Token/App has required scopes/permissions |
| Evidence collected | Current scopes, required scopes, missing scopes |
| Failure modes | Missing repo scope, missing write:packages |

Remediation:

# 1. Check current token scopes
# Header names may be lowercase (HTTP/2) or mixed case (HTTP/1.1), so match case-insensitively
curl -sI -H "Authorization: Bearer {TOKEN}" https://api.github.com/user | grep -i x-oauth-scopes

# Required scopes for Stella Ops:
# - repo (full repository access)
# - read:org (organization membership)
# - write:packages (container registry)

# 2. Generate new token with correct scopes
# Visit: https://github.com/settings/tokens/new
# Select: repo, read:org, write:packages

# 3. Update token in Stella Ops
stella integrations update --id {INTEGRATION_ID} --secret {NEW_TOKEN}

Verification: stella doctor --check check.integration.scm.github.permissions
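The "current scopes, required scopes, missing scopes" evidence can be reproduced by hand by comparing the `x-oauth-scopes` header against the required list. The helper below is a hypothetical sketch, not part of the `stella` CLI. It matches scope names exactly, so it does not account for broader scopes that imply narrower ones.

```shell
#!/bin/sh
# Sketch only: hypothetical helper using exact-match comparison of scope names.

# $1 = raw x-oauth-scopes header value, remaining arguments = required scopes.
# Prints each required scope that is not granted, one per line.
missing_scopes() {
  local granted scope
  # Drop spaces and any carriage return, then wrap in commas so every
  # scope can be matched as ",scope,".
  granted=",$(printf '%s' "$1" | tr -d ' \r'),"
  shift
  for scope in "$@"; do
    case "$granted" in
      *",$scope,"*) ;;
      *) echo "$scope" ;;
    esac
  done
}

# Example (placeholder token, as elsewhere in this section):
#   scopes=$(curl -sI -H "Authorization: Bearer {TOKEN}" https://api.github.com/user \
#     | grep -i '^x-oauth-scopes:' | cut -d: -f2-)
#   missing_scopes "$scopes" repo read:org write:packages
```

Empty output means every required scope is present in the header.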


check.integration.scm.github.ratelimit

| Property | Value |
| --- | --- |
| CheckId | check.integration.scm.github.ratelimit |
| Plugin | stellaops.doctor.integration.scm.github |
| Category | Integration |
| Severity | Warn |
| Tags | integration, scm, github |
| What it verifies | GitHub API rate limit is not exhausted |
| Evidence collected | Limit, remaining, reset time |
| Failure modes | Rate limit exhausted, near threshold |

Remediation:

# 1. Check current rate limit status
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit | jq

# 2. If exhausted, wait for reset
# The "reset" field shows Unix timestamp when limit resets

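# 3. Consider using a GitHub App instead of a PAT for higher limits
# PAT: 5,000 requests/hour
# GitHub App installation: 5,000 requests/hour minimum, scaling with repositories and users up to 12,500
# GitHub App installation on a GitHub Enterprise Cloud organization: 15,000 requests/hour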

# 4. Implement request caching in your application

Verification: stella doctor --check check.integration.scm.github.ratelimit
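The two failure modes (exhausted, near threshold) and the wait in step 2 reduce to simple arithmetic on the `limit`, `remaining`, and `reset` values returned by `/rate_limit`. The sketch below uses hypothetical function names, and the 10% warning threshold is an assumption because the specification does not state one.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers; the 10% default threshold is an assumption.

# Seconds until the limit resets, never negative.
# $1 = reset time (Unix epoch), $2 = current time (defaults to now).
seconds_until_reset() {
  local reset_epoch="$1" now_epoch="${2:-$(date +%s)}" wait_s
  wait_s=$(( reset_epoch - now_epoch ))
  if [ "$wait_s" -gt 0 ]; then echo "$wait_s"; else echo 0; fi
}

# Classify remaining quota. $3 = warning threshold in percent of the limit.
rate_limit_status() {
  local limit="$1" remaining="$2" warn_percent="${3:-10}"
  if [ "$remaining" -le 0 ]; then
    echo "exhausted"
  elif [ $(( remaining * 100 )) -le $(( limit * warn_percent )) ]; then
    echo "near-threshold"
  else
    echo "ok"
  fi
}
```

With a 5,000 request limit, `rate_limit_status 5000 400` reports `near-threshold` under the assumed 10% cutoff.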


check.integration.scm.gitlab.connectivity

| Property | Value |
| --- | --- |
| CheckId | check.integration.scm.gitlab.connectivity |
| Plugin | stellaops.doctor.integration.scm.gitlab |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, gitlab |
| What it verifies | GitLab API is reachable |
| Evidence collected | API endpoint, response status, version |
| Failure modes | API unreachable, self-hosted instance down |

Remediation:

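# 1. Test GitLab API connectivity
# The version endpoint requires authentication; without a token, a 401 response still confirms the API is reachable
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/version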

# 2. For self-hosted GitLab, check service status
sudo gitlab-ctl status

# 3. Check firewall/proxy
curl -v https://{GITLAB_HOST}/api/v4/version

# 4. Verify URL configuration
stella integrations show --id {INTEGRATION_ID}

Verification: stella doctor --check check.integration.scm.gitlab.connectivity


check.integration.scm.gitlab.auth

| Property | Value |
| --- | --- |
| CheckId | check.integration.scm.gitlab.auth |
| Plugin | stellaops.doctor.integration.scm.gitlab |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, gitlab, auth |
| What it verifies | GitLab authentication is successful |
| Evidence collected | Auth type, user info, token expiry |
| Failure modes | Invalid token, expired token, revoked access |

Remediation:

# 1. Test token authentication
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/user | jq '.username'

# 2. Check token expiry
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/personal_access_tokens/self | jq '.expires_at'

# 3. Generate new token if expired
# Visit: https://{GITLAB_HOST}/-/profile/personal_access_tokens

# 4. Update token in Stella Ops
stella integrations update --id {INTEGRATION_ID} --secret {NEW_TOKEN}

Verification: stella doctor --check check.integration.scm.gitlab.auth


9.6 Integration Plugins - Registry (stellaops.doctor.integration.registry.*)

check.integration.registry.harbor.connectivity

| Property | Value |
| --- | --- |
| CheckId | check.integration.registry.harbor.connectivity |
| Plugin | stellaops.doctor.integration.registry.harbor |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, harbor |
| What it verifies | Harbor registry is reachable |
| Evidence collected | Registry URL, health status, version |
| Failure modes | Registry unreachable, components unhealthy |

Remediation:

# 1. Check Harbor health endpoint
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq

# 2. Check individual components
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq '.components'

# 3. For self-hosted Harbor, check services
docker compose -f /opt/harbor/docker-compose.yml ps

# 4. Check Harbor logs
docker compose -f /opt/harbor/docker-compose.yml logs --tail=50 core

Verification: stella doctor --check check.integration.registry.harbor.connectivity


check.integration.registry.harbor.auth

| Property | Value |
| --- | --- |
| CheckId | check.integration.registry.harbor.auth |
| Plugin | stellaops.doctor.integration.registry.harbor |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, harbor, auth |
| What it verifies | Harbor authentication is successful |
| Evidence collected | Auth type, user info, project access |
| Failure modes | Invalid credentials, LDAP sync issue |

Remediation:

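# 1. Test Docker login (pass the password on stdin; -p is flagged as insecure by the Docker CLI)
printf '%s' "{PASSWORD}" | docker login {HARBOR_HOST} -u {USERNAME} --password-stdin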

# 2. Test API authentication
curl -s -u {USERNAME}:{PASSWORD} https://{HARBOR_HOST}/api/v2.0/users/current | jq

# 3. Check if user exists (quote the URL so the shell does not treat "?" as a glob character)
curl -s -u admin:{ADMIN_PASSWORD} "https://{HARBOR_HOST}/api/v2.0/users?username={USERNAME}" | jq

# 4. Reset password if needed
# Via Harbor UI: https://{HARBOR_HOST}/harbor/users

Verification: stella doctor --check check.integration.registry.harbor.auth


check.integration.registry.harbor.pull

| Property | Value |
| --- | --- |
| CheckId | check.integration.registry.harbor.pull |
| Plugin | stellaops.doctor.integration.registry.harbor |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, harbor |
| What it verifies | Can pull images from configured repositories |
| Evidence collected | Test image, pull result, error message |
| Failure modes | Permission denied, repository not found |

Remediation:

# 1. Test image pull
docker pull {HARBOR_HOST}/{PROJECT}/{IMAGE}:{TAG}

# 2. Check project membership
curl -s -u {USERNAME}:{PASSWORD} \
  https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/members | jq

# 3. Add user to project if needed
# role_id 2 is Developer; role_id 3 (Guest) is sufficient for pull-only access
curl -X POST -u admin:{ADMIN_PASSWORD} \
  -H "Content-Type: application/json" \
  -d '{"role_id": 2, "member_user": {"username": "{USERNAME}"}}' \
  https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/members

# 4. Verify repository exists
curl -s -u {USERNAME}:{PASSWORD} \
  https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/repositories | jq

Verification: stella doctor --check check.integration.registry.harbor.pull


check.integration.registry.ecr.connectivity

| Property | Value |
| --- | --- |
| CheckId | check.integration.registry.ecr.connectivity |
| Plugin | stellaops.doctor.integration.registry.ecr |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, ecr, aws |
| What it verifies | AWS ECR is reachable |
| Evidence collected | Registry URL, AWS region, endpoint status |
| Failure modes | AWS credentials invalid, region mismatch |

Remediation:

# 1. Verify AWS credentials
aws sts get-caller-identity

# 2. Test ECR describe repositories
aws ecr describe-repositories --region {REGION}

# 3. Get ECR login token
aws ecr get-login-password --region {REGION} | docker login --username AWS --password-stdin {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com

# 4. Check AWS credentials configuration
# Shows the active profile, region, and credential source with keys masked (avoids printing secrets)
aws configure list

Verification: stella doctor --check check.integration.registry.ecr.connectivity


check.integration.registry.ecr.pull

| Property | Value |
| --- | --- |
| CheckId | check.integration.registry.ecr.pull |
| Plugin | stellaops.doctor.integration.registry.ecr |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, ecr, aws |
| What it verifies | Can pull images from ECR repositories |
| Evidence collected | Repository, IAM permissions, error |
| Failure modes | ecr:GetAuthorizationToken denied, ecr:BatchGetImage denied |

Remediation:

# 1. Check IAM permissions
aws iam simulate-principal-policy \
  --policy-source-arn {ROLE_ARN} \
  --action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer

# 2. Add required IAM policy
aws iam put-role-policy --role-name {ROLE_NAME} --policy-name ECRPullAccess --policy-document '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage"
    ],
    "Resource": "*"
  }]
}'

# 3. Test pull
docker pull {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{REPO}:{TAG}

Verification: stella doctor --check check.integration.registry.ecr.pull


9.7 Observability Plugin (stellaops.doctor.observability)

check.telemetry.otlp.endpoint

| Property | Value |
| --- | --- |
| CheckId | check.telemetry.otlp.endpoint |
| Plugin | stellaops.doctor.observability |
| Category | Observability |
| Severity | Warn |
| Tags | observability, telemetry |
| What it verifies | OTLP collector endpoint is reachable |
| Evidence collected | Endpoint URL, response status, protocol |
| Failure modes | Collector unreachable, wrong protocol (gRPC vs HTTP) |

Remediation:

# 1. Check OTLP endpoint configuration
echo $OTEL_EXPORTER_OTLP_ENDPOINT

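# 2. Test HTTP endpoint
# OTLP/HTTP accepts POST only, so an HTTP 405 reply to this GET still shows the receiver is reachable
curl -v "${OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces"

# 3. Test gRPC endpoint
# "list" relies on gRPC server reflection; a reflection error (rather than a connection error) still shows the port is open
grpcurl -plaintext {COLLECTOR_HOST}:4317 list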

# 4. Check collector is running
# If using OpenTelemetry Collector:
docker logs otel-collector

# 5. Verify collector configuration
cat /etc/otel-collector/config.yaml

Verification: stella doctor --check check.telemetry.otlp.endpoint
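The "wrong protocol (gRPC vs HTTP)" failure mode often comes down to the port in the endpoint URL: by convention OTLP over gRPC listens on 4317 and OTLP over HTTP on 4318. The sketch below extracts the port and suggests the matching protocol. The function names are hypothetical, and a collector can be configured to use other ports, so the result is only a hint.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers based on the conventional OTLP ports.

# Print the port from an endpoint URL, or nothing if none is given.
otlp_port() {
  local rest="${1#*://}"     # drop the scheme, if any
  rest="${rest%%/*}"         # drop the path, if any
  case "$rest" in
    *:*) echo "${rest##*:}" ;;
    *)   echo "" ;;
  esac
}

# Suggest a protocol value for the endpoint based on its port.
otlp_protocol_hint() {
  case "$(otlp_port "$1")" in
    4317) echo "grpc" ;;
    4318) echo "http/protobuf" ;;
    *)    echo "unknown" ;;
  esac
}
```

`otlp_protocol_hint "$OTEL_EXPORTER_OTLP_ENDPOINT"` indicates which of steps 2 and 3 above is the relevant test for the configured endpoint.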


check.logs.directory.writable

| Property | Value |
| --- | --- |
| CheckId | check.logs.directory.writable |
| Plugin | stellaops.doctor.observability |
| Category | Observability |
| Severity | Fail |
| Tags | quick, observability, logs |
| What it verifies | Log directory is writable |
| Evidence collected | Log path, permissions, owner |
| Failure modes | Directory not writable, disk full |

Remediation:

# 1. Check log directory permissions
ls -la /var/log/stellaops

# 2. Fix ownership
sudo chown -R stellaops:stellaops /var/log/stellaops

# 3. Fix permissions
sudo chmod 755 /var/log/stellaops

# 4. Check disk space
df -h /var/log/stellaops

Verification: stella doctor --check check.logs.directory.writable
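Permission bits alone do not always show whether a write will succeed, so a quick manual probe is to create and remove a small file in the directory. The helper below is a hypothetical sketch of that probe, not the check's actual implementation. Run it as the service account (assumed here to be `stellaops`, matching the `chown` command above) so the result reflects that user's access.

```shell
#!/bin/sh
# Sketch only: hypothetical helper that probes a directory with a temporary file.

# Exit 0 if a file can be created and written in the directory, non-zero otherwise.
dir_writable() {
  local dir="$1" probe
  [ -d "$dir" ] || return 1
  probe=$(mktemp "$dir/.doctor-probe.XXXXXX" 2>/dev/null) || return 1
  if printf 'probe\n' > "$probe" 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  rm -f "$probe"
  return 1
}
```

For example, `dir_writable /var/log/stellaops && echo writable || echo "not writable"` reports the result for the log path used in this section.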


check.logs.rotation.configured

| Property | Value |
| --- | --- |
| CheckId | check.logs.rotation.configured |
| Plugin | stellaops.doctor.observability |
| Category | Observability |
| Severity | Warn |
| Tags | observability, logs |
| What it verifies | Log rotation is configured |
| Evidence collected | Rotation config path, settings |
| Failure modes | No rotation configured, invalid config |

Remediation:

# 1. Check if logrotate config exists
ls -la /etc/logrotate.d/stellaops

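# 2. Create logrotate configuration
# Use tee: with "sudo cat > file" the redirection runs as the calling user and fails on root-owned paths
sudo tee /etc/logrotate.d/stellaops > /dev/null << 'EOF'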
/var/log/stellaops/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 640 stellaops stellaops
    postrotate
        systemctl reload stellaops-gateway > /dev/null 2>&1 || true
    endscript
}
EOF

# 3. Test logrotate configuration
sudo logrotate -d /etc/logrotate.d/stellaops

Verification: stella doctor --check check.logs.rotation.configured


check.metrics.prometheus.scrape

| Property | Value |
| --- | --- |
| CheckId | check.metrics.prometheus.scrape |
| Plugin | stellaops.doctor.observability |
| Category | Observability |
| Severity | Warn |
| Tags | observability, metrics |
| What it verifies | Prometheus metrics endpoint is accessible |
| Evidence collected | Metrics endpoint, sample metrics count |
| Failure modes | Endpoint not exposed, auth required |

Remediation:

# 1. Check metrics endpoint
curl -s http://localhost:{PORT}/metrics | head -20

# 2. Verify metrics are being scraped
curl -s http://{PROMETHEUS_HOST}:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job == "stellaops")'

# 3. Add Prometheus scrape config
# In prometheus.yml:
scrape_configs:
  - job_name: 'stellaops'
    static_configs:
      - targets: ['stellaops-gateway:8080', 'stellaops-concelier:8081']

# 4. Reload Prometheus
# The reload endpoint only works when Prometheus was started with --web.enable-lifecycle
curl -X POST http://{PROMETHEUS_HOST}:9090/-/reload

Verification: stella doctor --check check.metrics.prometheus.scrape


9.8 Release Orchestrator Plugin (stellaops.doctor.releaseorch)

check.releaseorch.environments.configured

| Property | Value |
| --- | --- |
| CheckId | check.releaseorch.environments.configured |
| Plugin | stellaops.doctor.releaseorch |
| Category | Integration |
| Severity | Fail |
| Tags | release, environments |
| What it verifies | At least one environment is configured |
| Evidence collected | Environment count, environment names |
| Failure modes | No environments configured |

Remediation:

# 1. List current environments
stella environments list

# 2. Create development environment
stella environments create \
  --name development \
  --type development \
  --promotion-target staging

# 3. Create staging environment
stella environments create \
  --name staging \
  --type staging \
  --promotion-target production \
  --requires-approval

# 4. Create production environment
stella environments create \
  --name production \
  --type production \
  --requires-approval

Verification: stella doctor --check check.releaseorch.environments.configured


check.releaseorch.deployments.targets

| Property | Value |
| --- | --- |
| CheckId | check.releaseorch.deployments.targets |
| Plugin | stellaops.doctor.releaseorch |
| Category | Integration |
| Severity | Fail |
| Tags | release, deployments |
| What it verifies | Deployment targets are reachable |
| Evidence collected | Target type, connectivity status, last heartbeat |
| Failure modes | Agent offline, target unreachable |

Remediation:

# 1. List deployment targets
stella deployments targets list

# 2. Check agent status
stella deployments targets health --target {TARGET_ID}

# 3. Restart agent if needed
# On target host:
sudo systemctl restart stellaops-agent

# 4. Re-register target if agent was reinstalled
stella deployments targets register \
  --name {TARGET_NAME} \
  --type docker-compose \
  --endpoint ssh://user@host

Verification: stella doctor --check check.releaseorch.deployments.targets


10. Plugin Implementation Details

10.1 Core Platform Plugin

Location: src/__Libraries/StellaOps.Doctor/Plugins/Core/

Provides foundational checks for configuration, runtime, and platform health.

Checks Provided:

  • check.config.required
  • check.config.syntax
  • check.config.deprecated
  • check.runtime.dotnet
  • check.runtime.memory
  • check.runtime.disk.space
  • check.runtime.disk.permissions
  • check.time.sync
  • check.crypto.profiles

Dependencies: None (core plugin)


10.2 Database & Migrations Plugin

Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Database/

Provides database connectivity and migration state checks.

References:

  • src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationRunner.cs
  • src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationStatusService.cs

Checks Provided:

  • check.database.connectivity
  • check.database.version
  • check.database.migrations.pending
  • check.database.migrations.checksum
  • check.database.migrations.lock
  • check.database.schema.{schema} (dynamic per schema)
  • check.database.connections.pool

Configuration:

Doctor:
  Plugins:
    Database:
      Enabled: true
      ConnectionTimeout: 10s
      Schemas:
        - auth
        - vuln
        - scanner
        - orchestrator

10.3 Service Graph Plugin

Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ServiceGraph/

Validates inter-service connectivity via Gateway and Router.

References:

  • src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs
  • src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs

Checks Provided:

  • check.services.gateway.running
  • check.services.gateway.routing
  • check.services.{service}.health (dynamic per service)
  • check.services.{service}.connectivity (dynamic per service)
  • check.services.authority.connectivity

Configuration:

Doctor:
  Plugins:
    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s
      Services:
        - name: concelier
          port: 8081
        - name: scanner
          port: 8082
        - name: attestor
          port: 8083
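The dynamic check IDs listed above follow the patterns `check.services.{service}.health` and `check.services.{service}.connectivity`, so the configured service names determine which IDs exist. The sketch below expands the three service names from the sample configuration into those IDs. The helper is hypothetical, and how Doctor itself performs the expansion is not specified here.

```shell
#!/bin/sh
# Sketch only: hypothetical helper that expands service names into check IDs
# following the patterns listed under "Checks Provided".

service_check_ids() {
  local name
  for name in "$@"; do
    echo "check.services.${name}.health"
    echo "check.services.${name}.connectivity"
  done
}

# Example: run every per-service check for the services in the sample config.
#   for id in $(service_check_ids concelier scanner attestor); do
#     stella doctor --check "$id"
#   done
```

Each ID is passed to `stella doctor --check` individually, matching the single-check invocations used in the Verification lines throughout Section 9.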

10.4 Security Plugin

Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Security/

Validates authentication, authorization, TLS, and secrets management.

References:

  • src/Authority/StellaOps.Authority/StellaOps.Authority.Plugin.Ldap/
  • src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/HashiCorpVaultConnector.cs

Checks Provided:

  • check.auth.oidc.discovery
  • check.auth.oidc.jwks
  • check.auth.ldap.bind
  • check.auth.ldap.search
  • check.auth.ldap.groups
  • check.tls.certificates.expiry
  • check.tls.certificates.chain
  • check.secrets.vault.connectivity
  • check.secrets.vault.auth
  • check.secrets.vault.paths
  • check.security.evidence.integrity

10.5 SCM Integration Plugins

GitHub Plugin Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitHub/

GitLab Plugin Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitLab/

References:

  • src/Integrations/__Plugins/StellaOps.Integrations.Plugin.GitHubApp/
  • etc/scm-connectors/github.yaml

GitHub Checks:

  • check.integration.scm.github.connectivity
  • check.integration.scm.github.auth
  • check.integration.scm.github.permissions
  • check.integration.scm.github.ratelimit

GitLab Checks:

  • check.integration.scm.gitlab.connectivity
  • check.integration.scm.gitlab.auth
  • check.integration.scm.gitlab.permissions (not yet specified in Section 9.5 or Appendix A)

10.6 Registry Integration Plugins

Harbor Plugin Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.Harbor/

ECR Plugin Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.ECR/

References:

  • src/Integrations/__Plugins/StellaOps.Integrations.Plugin.Harbor/

Harbor Checks:

  • check.integration.registry.harbor.connectivity
  • check.integration.registry.harbor.auth
  • check.integration.registry.harbor.pull

ECR Checks:

  • check.integration.registry.ecr.connectivity
  • check.integration.registry.ecr.pull

10.7 Observability Plugin

Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Observability/

References:

  • devops/telemetry/otel-collector.yaml

Checks Provided:

  • check.telemetry.otlp.endpoint
  • check.logs.directory.writable
  • check.logs.rotation.configured
  • check.metrics.prometheus.scrape

10.8 Release Orchestrator Plugin

Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ReleaseOrch/

References:

  • src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/

Checks Provided:

  • check.releaseorch.environments.configured
  • check.releaseorch.deployments.targets

Appendix A: Complete Check ID Reference

| CheckId | Plugin | Category | Default Severity |
| --- | --- | --- | --- |
| check.config.required | core | Core | Fail |
| check.config.syntax | core | Core | Fail |
| check.config.deprecated | core | Core | Warn |
| check.runtime.dotnet | core | Core | Fail |
| check.runtime.memory | core | Core | Warn |
| check.runtime.disk.space | core | Core | Warn |
| check.runtime.disk.permissions | core | Core | Fail |
| check.time.sync | core | Core | Warn |
| check.crypto.profiles | core | Core | Fail |
| check.database.connectivity | database | Database | Fail |
| check.database.version | database | Database | Warn |
| check.database.migrations.pending | database | Database | Fail |
| check.database.migrations.checksum | database | Database | Fail |
| check.database.migrations.lock | database | Database | Warn |
| check.database.schema.{schema} | database | Database | Fail |
| check.database.connections.pool | database | Database | Warn |
| check.services.gateway.running | servicegraph | ServiceGraph | Fail |
| check.services.gateway.routing | servicegraph | ServiceGraph | Fail |
| check.services.{service}.health | servicegraph | ServiceGraph | Fail |
| check.services.{service}.connectivity | servicegraph | ServiceGraph | Fail |
| check.services.authority.connectivity | servicegraph | ServiceGraph | Fail |
| check.auth.oidc.discovery | security | Security | Fail |
| check.auth.oidc.jwks | security | Security | Fail |
| check.auth.ldap.bind | security | Security | Fail |
| check.auth.ldap.search | security | Security | Fail |
| check.auth.ldap.groups | security | Security | Warn |
| check.tls.certificates.expiry | security | Security | Warn/Fail |
| check.tls.certificates.chain | security | Security | Fail |
| check.secrets.vault.connectivity | security | Security | Fail |
| check.secrets.vault.auth | security | Security | Fail |
| check.secrets.vault.paths | security | Security | Fail |
| check.security.evidence.integrity | security | Security | Fail |
| check.integration.scm.github.connectivity | scm.github | Integration | Fail |
| check.integration.scm.github.auth | scm.github | Integration | Fail |
| check.integration.scm.github.permissions | scm.github | Integration | Fail |
| check.integration.scm.github.ratelimit | scm.github | Integration | Warn |
| check.integration.scm.gitlab.connectivity | scm.gitlab | Integration | Fail |
| check.integration.scm.gitlab.auth | scm.gitlab | Integration | Fail |
| check.integration.registry.harbor.connectivity | registry.harbor | Integration | Fail |
| check.integration.registry.harbor.auth | registry.harbor | Integration | Fail |
| check.integration.registry.harbor.pull | registry.harbor | Integration | Fail |
| check.integration.registry.ecr.connectivity | registry.ecr | Integration | Fail |
| check.integration.registry.ecr.pull | registry.ecr | Integration | Fail |
| check.telemetry.otlp.endpoint | observability | Observability | Warn |
| check.logs.directory.writable | observability | Observability | Fail |
| check.logs.rotation.configured | observability | Observability | Warn |
| check.metrics.prometheus.scrape | observability | Observability | Warn |
| check.releaseorch.environments.configured | releaseorch | Integration | Fail |
| check.releaseorch.deployments.targets | releaseorch | Integration | Fail |

Appendix B: Quick Reference - Common Issues

Database Issues

# Connection refused
sudo systemctl start postgresql
stella doctor --check check.database.connectivity

# Pending migrations
stella system migrations-run --category release
stella doctor --check check.database.migrations.pending

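# Migration lock stuck
# pg_advisory_unlock_all() only releases locks held by the calling session, so running it
# from a new psql session has no effect. Find the session that holds the advisory lock:
psql -d stellaops -c "SELECT l.pid, a.application_name, a.state, a.query_start FROM pg_locks l JOIN pg_stat_activity a USING (pid) WHERE l.locktype = 'advisory' AND l.granted;"
# Only after confirming that session is stale, terminate it to release the lock:
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"
stella doctor --check check.database.migrations.lock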

Authentication Issues

# OIDC discovery fails
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration
sudo systemctl restart stellaops-authority

# LDAP bind fails
ldapsearch -x -H ldaps://{HOST}:636 -D "{BIND_DN}" -w "{PASSWORD}" -b "" -s base

Integration Issues

# GitHub rate limit
curl -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit

# Harbor connectivity
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq

Document generated: 2026-01-12 Stella Ops Doctor Capability Specification v1.0.0-draft