# Stella Ops Doctor Capability Specification
> **Status:** Planning / Capability Design
> **Version:** 1.0.0-draft
> **Last Updated:** 2026-01-12
---
## Table of Contents
1. [Executive Summary](#1-executive-summary)
2. [Current State Analysis](#2-current-state-analysis)
3. [Doctor Architecture](#3-doctor-architecture)
4. [Plugin System Specification](#4-plugin-system-specification)
5. [CLI Surface](#5-cli-surface)
6. [UI Surface](#6-ui-surface)
7. [API Surface](#7-api-surface)
8. [Remediation Command Patterns](#8-remediation-command-patterns)
9. [Doctor Check Catalog](#9-doctor-check-catalog)
10. [Plugin Implementation Details](#10-plugin-implementation-details)
---
## 1. Executive Summary
### 1.1 Purpose
The Doctor capability provides comprehensive self-service diagnostics for Stella Ops deployments. It enables operators, DevOps engineers, and developers to:
- **Diagnose** what is working and what is not
- **Understand** why failures occur with collected evidence
- **Remediate** issues with copy/paste commands
- **Verify** fixes with re-runnable checks
### 1.2 Target Users
| User Type | Primary Use Case |
|-----------|------------------|
| **Operators** | Pre-deployment validation, incident triage, routine health checks |
| **DevOps Engineers** | Integration setup, migration management, environment troubleshooting |
| **Developers** | Local development environment validation, API connectivity testing |
| **Support Engineers** | Remote diagnostics, evidence collection for escalation |
### 1.3 Key Principles
1. **Plugin-First Architecture** - All checks implemented via extensible plugins
2. **Actionable Remediation** - Every failure includes copy/paste fix commands
3. **Zero Docs Familiarity** - Users can diagnose and fix without reading documentation
4. **Evidence-Based Diagnostics** - All checks collect and report evidence
5. **Multi-Surface Consistency** - Same check engine powers CLI, UI, and API
6. **Non-Destructive Fixes** - Doctor never executes destructive actions; fix commands must be safe and idempotent
### 1.4 Surfaces
| Surface | Entry Point | Primary Use |
|---------|-------------|-------------|
| **CLI** | `stella doctor` | Automation, CI/CD gates, SSH troubleshooting |
| **UI** | `/ops/doctor` | Interactive diagnosis, team collaboration |
| **API** | `POST /api/v1/doctor/run` | Programmatic integration, monitoring systems |
---
## 2. Current State Analysis
### 2.1 CLI - Current State
**Location:** `src/Cli/StellaOps.Cli/`
#### What Exists Today
| Component | File Path | Description |
|-----------|-----------|-------------|
| Entry Point | `src/Cli/StellaOps.Cli/Program.cs` | Main CLI bootstrap using System.CommandLine |
| Command Factory | `src/Cli/StellaOps.Cli/Commands/CommandFactory.cs` | Registers 88+ command groups |
| Config Bootstrap | `src/Cli/StellaOps.Cli/Configuration/CliBootstrapper.cs` | Environment + YAML/JSON config loading |
| Exit Codes | `src/Cli/StellaOps.Cli/CliExitCodes.cs` | Standardized exit codes (0-99) |
| Crypto Validator | `src/Cli/StellaOps.Cli/Services/CryptoProfileValidator.cs` | Startup validation for crypto profiles |
| Migration Commands | `src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs` | `migrations-run`, `migrations-status`, `migrations-verify` |
#### Existing Validation Patterns
```csharp
// CryptoProfileValidator.cs - Startup validation pattern
public sealed record ValidationResult
{
    public bool IsValid { get; init; }
    public bool HasWarnings { get; init; }
    public bool HasErrors { get; init; }
    public List<string> Errors { get; init; }
    public List<string> Warnings { get; init; }
    public string ActiveProfile { get; init; }
    public List<string> AvailableProviders { get; init; }
}
```
#### Gaps
- No unified `stella doctor` command
- Output formatting is ad-hoc per command (no centralized formatter)
- No remediation command generation
- Validation only for crypto profiles, not comprehensive system state
#### Proposed Capability
```bash
# Quick system health check
stella doctor
# Full diagnostic with all checks
stella doctor --full
# Check specific category
stella doctor --category database
stella doctor --category integrations
# Check specific plugin
stella doctor --plugin scm.github
# Run single check
stella doctor --check check.database.migrations.pending
# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text
# Export report
stella doctor --export report.json
stella doctor --export report.md
# Filter by severity
stella doctor --severity fail,warn
```
---
### 2.2 Health Infrastructure - Current State
**Pattern:** Extensive health endpoints across 20+ services
#### What Exists Today
| Component | File Path | Description |
|-----------|-----------|-------------|
| Health Status Enum | `src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthStatus.cs` | Unknown, Healthy, Degraded, Unhealthy |
| Health Check Result | `src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs` | Rich result with factory methods |
| Gateway Health | `src/Gateway/StellaOps.Gateway.WebService/Middleware/HealthCheckMiddleware.cs` | `/health/live`, `/health/ready`, `/health/startup` |
| Scanner Health | `src/Scanner/StellaOps.Scanner.WebService/Endpoints/HealthEndpoints.cs` | `/healthz`, `/readyz` |
| Orchestrator Health | `src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Endpoints/HealthEndpoints.cs` | `/health/details` |
| Platform Health | `src/Platform/__Libraries/StellaOps.Platform.Health/PlatformHealthService.cs` | Cross-service aggregation |
| Health Contract | `devops/docker/health-endpoints.md` | Formal endpoint specification |
#### Health Check Result Model
```csharp
// From src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs
public sealed record HealthCheckResult(
    HealthStatus Status,
    string? Message,
    IReadOnlyDictionary<string, string>? Details,
    DateTimeOffset CheckedAt,
    TimeSpan Duration)
{
    public static HealthCheckResult Healthy(string? message = null) => ...
    public static HealthCheckResult Degraded(string message) => ...
    public static HealthCheckResult Unhealthy(string message, Exception? ex = null) => ...
}
```
#### Gaps
- Health endpoints check liveness/readiness, not comprehensive diagnostics
- No remediation guidance in health responses
- No aggregated cross-service diagnostic view
- Health checks don't verify configuration validity
---
### 2.3 Doctor Service - Current State (ReleaseOrchestrator)
**Location:** `src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/`
#### What Exists Today
| Component | File Path | Description |
|-----------|-----------|-------------|
| Doctor Service | `Doctor/DoctorService.cs` | Runs `IDoctorCheck` implementations |
| Doctor Report | `Doctor/DoctorReport.cs` | Aggregated results with counts |
| Check Result | `Doctor/CheckResult.cs` | Individual check outcome |
| IDoctorCheck | `Doctor/IDoctorCheck.cs` | Plugin interface for checks |
#### IDoctorCheck Interface
```csharp
// Existing interface (simplified)
public interface IDoctorCheck
{
    string Name { get; }
    string Category { get; }
    Task<CheckResult> RunAsync(CancellationToken ct);
}

public sealed record CheckResult(
    string Name,
    HealthStatus Status,
    string? Message,
    TimeSpan Duration);

public sealed record DoctorReport(
    int PassCount,
    int WarningCount,
    int FailCount,
    int SkippedCount,
    HealthStatus OverallStatus,
    TimeSpan TotalDuration,
    IReadOnlyList<CheckResult> Results);
```
#### Gaps
- Only available in ReleaseOrchestrator, not CLI or other modules
- No remediation commands in output
- No evidence collection
- Limited to integration checks only
- No plugin discovery mechanism
---
### 2.4 Integration Plugins - Current State
**Location:** `src/Integrations/`
#### What Exists Today
| Component | File Path | Description |
|-----------|-----------|-------------|
| Plugin Contract | `__Libraries/StellaOps.Integrations.Contracts/IIntegrationConnectorPlugin.cs` | Core plugin interface |
| Integration Types | `__Libraries/StellaOps.Integrations.Contracts/IntegrationType.cs` | Registry, SCM, CI/CD, etc. |
| GitHub Plugin | `__Plugins/StellaOps.Integrations.Plugin.GitHubApp/GitHubAppConnectorPlugin.cs` | GitHub App integration |
| Harbor Plugin | `__Plugins/StellaOps.Integrations.Plugin.Harbor/HarborConnectorPlugin.cs` | Harbor registry |
| Plugin Loader | `StellaOps.Integrations.WebService/IntegrationPluginLoader.cs` | Assembly-based discovery |
| Vault Connectors | `src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/` | HashiCorp Vault, Azure Key Vault |
#### IIntegrationConnectorPlugin Interface
```csharp
public interface IIntegrationConnectorPlugin : IAvailabilityPlugin
{
    IntegrationType Type { get; }
    IntegrationProvider Provider { get; }
    string Name { get; }

    Task<TestConnectionResult> TestConnectionAsync(
        IntegrationConfig config,
        CancellationToken ct);

    Task<HealthCheckResult> CheckHealthAsync(
        IntegrationConfig config,
        CancellationToken ct);
}
```
#### Supported Integration Types
```csharp
public enum IntegrationType
{
    Registry = 1,    // Harbor, ECR, GCR, ACR, Docker Hub, Quay, Artifactory
    Scm = 2,         // GitHub, GitLab, Bitbucket, Gitea, Azure DevOps
    CiCd = 3,        // GitHub Actions, GitLab CI, Jenkins, CircleCI
    RepoSource = 4,  // npm, PyPI, Maven, NuGet, Crates.io
    RuntimeHost = 5, // eBPF, ETW, dyld agents
    FeedMirror = 6   // NVD, OSV, StellaOps mirrors
}
```
#### Gaps
- `TestConnectionAsync` exists but not surfaced via CLI doctor
- No standardized remediation output
- Health checks don't report required permissions/scopes
- No validation of webhook/event delivery configuration
---
### 2.5 Authority Plugins - Current State
**Location:** `src/Authority/StellaOps.Authority/`
#### What Exists Today
| Component | File Path | Description |
|-----------|-----------|-------------|
| Plugin Abstractions | `StellaOps.Authority.Plugins.Abstractions/` | Plugin registration interface |
| LDAP Plugin | `StellaOps.Authority.Plugin.Ldap/` | LDAP/AD integration |
| OIDC Plugin | `StellaOps.Authority.Plugin.Oidc/` | OpenID Connect |
| SAML Plugin | `StellaOps.Authority.Plugin.Saml/` | SAML 2.0 |
| Plugin Registry | `StellaOps.Authority/AuthorityPluginRegistry.cs` | Manages named plugins |
| LDAP Config | `etc/authority.plugins/ldap.yaml` | Sample configuration |
#### LDAP Plugin Capabilities
```yaml
# From etc/authority.plugins/ldap.yaml
connection:
  host: "ldaps://ldap.example.internal"
  port: 636
  searchBase: "ou=people,dc=example,dc=internal"
  bindDn: "cn=bind-user,ou=service,dc=example,dc=internal"
  bindPasswordSecret: "file:/etc/secrets/ldap-bind.txt"
security:
  requireTls: true
claims:
  groupAttribute: "memberOf"
cache:
  enabled: true
  ttlSeconds: 600
```
#### Gaps
- No CLI command to validate LDAP configuration
- Health checks exist but don't provide remediation
- No validation of group mapping correctness
- TLS certificate validation not exposed as diagnostic
---
### 2.6 Database & Migrations - Current State
**Location:** `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/`
#### What Exists Today
| Component | File Path | Description |
|-----------|-----------|-------------|
| Migration Runner | `Migrations/MigrationRunner.cs` | Executes SQL migrations with advisory locks |
| Migration Category | `Migrations/MigrationCategory.cs` | Startup, Release, Seed, Data |
| Status Service | `Migrations/MigrationStatusService.cs` | Query migration state |
| CLI Commands | `src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs` | `migrations-run/status/verify` |
| Strategy Docs | `docs/db/MIGRATION_STRATEGY.md` | Migration process documentation |
#### Migration Categories
| Prefix | Category | Automatic | Breaking |
|--------|----------|-----------|----------|
| `001-099` | Startup | Yes | No |
| `100-199` | Release | No (CLI) | Yes |
| `S001-S999` | Seed | Yes | No |
| `DM001-DM999` | Data | Background | Varies |
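
The prefix table above can be turned into a small classifier that a migration check could use to decide which pending files are release migrations. The following is a minimal sketch, not the existing `MigrationCategory.cs`: the local enum, the `MigrationNaming` class, and the assumption that names look like `<prefix>_<description>` are all hypothetical.

```csharp
using System.Text.RegularExpressions;

// Local stand-in for the real MigrationCategory type (assumption).
public enum MigrationCategory { Startup, Release, Seed, Data }

public static class MigrationNaming
{
    // Optional "S" or "DM" marker, three digits, then "_" or end of name.
    private static readonly Regex Prefix =
        new(@"^(?<kind>DM|S)?(?<num>\d{3})(?:_|$)", RegexOptions.Compiled);

    // Returns null when the name does not match any prefix range in the table.
    public static MigrationCategory? Classify(string migrationName)
    {
        var match = Prefix.Match(migrationName);
        if (!match.Success) return null;

        int number = int.Parse(match.Groups["num"].Value);
        string kind = match.Groups["kind"].Value; // "" when no marker is present

        if (number < 1) return null;
        if (kind == "S") return MigrationCategory.Seed;
        if (kind == "DM") return MigrationCategory.Data;
        if (number <= 99) return MigrationCategory.Startup;
        if (number <= 199) return MigrationCategory.Release;
        return null;
    }
}
```

Under these assumptions, `MigrationNaming.Classify("100_add_tenant_quotas")` yields `Release`, while an unnumbered or out-of-range name yields `null`, which a check could report as evidence instead of guessing a category.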
#### Schema Tracking
```sql
CREATE TABLE {schema}.schema_migrations (
    migration_name TEXT PRIMARY KEY,
    category       TEXT NOT NULL DEFAULT 'startup',
    checksum       TEXT NOT NULL,
    applied_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    applied_by     TEXT,
    duration_ms    INT
);
```
#### Gaps
- Migration status not integrated with doctor
- No checksum mismatch diagnostics with remediation
- Lock contention not diagnosed
- No cross-schema migration state view
---
### 2.7 UI - Current State
**Location:** `src/Web/StellaOps.Web/`
#### What Exists Today
| Component | File Path | Description |
|-----------|-----------|-------------|
| Routes | `src/app/app.routes.ts` | Angular Router configuration |
| Platform Health | `src/app/features/platform-health/` | Health dashboard at `/ops/health` |
| Health Client | `src/app/core/api/platform-health.client.ts` | API client for health endpoints |
| Console Status | `src/app/features/console/console-status.component.ts` | Queue/run status |
#### Platform Health Dashboard Features
- Real-time KPI strip (services, latency, error rate, incidents)
- Service health grid with grouping (healthy/degraded/unhealthy)
- Dependency graph visualization
- Incident timeline (last 24h)
- Auto-refresh every 10 seconds
#### Gaps
- No diagnostic check execution from UI
- No remediation command display
- No evidence collection/export
- Health dashboard shows status, not actionable diagnostics
---
### 2.8 Service Connectivity - Current State
**Location:** `src/Gateway/`, `src/Router/`
#### What Exists Today
| Component | File Path | Description |
|-----------|-----------|-------------|
| Gateway Routing | `src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs` | HTTP to microservice routing |
| Connection Manager | `src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs` | HELLO handshake, heartbeats |
| Routing State | `src/Router/__Libraries/StellaOps.Router.Common/Abstractions/IGlobalRoutingState.cs` | Live service connections |
| Claims Propagation | `src/Gateway/StellaOps.Gateway.WebService/Middleware/ClaimsPropagationMiddleware.cs` | OAuth claims forwarding |
#### Service Registration Flow
1. Service connects to Gateway via Router transport (TCP/TLS/Valkey)
2. HELLO handshake with endpoint/schema declarations
3. Periodic heartbeats with health/latency metrics
4. Gateway maintains `ConnectionState` for routing decisions
#### Gaps
- No CLI command to verify service graph health
- Routing failures not diagnosed with remediation
- No validation of claims propagation configuration
- Transport connectivity not exposed as diagnostic
---
## 3. Doctor Architecture
### 3.1 High-Level Architecture
```
+------------------+ +------------------+ +------------------+
| CLI | | UI | | External |
| stella doctor | | /ops/doctor | | Monitoring |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
v v v
+------------------------------------------------------------------------+
| Doctor API Layer |
| POST /api/v1/doctor/run GET /api/v1/doctor/checks |
| GET /api/v1/doctor/report WebSocket /api/v1/doctor/stream |
+------------------------------------------------------------------------+
|
v
+------------------------------------------------------------------------+
| Doctor Engine (Core) |
| +------------------+ +------------------+ +------------------+ |
| | Check Registry | | Check Executor | | Report Generator | |
| | - Discovery | | - Parallel exec | | - JSON/MD/Text | |
| | - Filtering | | - Timeout mgmt | | - Remediation | |
| +------------------+ +------------------+ +------------------+ |
+------------------------------------------------------------------------+
|
v
+------------------------------------------------------------------------+
| Plugin System |
+--------+---------+---------+---------+---------+---------+-------------+
| | | | | |
v v v v v v
+--------+ +------+ +------+ +------+ +------+ +------+ +----------+
| Core | | DB & | |Service| | SCM | |Regis-| | Vault| | Authority|
| Plugin | |Migra-| | Graph | |Plugin| | try | |Plugin| | Plugin |
| | | tions| |Plugin | | | |Plugin| | | | |
+--------+ +------+ +------+ +------+ +------+ +------+ +----------+
```
### 3.2 Core Components
#### Doctor Engine
**Proposed Location:** `src/__Libraries/StellaOps.Doctor/`
```
StellaOps.Doctor/
├── Engine/
│ ├── DoctorEngine.cs # Main orchestrator
│ ├── CheckExecutor.cs # Parallel check execution
│ └── CheckRegistry.cs # Plugin discovery & filtering
├── Models/
│ ├── DoctorCheckResult.cs # Extended check result with evidence
│ ├── DoctorReport.cs # Full report model
│ ├── Remediation.cs # Fix command model
│ └── Evidence.cs # Collected evidence model
├── Plugins/
│ ├── IDoctorPlugin.cs # Plugin interface
│ ├── IDoctorCheck.cs # Check interface (extended)
│ └── DoctorPluginContext.cs # Plugin execution context
├── Output/
│ ├── JsonReportFormatter.cs # JSON output
│ ├── MarkdownReportFormatter.cs # Markdown output
│ └── TextReportFormatter.cs # Console text output
└── DoctorServiceExtensions.cs # DI registration
```
#### Check Execution Model
```csharp
public sealed class CheckExecutor
{
    private readonly IEnumerable<IDoctorPlugin> _plugins;
    private readonly TimeProvider _timeProvider;
    private readonly ILogger<CheckExecutor> _logger;

    public async Task<DoctorReport> RunAsync(
        DoctorRunOptions options,
        CancellationToken ct)
    {
        var checks = GetFilteredChecks(options);
        var results = new ConcurrentBag<DoctorCheckResult>();

        // Parallel execution with configurable concurrency
        await Parallel.ForEachAsync(
            checks,
            new ParallelOptions
            {
                MaxDegreeOfParallelism = options.Parallelism,
                CancellationToken = ct
            },
            async (check, token) =>
            {
                var result = await ExecuteCheckAsync(check, options, token);
                results.Add(result);
            });

        return GenerateReport(results, options);
    }
}
```
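
The executor above delegates to `ExecuteCheckAsync`, which the spec does not show. One plausible way to implement the per-check timeout it implies is a linked cancellation source, sketched below. The `CheckTimeout` class and `Outcome` enum are hypothetical names, and how a timed-out check maps onto `DoctorCheckResult` is left open here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical outcome type used only by this sketch.
public enum Outcome { Completed, TimedOut }

public static class CheckTimeout
{
    // Runs one check body under a per-check timeout linked to the caller's token.
    public static async Task<Outcome> RunAsync(
        Func<CancellationToken, Task> check,
        TimeSpan timeout,
        CancellationToken ct)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);
        try
        {
            await check(cts.Token);
            return Outcome.Completed;
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the per-check timer fired; the overall run keeps going.
            return Outcome.TimedOut;
        }
    }
}
```

If the caller's own token is cancelled, the exception propagates unchanged, so a cancelled run is not misreported as a single slow check.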
### 3.3 Result Model
```csharp
public sealed record DoctorCheckResult
{
    // Identity
    public required string CheckId { get; init; }
    public required string PluginId { get; init; }
    public required string Category { get; init; }

    // Outcome
    public required DoctorSeverity Severity { get; init; } // Pass, Info, Warn, Fail, Skip
    public required string Diagnosis { get; init; }

    // Evidence
    public required Evidence Evidence { get; init; }

    // Remediation
    public IReadOnlyList<string>? LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }

    // Metadata
    public required TimeSpan Duration { get; init; }
    public required DateTimeOffset ExecutedAt { get; init; }
}

public enum DoctorSeverity
{
    Pass = 0,
    Info = 1,
    Warn = 2,
    Fail = 3,
    Skip = 4
}

public sealed record Evidence
{
    public required string Description { get; init; }
    public required IReadOnlyDictionary<string, string> Data { get; init; }
    public IReadOnlyList<string>? SensitiveKeys { get; init; } // Keys to redact in output
}

public sealed record Remediation
{
    public required IReadOnlyList<RemediationStep> Steps { get; init; }
    public string? SafetyNote { get; init; }
    public bool RequiresBackup { get; init; }
}

public sealed record RemediationStep
{
    public required int Order { get; init; }
    public required string Description { get; init; }
    public required string Command { get; init; }
    public CommandType CommandType { get; init; } // Shell, SQL, API, FileEdit, Manual
    public IReadOnlyDictionary<string, string>? Placeholders { get; init; }
}

public enum CommandType
{
    Shell,    // Bash/PowerShell command
    SQL,      // SQL statement
    API,      // API call (curl/stella CLI)
    FileEdit, // File modification
    Manual    // Manual step (no command)
}
```
---
## 4. Plugin System Specification
### 4.1 Plugin Interface
```csharp
/// <summary>
/// Base interface for Doctor plugins.
/// Plugins group related checks and share configuration context.
/// </summary>
public interface IDoctorPlugin
{
    /// <summary>Unique plugin identifier (e.g., "stellaops.doctor.database")</summary>
    string PluginId { get; }

    /// <summary>Human-readable name</summary>
    string DisplayName { get; }

    /// <summary>Plugin category for filtering</summary>
    DoctorCategory Category { get; }

    /// <summary>Plugin version for compatibility</summary>
    Version Version { get; }

    /// <summary>Minimum Doctor engine version required</summary>
    Version MinEngineVersion { get; }

    /// <summary>Check if plugin is available in current environment</summary>
    bool IsAvailable(IServiceProvider services);

    /// <summary>Get all checks provided by this plugin</summary>
    IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context);

    /// <summary>Initialize plugin with configuration</summary>
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public enum DoctorCategory
{
    Core,          // Platform, config, runtime
    Database,      // Schema, migrations, connectivity
    ServiceGraph,  // Inter-service communication
    Integration,   // External system integrations
    Security,      // Auth, TLS, secrets
    Observability  // Logs, metrics, traces
}
```
### 4.2 Check Interface
```csharp
/// <summary>
/// Individual diagnostic check.
/// </summary>
public interface IDoctorCheck
{
    /// <summary>Unique check identifier (e.g., "check.database.migrations.pending")</summary>
    string CheckId { get; }

    /// <summary>Human-readable name</summary>
    string Name { get; }

    /// <summary>What this check verifies</summary>
    string Description { get; }

    /// <summary>Default severity if check fails</summary>
    DoctorSeverity DefaultSeverity { get; }

    /// <summary>Tags for filtering (e.g., ["quick", "security", "migration"])</summary>
    IReadOnlyList<string> Tags { get; }

    /// <summary>Estimated execution time</summary>
    TimeSpan EstimatedDuration { get; }

    /// <summary>Check if this check can run in current context</summary>
    bool CanRun(DoctorPluginContext context);

    /// <summary>Execute the check</summary>
    Task<DoctorCheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}
```
### 4.3 Plugin Context
```csharp
public sealed class DoctorPluginContext
{
    public required IServiceProvider Services { get; init; }
    public required IConfiguration Configuration { get; init; }
    public required TimeProvider TimeProvider { get; init; }
    public required ILogger Logger { get; init; }

    // Runtime info
    public required string EnvironmentName { get; init; } // Development, Staging, Production
    public required string? TenantId { get; init; }

    // Plugin configuration
    public required JsonElement PluginConfig { get; init; }

    // Evidence helpers
    public EvidenceBuilder CreateEvidence() => new();
    public RemediationBuilder CreateRemediation() => new();

    // Secret redaction
    public string Redact(string value) => "***REDACTED***";

    // Sketch: masks Password=/Pwd= values in key=value connection strings
    // (requires System.Text.RegularExpressions); URI-style strings need separate handling.
    public string RedactConnectionString(string cs) =>
        Regex.Replace(cs, @"(?i)\b(password|pwd)\s*=\s*[^;]*", "$1=***REDACTED***");
}
```
### 4.4 Plugin Discovery
#### Static Discovery (Build-time)
Plugins register via DI at startup:
```csharp
// In Program.cs or startup
services.AddDoctorPlugin<CoreDoctorPlugin>();
services.AddDoctorPlugin<DatabaseDoctorPlugin>();
services.AddDoctorPlugin<ServiceGraphDoctorPlugin>();
services.AddDoctorPlugin<ScmGitHubDoctorPlugin>();
// ...
```
#### Dynamic Discovery (Runtime)
Plugins can be loaded from assemblies:
```csharp
// In DoctorPluginLoader.cs
public class DoctorPluginLoader
{
    public IEnumerable<IDoctorPlugin> LoadFromDirectory(string path)
    {
        foreach (var dll in Directory.GetFiles(path, "StellaOps.Doctor.Plugin.*.dll"))
        {
            var assembly = Assembly.LoadFrom(dll);
            foreach (var type in assembly.GetTypes()
                .Where(t => typeof(IDoctorPlugin).IsAssignableFrom(t) && !t.IsAbstract))
            {
                yield return (IDoctorPlugin)Activator.CreateInstance(type)!;
            }
        }
    }
}
```
### 4.5 Declarative Doctor Packs (YAML)
Doctor packs provide declarative checks that wrap CLI commands and parsing rules.
They complement compiled plugins and are loaded from `plugins/doctor/*.yaml` (plus optional override directories).

Short example:
```yaml
apiVersion: stella.ops/doctor.v1
kind: DoctorPlugin
metadata:
  name: doctor-release-orchestrator-gitlab
spec:
  discovery:
    when:
      - env: GITLAB_URL
```
Full sample: `docs/benchmarks/doctor/doctor-plugin-release-orchestrator-gitlab.yaml`

Key fields:
- `spec.discovery.when`: env/file existence gates.
- `checks[].run.exec`: command to execute (must be deterministic).
- `checks[].parse.expect` or `checks[].parse.expectJson`: pass/fail rules.
- `checks[].how_to_fix.commands[]`: exact fix commands printed verbatim.
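
To make the `spec.discovery.when` gates concrete, here is a rough sketch of how a pack loader might evaluate them. The spec only says the gates test env/file existence, so several things here are assumptions: the `DiscoveryGate` and `PackDiscovery` names, the `file` key, and the rule that every listed gate must hold before the pack loads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of one entry under spec.discovery.when.
public sealed record DiscoveryGate(string? Env = null, string? File = null);

public static class PackDiscovery
{
    // Assumption: a pack loads only when every gate is satisfied.
    // Lookups are injected so the logic can be exercised without touching the host.
    public static bool ShouldLoad(
        IEnumerable<DiscoveryGate> gates,
        Func<string, string?> getEnv,
        Func<string, bool> fileExists) =>
        gates.All(g =>
            (g.Env is null || !string.IsNullOrEmpty(getEnv(g.Env))) &&
            (g.File is null || fileExists(g.File)));
}
```

In a real loader the two delegates would typically be `Environment.GetEnvironmentVariable` and `File.Exists`. Injecting them keeps the gate logic deterministic, in line with the requirement that pack commands be deterministic.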
### 4.6 Plugin Directory Structure
```
src/
├── __Libraries/
│ └── StellaOps.Doctor/ # Core doctor engine
│ └── Plugins/
│ └── Core/ # Built-in core plugin
├── Doctor/
│ └── __Plugins/
│ ├── StellaOps.Doctor.Plugin.Database/
│ ├── StellaOps.Doctor.Plugin.ServiceGraph/
│ ├── StellaOps.Doctor.Plugin.Scm.GitHub/
│ ├── StellaOps.Doctor.Plugin.Scm.GitLab/
│ ├── StellaOps.Doctor.Plugin.Registry.Harbor/
│ ├── StellaOps.Doctor.Plugin.Registry.ECR/
│ ├── StellaOps.Doctor.Plugin.Vault/
│ ├── StellaOps.Doctor.Plugin.Authority/
│ └── StellaOps.Doctor.Plugin.Observability/
```
### 4.7 Plugin Configuration
Plugins read configuration from the standard config hierarchy:
```yaml
# In stellaops.yaml or environment-specific config
Doctor:
Enabled: true
DefaultTimeout: 30s
Parallelism: 4
Plugins:
Database:
Enabled: true
ConnectionTimeout: 10s
ServiceGraph:
Enabled: true
HealthEndpointTimeout: 5s
Scm:
GitHub:
Enabled: true
RateLimitThreshold: 100
Registry:
Harbor:
Enabled: true
SkipTlsVerify: false
Vault:
Enabled: true
SecretsToValidate:
- "secret/data/stellaops/api-keys"
- "secret/data/stellaops/certificates"
```
### 4.8 Security Model
#### Secret Redaction
All evidence output is sanitized:
```csharp
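public sealed class EvidenceBuilder
{
    private readonly Dictionary<string, string> _data = new();
    private readonly List<string> _sensitiveKeys = new();

    public EvidenceBuilder Add(string key, string value)
    {
        _data[key] = value;
        return this;
    }

    // Stores the raw value and marks the key so formatters redact it on output.
    public EvidenceBuilder AddSensitive(string key, string value)
    {
        _data[key] = value;
        _sensitiveKeys.Add(key);
        return this;
    }

    public EvidenceBuilder AddConnectionString(string key, string connectionString)
    {
        // Parse and redact password
        var redacted = RedactConnectionStringPassword(connectionString);
        _data[key] = redacted;
        return this;
    }

    // Sketch of the helper used above: masks Password=/Pwd= values
    // (requires System.Text.RegularExpressions).
    private static string RedactConnectionStringPassword(string cs) =>
        Regex.Replace(cs, @"(?i)\b(password|pwd)\s*=\s*[^;]*", "$1=***REDACTED***");
}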
```
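
Because `AddSensitive` records the raw value alongside the key, something has to mask those keys before a report is written. A minimal sketch of that step follows, assuming the formatter receives the `Data` and `SensitiveKeys` members of `Evidence`. The `EvidenceRedactor` class and the case-insensitive key match are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvidenceRedactor
{
    public const string Mask = "***REDACTED***";

    // Returns a copy of the evidence data with every sensitive key masked.
    // The input dictionary is left untouched.
    public static IReadOnlyDictionary<string, string> Sanitize(
        IReadOnlyDictionary<string, string> data,
        IReadOnlyCollection<string>? sensitiveKeys)
    {
        var sensitive = new HashSet<string>(
            sensitiveKeys ?? Array.Empty<string>(),
            StringComparer.OrdinalIgnoreCase);

        return data.ToDictionary(
            kv => kv.Key,
            kv => sensitive.Contains(kv.Key) ? Mask : kv.Value);
    }
}
```

Masking at output time keeps a single redaction point for the JSON, Markdown, and text formatters, at the cost of raw secrets living in memory until the report is rendered.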
#### RBAC Permissions
Doctor checks require specific scopes:
| Scope | Description |
|-------|-------------|
| `doctor:run` | Execute doctor checks |
| `doctor:run:full` | Execute all checks including sensitive |
| `doctor:export` | Export diagnostic reports |
| `admin:system` | Access system-level checks |
### 4.9 Versioning Strategy
- **Engine version:** Semantic versioning (e.g., `1.0.0`)
- **Plugin version:** Independent semantic versioning
- **Compatibility:** Plugins declare `MinEngineVersion`
- **Check IDs:** Stable across versions (never renamed)
```csharp
// Version compatibility check
if (plugin.MinEngineVersion > DoctorEngine.Version)
{
    _logger.LogWarning(
        "Plugin {PluginId} requires engine {Required}, current is {Current}. Skipping.",
        plugin.PluginId, plugin.MinEngineVersion, DoctorEngine.Version);
    continue;
}
```
---
## 5. CLI Surface
### 5.1 Command Structure
**Proposed Location:** `src/Cli/StellaOps.Cli/Commands/DoctorCommandGroup.cs`
```bash
stella doctor run [options]
stella doctor list [options]
stella doctor fix --from report.json [--apply]
```
Note: `stella doctor` remains shorthand for `stella doctor run` for compatibility.
`stella doctor fix` executes only non-destructive commands. Any destructive step
must be presented as manual guidance and is not eligible for `--apply`.
### 5.2 Options and Flags
| Option | Short | Type | Default | Description |
|--------|-------|------|---------|-------------|
| `--format` | `-f` | enum | `text` | Output format: `text`, `table`, `json`, `markdown` |
| `--quick` | `-q` | flag | false | Run only quick checks (tagged `quick`) |
| `--full` | | flag | false | Run all checks including slow/intensive |
| `--pack` | | string[] | all | Filter by pack name (manifest grouping) |
| `--category` | `-c` | string[] | all | Filter by category: `core`, `database`, `service-graph`, `integration`, `security`, `observability` |
| `--plugin` | `-p` | string[] | all | Filter by plugin ID (e.g., `scm.github`) |
| `--check` | | string | | Run single check by ID |
| `--severity` | `-s` | enum[] | all | Filter output by severity: `pass`, `info`, `warn`, `fail` |
| `--export` | `-e` | path | | Export report to file |
| `--timeout` | `-t` | duration | 30s | Per-check timeout |
| `--parallel` | | int | 4 | Max parallel check execution |
| `--no-remediation` | | flag | false | Skip remediation command generation |
| `--verbose` | `-v` | flag | false | Include detailed evidence in output |
| `--tenant` | | string | | Tenant context for multi-tenant checks |
#### Fix Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--from` | path | required | Path to JSON report with `how_to_fix` commands |
| `--apply` | flag | false | Execute fixes (default is dry-run preview) |

Only commands marked safe and non-destructive are eligible for `--apply`.
Destructive changes must be printed as manual steps and executed by the operator outside Doctor.
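
The eligibility rule for `--apply` could be expressed as a simple partition of report steps. The sketch below is illustrative only: the spec does not define how a step is marked safe, so the `FixStep` shape with its `Safe` and `Manual` flags, along with `FixPlan` and `FixPlanner`, are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical step shape; the real report schema is not defined in this spec.
public sealed record FixStep(string Command, bool Safe, bool Manual = false);

public sealed record FixPlan(
    IReadOnlyList<FixStep> Executable,
    IReadOnlyList<FixStep> ManualOnly,
    bool DryRun);

public static class FixPlanner
{
    // Splits steps into those Doctor may run and those left to the operator.
    // Without --apply the plan is a preview and nothing should be executed.
    public static FixPlan Plan(IEnumerable<FixStep> steps, bool apply)
    {
        var all = steps.ToList();
        var executable = all.Where(s => s.Safe && !s.Manual).ToList();
        var manualOnly = all.Where(s => !s.Safe || s.Manual).ToList();
        return new FixPlan(executable, manualOnly, DryRun: !apply);
    }
}
```

Treating "not marked safe" as manual by default means a step with missing metadata is printed for the operator and never executed.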
### 5.3 Exit Codes
| Code | Meaning |
|------|---------|
| 0 | All checks passed |
| 1 | One or more warnings |
| 2 | One or more failures |
| 3 | Doctor engine error |
| 4 | Invalid arguments |
| 5 | Timeout exceeded |
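
For codes 0 to 2, the mapping from check results could look like the sketch below. It assumes that `Info` and `Skip` results do not affect the exit code and that failures take precedence over warnings. Neither is stated in the table, and `DoctorExitCode` is a hypothetical helper. Codes 3 to 5 describe engine-level conditions and are out of scope here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Copied from the result model in section 3.3.
public enum DoctorSeverity { Pass = 0, Info = 1, Warn = 2, Fail = 3, Skip = 4 }

public static class DoctorExitCode
{
    // Skip has the highest numeric value, so taking Max() over the enum
    // would report a skipped check as the worst outcome; test explicitly instead.
    public static int FromResults(IEnumerable<DoctorSeverity> severities)
    {
        var list = severities.ToList();
        if (list.Contains(DoctorSeverity.Fail)) return 2;
        if (list.Contains(DoctorSeverity.Warn)) return 1;
        return 0;
    }
}
```

A CI gate can then branch on the process exit code alone, without parsing the report.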
### 5.4 Usage Examples
```bash
# Quick health check (alias)
stella doctor
# Same default run, using the explicit subcommand
stella doctor run
# Full diagnostic
stella doctor --full
# Check only database category
stella doctor --category database
# Check specific integration
stella doctor --plugin scm.github
# Run single check
stella doctor --check check.database.migrations.pending
# JSON output for CI/CD
stella doctor --format json --severity fail,warn
# Run orchestrator pack with table output
stella doctor run --pack orchestrator --format table
# Apply fixes from a JSON report (dry-run unless --apply)
stella doctor fix --from out.json --apply
# Export markdown report
stella doctor --full --format markdown --export doctor-report.md
# Verbose with all evidence
stella doctor --verbose --full
# Quick check with 60s timeout
stella doctor --quick --timeout 60s
```
### 5.5 Text Output Format
```
Stella Ops Doctor
=================
Running 47 checks across 8 plugins...

[PASS] check.config.required
  All required configuration values are present

[PASS] check.database.connectivity
  PostgreSQL connection successful (latency: 12ms)

[WARN] check.tls.certificates.expiry
  Diagnosis: TLS certificate expires in 14 days
  Evidence:
    Certificate: /etc/ssl/certs/stellaops.crt
    Subject: CN=stellaops.example.com
    Expires: 2026-01-26T00:00:00Z
    Days remaining: 14
  Likely Causes:
    1. Certificate renewal not scheduled
    2. ACME/Let's Encrypt automation not configured
  Fix Steps:
    # 1. Check current certificate
    openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates
    # 2. Renew certificate (if using certbot)
    sudo certbot renew --cert-name stellaops.example.com
    # 3. Restart services to pick up new certificate
    sudo systemctl restart stellaops-gateway
  Verification:
    stella doctor --check check.tls.certificates.expiry

[FAIL] check.database.migrations.pending
  Diagnosis: 3 pending release migrations detected in schema 'auth'
  Evidence:
    Schema: auth
    Current version: 099_add_dpop_thumbprints
    Pending migrations:
      - 100_add_tenant_quotas
      - 101_add_audit_retention
      - 102_add_session_revocation
    Connection: postgres://localhost:5432/stellaops (user: stella_app)
  Likely Causes:
    1. Release migrations not applied before deployment
    2. Migration files added after last deployment
  Fix Steps:
    # 1. Backup database first (RECOMMENDED)
    pg_dump -h localhost -U stella_admin -d stellaops -F c \
      -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump
    # 2. Apply pending release migrations
    stella system migrations-run --module Authority --category release
    # 3. Verify migrations applied
    stella system migrations-status --module Authority
  Verification:
    stella doctor --check check.database.migrations.pending

────────────────────────────────────────────────────────────────
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
────────────────────────────────────────────────────────────────
```
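The `Days remaining` evidence in the certificate warning above is plain date arithmetic. The following Python sketch shows one way to compute it, assuming whole calendar days counted in UTC; the function name and the rounding rule are illustrative and not taken from the Doctor implementation.

```python
from datetime import datetime, timezone


def days_remaining(expires_iso: str, now: datetime) -> int:
    """Whole calendar days (UTC) between `now` and the expiry timestamp."""
    # Older Python versions reject a trailing "Z" in fromisoformat().
    expires = datetime.fromisoformat(expires_iso.replace("Z", "+00:00"))
    delta = expires.astimezone(timezone.utc).date() - now.astimezone(timezone.utc).date()
    return delta.days


run_time = datetime(2026, 1, 12, 14, 30, 52, tzinfo=timezone.utc)
print(days_remaining("2026-01-26T00:00:00Z", run_time))  # 14
```

Counting calendar days rather than elapsed 24-hour periods keeps the reported number stable for every run on the same day.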
---
## 6. UI Surface
### 6.1 Route and Location
**Route:** `/ops/doctor`
**Location:** `src/Web/StellaOps.Web/src/app/features/doctor/`
### 6.2 Component Structure
```
src/app/features/doctor/
├── doctor.routes.ts
├── doctor-dashboard.component.ts # Main page
├── doctor-dashboard.component.html
├── doctor-dashboard.component.scss
├── components/
│ ├── check-list/
│ │ ├── check-list.component.ts # Filterable check list
│ │ └── check-list.component.html
│ ├── check-result/
│ │ ├── check-result.component.ts # Single check display
│ │ └── check-result.component.html
│ ├── remediation-panel/
│ │ ├── remediation-panel.component.ts # Fix commands display
│ │ └── remediation-panel.component.html
│ ├── evidence-viewer/
│ │ ├── evidence-viewer.component.ts # Collected evidence
│ │ └── evidence-viewer.component.html
│ └── export-dialog/
│ ├── export-dialog.component.ts # Export options
│ └── export-dialog.component.html
└── services/
├── doctor.client.ts # API client
├── doctor.service.ts # Business logic
└── doctor.store.ts # Signal-based state
```
### 6.3 Dashboard Layout
```
+------------------------------------------------------------------+
| Doctor Diagnostics [Run Quick] [Run Full] |
+------------------------------------------------------------------+
| Filters: [Category v] [Plugin v] [Severity v] [Export Report] |
+------------------------------------------------------------------+
| |
| Summary Strip |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
| | 44 | | 2 | | 1 | | 0 | | 8.3s | |
| | Passed | | Warnings | | Failed | | Skipped | | Duration | |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
| |
+------------------------------------------------------------------+
| Check Results |
| +----------------------------------------------------------------+ |
| | [FAIL] check.database.migrations.pending [Expand] | |
| | 3 pending release migrations in schema 'auth' | |
| +----------------------------------------------------------------+ |
| | [WARN] check.tls.certificates.expiry [Expand] | |
| | TLS certificate expires in 14 days | |
| +----------------------------------------------------------------+ |
| | [PASS] check.database.connectivity [Expand] | |
| | PostgreSQL connection successful (12ms) | |
| +----------------------------------------------------------------+ |
| | ... more checks ... | |
+------------------------------------------------------------------+
```
### 6.4 Expanded Check View
```
+------------------------------------------------------------------+
| [FAIL] check.database.migrations.pending |
+------------------------------------------------------------------+
| Diagnosis |
| 3 pending release migrations detected in schema 'auth' |
+------------------------------------------------------------------+
| Evidence |
| +--------------------------------------------------------------+ |
| | Schema | auth | |
| | Current version | 099_add_dpop_thumbprints | |
| | Pending | 100_add_tenant_quotas | |
| | | 101_add_audit_retention | |
| | | 102_add_session_revocation | |
| | Connection | postgres://localhost:5432/stellaops | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Likely Causes |
| 1. Release migrations not applied before deployment |
| 2. Migration files added after last deployment |
+------------------------------------------------------------------+
| Fix Steps [Copy All] |
| +--------------------------------------------------------------+ |
| | Step 1: Backup database first (RECOMMENDED) [Copy] | |
| | pg_dump -h localhost -U stella_admin -d stellaops -F c \ | |
| | -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump | |
| +--------------------------------------------------------------+ |
| | Step 2: Apply pending release migrations [Copy] | |
| | stella system migrations-run --module Authority \ | |
| | --category release | |
| +--------------------------------------------------------------+ |
| | Step 3: Verify migrations applied [Copy] | |
| | stella system migrations-status --module Authority | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Verification [Copy] |
| stella doctor --check check.database.migrations.pending |
+------------------------------------------------------------------+
| [Re-run Check] [Mark Resolved] |
+------------------------------------------------------------------+
```
### 6.5 Pack Navigation and Fix Actions
- Navigation hierarchy: packs -> plugins -> checks.
- Each check shows status, evidence, Copy Fix Commands, and Run Fix (disabled unless `doctor.fix.enabled=true`).
- Export actions: Download JSON and Download DSSE summary.
### 6.6 Real-Time Updates
- **Polling:** Auto-refresh option (every 30s/60s/5m)
- **SSE:** Live check progress during execution
- **WebSocket:** Optional for high-frequency updates
---
## 7. API Surface
### 7.1 Endpoints
**Base Path:** `/api/v1/doctor`
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/checks` | List available checks with metadata |
| `GET` | `/plugins` | List available plugins |
| `POST` | `/run` | Execute doctor checks |
| `GET` | `/run/{runId}` | Get run status/results |
| `GET` | `/run/{runId}/stream` | SSE stream for live progress |
| `GET` | `/reports` | List historical reports |
| `GET` | `/reports/{reportId}` | Get specific report |
| `DELETE` | `/reports/{reportId}` | Delete report |
### 7.2 Request/Response Models
#### List Checks
```http
GET /api/v1/doctor/checks?category=database&tags=quick
```
```json
{
"checks": [
{
"checkId": "check.database.connectivity",
"name": "Database Connectivity",
"description": "Verify PostgreSQL connection",
"pluginId": "stellaops.doctor.database",
"category": "database",
"defaultSeverity": "fail",
"tags": ["quick", "database"],
"estimatedDurationMs": 500
}
],
"total": 47
}
```
#### Run Checks
```http
POST /api/v1/doctor/run
Content-Type: application/json

{
"mode": "quick",
"categories": ["database", "integration"],
"plugins": [],
"checkIds": [],
"timeoutMs": 30000,
"parallelism": 4,
"includeRemediation": true
}
```
```json
{
"runId": "dr_20260112_143052_abc123",
"status": "running",
"startedAt": "2026-01-12T14:30:52Z",
"checksTotal": 12,
"checksCompleted": 0
}
```
#### Get Run Results
```http
GET /api/v1/doctor/run/dr_20260112_143052_abc123
```
```json
{
"runId": "dr_20260112_143052_abc123",
"status": "completed",
"startedAt": "2026-01-12T14:30:52Z",
"completedAt": "2026-01-12T14:31:00Z",
"durationMs": 8300,
"summary": {
"passed": 44,
"warnings": 2,
"failed": 1,
"skipped": 0,
"total": 47
},
"overallSeverity": "fail",
"results": [
{
"checkId": "check.database.migrations.pending",
"pluginId": "stellaops.doctor.database",
"category": "database",
"severity": "fail",
"diagnosis": "3 pending release migrations detected in schema 'auth'",
"evidence": {
"description": "Migration state for auth schema",
"data": {
"schema": "auth",
"currentVersion": "099_add_dpop_thumbprints",
"pendingMigrations": "100_add_tenant_quotas, 101_add_audit_retention, 102_add_session_revocation",
"connection": "postgres://localhost:5432/stellaops"
}
},
"likelyCauses": [
"Release migrations not applied before deployment",
"Migration files added after last deployment"
],
"remediation": {
"requiresBackup": true,
"safetyNote": "Always backup before running migrations",
"steps": [
{
"order": 1,
"description": "Backup database first (RECOMMENDED)",
"command": "pg_dump -h localhost -U stella_admin -d stellaops -F c -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump",
"commandType": "shell",
"placeholders": {}
},
{
"order": 2,
"description": "Apply pending release migrations",
"command": "stella system migrations-run --module Authority --category release",
"commandType": "shell",
"placeholders": {}
},
{
"order": 3,
"description": "Verify migrations applied",
"command": "stella system migrations-status --module Authority",
"commandType": "shell",
"placeholders": {}
}
]
},
"verificationCommand": "stella doctor --check check.database.migrations.pending",
"durationMs": 234,
"executedAt": "2026-01-12T14:30:54Z"
}
]
}
```
Results also expose a `how_to_fix` object for automation. It is a simplified alias of
the richer `remediation` model and includes `commands[]` printed verbatim.
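
The `summary` and `overallSeverity` fields in the run result could plausibly be derived from the per-check `results` array. The Python sketch below assumes that the overall severity is the worst individual severity and that `warn` and `skip` are valid severity strings; the examples in this section only show `pass` and `fail`, so both are assumptions.

```python
from collections import Counter

# Assumed severity strings and ranking (higher is worse); hypothetical, for illustration.
SEVERITY_RANK = {"skip": 0, "pass": 1, "warn": 2, "fail": 3}
SUMMARY_KEY = {"pass": "passed", "warn": "warnings", "fail": "failed", "skip": "skipped"}


def summarize(results):
    """Return (summary, overall_severity) for a list of check results."""
    counts = Counter(SUMMARY_KEY[r["severity"]] for r in results)
    summary = {key: counts.get(key, 0) for key in SUMMARY_KEY.values()}
    summary["total"] = len(results)
    overall = max(
        (r["severity"] for r in results),
        key=SEVERITY_RANK.__getitem__,
        default="pass",  # an empty run is treated as passing in this sketch
    )
    return summary, overall
```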
### 7.3 SSE Stream
```http
GET /api/v1/doctor/run/dr_20260112_143052_abc123/stream
Accept: text/event-stream
```
```
event: check-started
data: {"checkId":"check.database.connectivity","startedAt":"2026-01-12T14:30:52Z"}

event: check-completed
data: {"checkId":"check.database.connectivity","severity":"pass","durationMs":45}

event: check-started
data: {"checkId":"check.database.migrations.pending","startedAt":"2026-01-12T14:30:52Z"}

event: check-completed
data: {"checkId":"check.database.migrations.pending","severity":"fail","durationMs":234}

event: run-completed
data: {"runId":"dr_20260112_143052_abc123","summary":{"passed":44,"warnings":2,"failed":1}}
```
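A client consuming this stream needs to split it into named events. The Python sketch below parses a buffered stream body, assuming standard Server-Sent Events framing in which a blank line terminates each event; it is more lenient than a browser in that it also flushes a final event that lacks the terminating blank line. The function name is illustrative.

```python
import json


def parse_sse(text):
    """Parse an SSE payload into a list of (event_name, data_dict) tuples."""
    events, name, data = [], "message", []
    for line in text.splitlines() + [""]:  # the extra "" flushes the last event
        if line == "":
            if data:
                events.append((name, json.loads("\n".join(data))))
            name, data = "message", []
        elif line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))
    return events
```

A progress bar could then count `check-completed` events against `checksTotal` from the run response.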
### 7.4 Evidence Logs and Attestations
Doctor runs emit a JSONL evidence log and optional DSSE summary for audit trails.
By default, JSONL is local only and deterministic; outbound telemetry is opt-in.
- JSONL path: `artifacts/doctor/doctor-run-<runId>.ndjson` (configurable).
- DSSE summary: `artifacts/doctor/doctor-run-<runId>.dsse.json` (optional).
- Evidence records include `doctor_command` to capture the operator-invoked command.
DSSE summaries assume operator execution and must include the same command note.
Example JSONL line:
```json
{"runId":"dr_20260112_143052_abc123","doctor_command":"stella doctor run --format json","checkId":"check.database.connectivity","severity":"pass","executedAt":"2026-01-12T14:30:52Z","how_to_fix":{"commands":[]}}
```
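One way to keep the evidence log deterministic is to serialize every record with a fixed key order and fixed separators. The Python sketch below assumes sorted keys are acceptable; the example line above uses a different key order, so the real writer may order fields differently.

```python
import json


def to_jsonl_line(record: dict) -> str:
    """Serialize one evidence record as a single deterministic JSONL line."""
    # Sorted keys and fixed separators: the same record always yields the same bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + "\n"


line = to_jsonl_line({
    "runId": "dr_20260112_143052_abc123",
    "checkId": "check.database.connectivity",
    "severity": "pass",
    "how_to_fix": {"commands": []},
})
```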
---
## 8. Remediation Command Patterns
Remediation should favor the best operator experience: short, copy/paste friendly
commands with minimal steps and clear verification guidance.
### 8.1 Standard Output Format
Every check that does not pass (WARN or FAIL) produces remediation in this structure:
```
[{SEVERITY}] {check.id}
Diagnosis: {one-line summary}
Evidence:
{key}: {value}
{key}: {value}
...
Likely Causes:
1. {most likely cause}
2. {second most likely cause}
...
Fix Steps:
# {step number}. {description}
{command}
# {step number}. {description}
{command}
...
Verification:
{command to re-run this specific check}
```
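The template above can be filled from a check result with a small renderer. The Python sketch below assumes a simplified input shape (a flat `evidence` mapping and a `steps` list); the indentation widths are also an assumption, since the specification does not fix them.

```python
def render_result(result: dict) -> str:
    """Render one check result in the standard text layout."""
    lines = [
        f"[{result['severity'].upper()}] {result['checkId']}",
        f"  Diagnosis: {result['diagnosis']}",
        "  Evidence:",
    ]
    lines += [f"    {key}: {value}" for key, value in result["evidence"].items()]
    lines.append("  Likely Causes:")
    lines += [f"    {i}. {cause}" for i, cause in enumerate(result["likelyCauses"], 1)]
    lines.append("  Fix Steps:")
    for i, step in enumerate(result["steps"], 1):
        lines += [f"    # {i}. {step['description']}", f"    {step['command']}"]
    lines += ["  Verification:", f"    {result['verificationCommand']}"]
    return "\n".join(lines)
```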
### 8.1.1 JSON Remediation Structure
The JSON output MUST include a `how_to_fix` object for agent consumption. It can be
derived from `remediation.steps` and preserves command order.
```json
{
  "how_to_fix": {
    "summary": "Apply baseline branch policy",
    "commands": [
      "stella orchestrator scm apply-branch-policy --preset strict"
    ]
  }
}
```
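Deriving `how_to_fix` from `remediation.steps` amounts to sorting the steps by `order` and keeping only the commands. A minimal Python sketch follows; where the `summary` text comes from is not specified, so it is passed in as a parameter here.

```python
def to_how_to_fix(remediation: dict, summary: str) -> dict:
    """Flatten the rich remediation model into the simplified how_to_fix shape."""
    ordered = sorted(remediation["steps"], key=lambda step: step["order"])
    # Commands are copied verbatim, in step order.
    return {"summary": summary, "commands": [step["command"] for step in ordered]}
```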
### 8.2 Placeholder Conventions
When commands require user-specific values:
| Placeholder | Meaning | Example |
|-------------|---------|---------|
| `{HOSTNAME}` | Target hostname | `ldap.example.com` |
| `{PORT}` | Port number | `636` |
| `{USERNAME}` | Username | `admin` |
| `{PASSWORD}` | Password (never shown) | `***` |
| `{DATABASE}` | Database name | `stellaops` |
| `{SCHEMA}` | Schema name | `auth` |
| `{FILE_PATH}` | File path | `/etc/ssl/certs/ca.crt` |
| `{TOKEN}` | API token (never shown) | `***` |
| `{URL}` | Full URL | `https://api.github.com` |
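
Substituting these placeholders while keeping secrets masked can be sketched as follows. This Python example assumes that `{PASSWORD}` and `{TOKEN}` are the only secret placeholders, as in the table above, and that shell expansions such as `${VAR}` must be left alone; the function and constant names are hypothetical.

```python
import re

SECRET_PLACEHOLDERS = {"PASSWORD", "TOKEN"}
# Match {NAME} but not shell expansions such as ${NAME}.
PLACEHOLDER = re.compile(r"(?<!\$)\{([A-Z_]+)\}")


def fill_placeholders(command: str, values: dict, for_display: bool = True) -> str:
    """Substitute known placeholders; mask secrets when rendering for display."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # leave unknown placeholders untouched
        if for_display and name in SECRET_PLACEHOLDERS:
            return "***"
        return str(values[name])

    return PLACEHOLDER.sub(substitute, command)
```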
### 8.3 Safety Notes
Doctor fix executes only non-destructive commands. If a fix requires a change
that modifies data, Doctor must present it as manual guidance with explicit
safety notes and never execute it.
```
Manual Steps (not executed by Doctor):
# SAFETY: This operation modifies the database. Create a backup first.
# 1. Backup database (REQUIRED before proceeding)
pg_dump -h {HOSTNAME} -U {USERNAME} -d {DATABASE} -F c \
-f backup_$(date +%Y%m%d_%H%M%S).dump
# 2. Apply the fix
stella system migrations-run --module Authority --category release
```
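The rule above can be expressed as a gate that decides, per step, whether Doctor may run it or must show it as manual guidance. The Python sketch below uses a hypothetical per-step `destructive` flag, which is not part of the documented remediation model, and treats a missing flag as destructive so that the gate fails safe.

```python
def plan_fix(steps, fix_enabled):
    """Split remediation steps into (auto_run, manual) lists."""
    auto_run, manual = [], []
    for step in steps:
        # `destructive` is a hypothetical flag; absent means "assume destructive".
        if fix_enabled and not step.get("destructive", True):
            auto_run.append(step)
        else:
            manual.append(step)
    return auto_run, manual
```

With `fix_enabled` set to false, which corresponds to `doctor.fix.enabled` being unset, every step lands in the manual list.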
### 8.4 Multi-Platform Commands
Where applicable, provide commands for different platforms:
```
Fix Steps:
# 1. Restart the service
# Linux (systemd):
sudo systemctl restart stellaops-gateway
# Docker:
docker restart stellaops-gateway
# Docker Compose:
docker compose restart gateway
# Kubernetes:
kubectl rollout restart deployment/stellaops-gateway -n stellaops
```
---
## 9. Doctor Check Catalog
This section documents all diagnostic checks organized by plugin/category.
### 9.1 Core Platform Plugin (`stellaops.doctor.core`)
#### check.config.required
| Property | Value |
|----------|-------|
| **CheckId** | `check.config.required` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `config`, `startup` |
| **What it verifies** | All required configuration values are present |
| **Evidence collected** | Missing keys, config sources checked, environment |
| **Failure modes** | Missing `STELLAOPS_BACKEND_URL`, missing database connection string, missing Authority URL |
**Remediation:**
```bash
# 1. Check which configuration values are missing
stella config list --show-missing
# 2. Set missing environment variables
export STELLAOPS_BACKEND_URL="https://api.stellaops.example.com"
export STELLAOPS_POSTGRES_CONNECTION="Host=localhost;Database=stellaops;Username=stella_app;Password={PASSWORD}"
export STELLAOPS_AUTHORITY_URL="https://auth.stellaops.example.com"
# 3. Or update configuration file
# Edit: /etc/stellaops/stellaops.yaml
```
**Verification:** `stella doctor --check check.config.required`
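
The core of this check is a comparison between required keys and the values that are actually set. The Python sketch below looks only at environment variables and uses the three variable names from the remediation above; the real check also inspects other configuration sources, so treat this as a simplification.

```python
import os

REQUIRED_KEYS = (
    "STELLAOPS_BACKEND_URL",
    "STELLAOPS_POSTGRES_CONNECTION",
    "STELLAOPS_AUTHORITY_URL",
)


def missing_config(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or blank, in declaration order."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]
```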
---
#### check.config.syntax
| Property | Value |
|----------|-------|
| **CheckId** | `check.config.syntax` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `config` |
| **What it verifies** | Configuration files have valid YAML/JSON syntax |
| **Evidence collected** | File path, line number, parse error message |
| **Failure modes** | Invalid YAML indentation, JSON syntax error, encoding issues |
**Remediation:**
```bash
# 1. Validate YAML syntax
yamllint /etc/stellaops/stellaops.yaml
# 2. Check for encoding issues (should be UTF-8)
file /etc/stellaops/stellaops.yaml
# 3. Fix common YAML issues
# - Use spaces, not tabs
# - Check string quoting
# - Verify indentation (2 spaces per level)
```
**Verification:** `stella doctor --check check.config.syntax`
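
For JSON configuration files, the evidence fields listed above (line number and parse error message) map directly onto what a standard parser reports. A minimal Python sketch, covering JSON only because YAML parsing would need a third-party library:

```python
import json


def check_json_syntax(text: str):
    """Return None when the text parses, else the error location and message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return {"line": err.lineno, "column": err.colno, "message": err.msg}
    return None
```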
---
#### check.config.deprecated
| Property | Value |
|----------|-------|
| **CheckId** | `check.config.deprecated` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `config` |
| **What it verifies** | No deprecated configuration keys are in use |
| **Evidence collected** | Deprecated keys found, replacement keys |
| **Failure modes** | Using old key names, removed options |
**Remediation:**
```bash
# 1. Review deprecated keys and their replacements
stella config migrate --dry-run
# 2. Update configuration file with new key names
stella config migrate --apply
# 3. Verify configuration after migration
stella config validate
```
**Verification:** `stella doctor --check check.config.deprecated`
---
#### check.runtime.dotnet
| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.dotnet` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `runtime` |
| **What it verifies** | .NET runtime version meets minimum requirements |
| **Evidence collected** | Installed version, required version, runtime path |
| **Failure modes** | Outdated .NET version, missing runtime |
**Remediation:**
```bash
# 1. Check installed .NET runtimes (dotnet --version reports only the SDK version)
dotnet --list-runtimes
# 2. Install required .NET version (Ubuntu/Debian)
wget https://dot.net/v1/dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 10.0
# 3. Verify installation
dotnet --list-runtimes
```
**Verification:** `stella doctor --check check.runtime.dotnet`
---
#### check.runtime.memory
| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.memory` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `runtime`, `resources` |
| **What it verifies** | Sufficient memory available for operation |
| **Evidence collected** | Total memory, available memory, GC memory info |
| **Failure modes** | Low available memory (<1GB), high GC pressure |
**Remediation:**
```bash
# 1. Check current memory usage
free -h
# 2. Identify memory-heavy processes
ps aux --sort=-%mem | head -20
# 3. Adjust container memory limits if applicable
# Docker:
docker update --memory 4g stellaops-gateway
# Kubernetes:
kubectl patch deployment stellaops-gateway -p '{"spec":{"template":{"spec":{"containers":[{"name":"gateway","resources":{"limits":{"memory":"4Gi"}}}]}}}}'
```
**Verification:** `stella doctor --check check.runtime.memory`
---
#### check.runtime.disk.space
| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.disk.space` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `runtime`, `resources` |
| **What it verifies** | Sufficient disk space on required paths |
| **Evidence collected** | Path, total space, available space, usage percentage |
| **Failure modes** | Data directory >90% full, log directory full |
**Remediation:**
```bash
# 1. Check disk usage
df -h /var/lib/stellaops
# 2. Find large files
du -sh /var/lib/stellaops/* | sort -hr | head -20
# 3. Clean up old logs
find /var/log/stellaops -name "*.log" -mtime +30 -delete
# 4. Clean up old exports
stella export cleanup --older-than 30d
```
**Verification:** `stella doctor --check check.runtime.disk.space`
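
The evidence for this check can be gathered with a single standard-library call. The Python sketch below applies the 90% threshold from the failure modes above; the evidence key names are illustrative.

```python
import shutil

WARN_THRESHOLD_PERCENT = 90.0  # from the ">90% full" failure mode


def usage_percent(total: int, used: int) -> float:
    """Percentage of the volume in use; 0.0 for a zero-sized volume."""
    return used / total * 100 if total else 0.0


def disk_evidence(path: str) -> dict:
    """Collect path, sizes, usage percentage and a pass/warn verdict."""
    usage = shutil.disk_usage(path)
    percent = usage_percent(usage.total, usage.used)
    return {
        "path": path,
        "totalBytes": usage.total,
        "availableBytes": usage.free,
        "usagePercent": round(percent, 1),
        "severity": "warn" if percent > WARN_THRESHOLD_PERCENT else "pass",
    }
```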
---
#### check.runtime.disk.permissions
| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.disk.permissions` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `runtime`, `security` |
| **What it verifies** | Write permissions on required directories |
| **Evidence collected** | Path, expected permissions, actual permissions, owner |
| **Failure modes** | Cannot write to data directory, log directory not writable |
**Remediation:**
```bash
# 1. Check current permissions
ls -la /var/lib/stellaops
# 2. Fix ownership
sudo chown -R stellaops:stellaops /var/lib/stellaops
# 3. Fix permissions
sudo chmod 755 /var/lib/stellaops
sudo chmod 755 /var/log/stellaops
# 4. Verify write access
sudo -u stellaops sh -c 'touch /var/lib/stellaops/.write-test && rm /var/lib/stellaops/.write-test'
```
**Verification:** `stella doctor --check check.runtime.disk.permissions`
---
#### check.time.sync
| Property | Value |
|----------|-------|
| **CheckId** | `check.time.sync` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `quick`, `runtime` |
| **What it verifies** | System clock is synchronized (NTP) |
| **Evidence collected** | NTP status, clock offset, sync source |
| **Failure modes** | Clock drift >5s, NTP not running, no sync source |
**Remediation:**
```bash
# 1. Check NTP status
timedatectl status
# 2. Enable NTP synchronization
sudo timedatectl set-ntp true
# 3. Force immediate sync
sudo systemctl restart systemd-timesyncd
# 4. Verify sync status
timedatectl timesync-status
```
**Verification:** `stella doctor --check check.time.sync`
---
#### check.crypto.profiles
| Property | Value |
|----------|-------|
| **CheckId** | `check.crypto.profiles` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `security`, `crypto` |
| **What it verifies** | Crypto profile is valid and providers are available |
| **Evidence collected** | Active profile, available providers, missing providers |
| **Failure modes** | Invalid profile, required provider not available |
**Remediation:**
```bash
# 1. List available crypto profiles
stella crypto profiles list
# 2. Validate current profile
stella crypto profiles validate
# 3. Switch to a different profile if needed
stella crypto profiles set --profile default
# 4. Install missing providers (if GOST required)
# See docs/crypto/gost-setup.md
```
**Verification:** `stella doctor --check check.crypto.profiles`
---
### 9.2 Database Plugin (`stellaops.doctor.database`)
#### check.database.connectivity
| Property | Value |
|----------|-------|
| **CheckId** | `check.database.connectivity` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `quick`, `database` |
| **What it verifies** | PostgreSQL connection is successful |
| **Evidence collected** | Connection string (redacted), latency, server version |
| **Failure modes** | Connection refused, authentication failed, timeout |
**Remediation:**
```bash
# 1. Test connection manually
psql "host=localhost dbname=stellaops user=stella_app" -c "SELECT 1"
# 2. Check PostgreSQL is running
sudo systemctl status postgresql
# 3. Check connection settings
# Verify pg_hba.conf allows connections
sudo grep stellaops /etc/postgresql/16/main/pg_hba.conf
# 4. Check firewall
sudo ufw status | grep 5432
```
**Verification:** `stella doctor --check check.database.connectivity`
---
#### check.database.version
| Property | Value |
|----------|-------|
| **CheckId** | `check.database.version` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database` |
| **What it verifies** | PostgreSQL version meets minimum requirements (>=16) |
| **Evidence collected** | Current version, required version |
| **Failure modes** | PostgreSQL <16, unsupported version |
**Remediation:**
```bash
# 1. Check current version
psql -c "SELECT version();"
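# 2. Install PostgreSQL 16 (Ubuntu)
sudo apt install postgresql-16
# 3. Migrate data to the new version (example assumes the old cluster is 14/main).
#    The package install creates an empty 16/main cluster; drop it first so
#    pg_upgradecluster can create the upgraded one.
sudo pg_dropcluster 16 main --stop
sudo pg_upgradecluster 14 main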
# 4. Remove old version
sudo apt remove postgresql-14
```
**Verification:** `stella doctor --check check.database.version`
---
#### check.database.migrations.pending
| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.pending` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database`, `migrations` |
| **What it verifies** | No pending release migrations exist |
| **Evidence collected** | Schema, current version, pending migrations list |
| **Failure modes** | Release migrations not applied before deployment |
**Remediation:**
```bash
# 1. Backup database first (RECOMMENDED)
pg_dump -h localhost -U stella_admin -d stellaops -F c \
-f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump
# 2. Check migration status for all modules
stella system migrations-status
# 3. Apply pending release migrations
stella system migrations-run --category release
# 4. Verify all migrations applied
stella system migrations-status --verify
```
**Verification:** `stella doctor --check check.database.migrations.pending`
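
Conceptually, the pending list is the set of available migrations minus the applied ones. The Python sketch below assumes migration names carry zero-padded numeric prefixes, as in the examples in this document, so that lexicographic order matches application order.

```python
def pending_migrations(available, applied):
    """Return migrations that exist in source but have not been applied."""
    applied_set = set(applied)
    # Zero-padded prefixes (099_, 100_, ...) make plain sorting order-correct.
    return sorted(name for name in available if name not in applied_set)
```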
---
#### check.database.migrations.checksum
| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.checksum` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database`, `migrations`, `security` |
| **What it verifies** | Applied migration checksums match source files |
| **Evidence collected** | Mismatched migrations, expected vs actual checksum |
| **Failure modes** | Migration file modified after application, corruption |
**Remediation:**
```bash
# CRITICAL: Checksum mismatch indicates potential data integrity issue
# 1. Identify mismatched migrations
stella system migrations-verify --detailed
# 2. If migrations were legitimately modified (rare):
# WARNING: Only proceed if you understand the implications
stella system migrations-repair --migration {MIGRATION_NAME} --force
# 3. If data corruption suspected:
# Restore from backup and reapply migrations
pg_restore -h localhost -U stella_admin -d stellaops stellaops_backup.dump
stella system migrations-run --all
```
**Verification:** `stella doctor --check check.database.migrations.checksum`
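
A checksum comparison of this kind can be sketched as follows. The hash algorithm used by Stella Ops is not stated in this document, so SHA-256 over the raw file bytes is an assumption, as are the function name and the shape of the inputs.

```python
import hashlib


def checksum_mismatches(recorded: dict, sources: dict) -> list:
    """Compare recorded checksums (name -> hex) with source files (name -> bytes)."""
    mismatches = []
    for name, expected in sorted(recorded.items()):
        content = sources.get(name)
        # A missing source file is reported with actual=None.
        actual = hashlib.sha256(content).hexdigest() if content is not None else None
        if actual != expected:
            mismatches.append({"migration": name, "expected": expected, "actual": actual})
    return mismatches
```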
---
#### check.database.migrations.lock
| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.lock` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database`, `migrations` |
| **What it verifies** | No stale migration locks exist |
| **Evidence collected** | Lock holder, lock duration, schema |
| **Failure modes** | Abandoned lock from crashed process |
**Remediation:**
```bash
# 1. Check for active locks
psql -d stellaops -c "SELECT * FROM pg_locks WHERE locktype = 'advisory';"
# 2. Identify lock holder process
psql -d stellaops -c "SELECT pid, query, state FROM pg_stat_activity WHERE pid IN (SELECT pid FROM pg_locks WHERE locktype = 'advisory');"
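# 3. If the holding session is stale, terminate it to release its locks
#    (pg_advisory_unlock_all() only releases locks held by the calling session)
# WARNING: Only if you are certain no migration is running
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"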
# 4. Retry migration
stella system migrations-run --category release
```
**Verification:** `stella doctor --check check.database.migrations.lock`
---
#### check.database.schema.{schema}
| Property | Value |
|----------|-------|
| **CheckId** | `check.database.schema.{schema}` (e.g., `check.database.schema.auth`) |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database` |
| **What it verifies** | Schema exists and has expected tables |
| **Evidence collected** | Schema name, expected tables, missing tables |
| **Failure modes** | Schema not created, tables dropped |
**Remediation:**
```bash
# 1. Check if schema exists
psql -d stellaops -c "SELECT schema_name FROM information_schema.schemata WHERE schema_name = '{SCHEMA}';"
# 2. If schema missing, run startup migrations
stella system migrations-run --module {MODULE} --category startup
# 3. Verify schema tables
psql -d stellaops -c "SELECT table_name FROM information_schema.tables WHERE table_schema = '{SCHEMA}';"
```
**Verification:** `stella doctor --check check.database.schema.{schema}`
---
#### check.database.connections.pool
| Property | Value |
|----------|-------|
| **CheckId** | `check.database.connections.pool` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database`, `performance` |
| **What it verifies** | Connection pool is healthy, not exhausted |
| **Evidence collected** | Active connections, idle connections, max connections |
| **Failure modes** | Pool exhausted, connection leak |
**Remediation:**
```bash
# 1. Check current connections
psql -d stellaops -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'stellaops';"
# 2. Check max connections
psql -d stellaops -c "SHOW max_connections;"
# 3. Identify long-running queries
psql -d stellaops -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 10;"
# 4. Increase max connections if needed (requires a restart, not a reload)
# Edit postgresql.conf: max_connections = 200
sudo systemctl restart postgresql
```
**Verification:** `stella doctor --check check.database.connections.pool`
---
### 9.3 Service Graph Plugin (`stellaops.doctor.servicegraph`)
#### check.services.gateway.running
| Property | Value |
|----------|-------|
| **CheckId** | `check.services.gateway.running` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `quick`, `services` |
| **What it verifies** | Gateway service is running and accepting connections |
| **Evidence collected** | Service status, PID, uptime, port binding |
| **Failure modes** | Service not running, port already in use |
**Remediation:**
```bash
# 1. Check service status
sudo systemctl status stellaops-gateway
# 2. Check logs for errors
sudo journalctl -u stellaops-gateway -n 50
# 3. Check port binding
sudo ss -tlnp | grep 443
# 4. Start/restart service
sudo systemctl restart stellaops-gateway
```
**Verification:** `stella doctor --check check.services.gateway.running`
---
#### check.services.gateway.routing
| Property | Value |
|----------|-------|
| **CheckId** | `check.services.gateway.routing` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services`, `routing` |
| **What it verifies** | Gateway can route requests to backend services |
| **Evidence collected** | Registered services, routing table, disconnected services |
| **Failure modes** | No services registered, all services disconnected |
**Remediation:**
```bash
# 1. Check registered services
curl -s http://localhost:8080/health/routing | jq
# 2. Verify backend services are running
stella services status
# 3. Check Router transport connectivity
stella services connectivity-test
# 4. Restart disconnected services
sudo systemctl restart stellaops-concelier
sudo systemctl restart stellaops-scanner
```
**Verification:** `stella doctor --check check.services.gateway.routing`
---
#### check.services.{service}.health
| Property | Value |
|----------|-------|
| **CheckId** | `check.services.{service}.health` (e.g., `check.services.concelier.health`) |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services` |
| **What it verifies** | Service health endpoint returns healthy |
| **Evidence collected** | Health status, dependencies, latency |
| **Failure modes** | Service unhealthy, degraded dependencies |
**Remediation:**
```bash
# 1. Check service health directly
curl -s http://localhost:{PORT}/healthz | jq
# 2. Check detailed health
curl -s http://localhost:{PORT}/health/details | jq
# 3. Check service logs
sudo journalctl -u stellaops-{SERVICE} -n 100
# 4. Restart service if needed
sudo systemctl restart stellaops-{SERVICE}
```
**Verification:** `stella doctor --check check.services.{service}.health`
---
#### check.services.{service}.connectivity
| Property | Value |
|----------|-------|
| **CheckId** | `check.services.{service}.connectivity` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services`, `routing` |
| **What it verifies** | Service is reachable from Gateway via Router |
| **Evidence collected** | Transport type, connection state, last heartbeat |
| **Failure modes** | Connection refused, heartbeat timeout |
**Remediation:**
```bash
# 1. Check Router connection status
stella services connection-status --service {SERVICE}
# 2. Test network connectivity
nc -zv {SERVICE_HOST} {SERVICE_PORT}
# 3. Check firewall rules
sudo ufw status | grep {SERVICE_PORT}
# 4. Verify Router configuration in service
# Check stellaops.yaml for correct Router endpoints
```
**Verification:** `stella doctor --check check.services.{service}.connectivity`
---
#### check.services.authority.connectivity
| Property | Value |
|----------|-------|
| **CheckId** | `check.services.authority.connectivity` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `quick`, `services`, `auth` |
| **What it verifies** | Authority service is reachable |
| **Evidence collected** | Authority URL, response status, latency |
| **Failure modes** | Authority unreachable, OIDC discovery failed |
**Remediation:**
```bash
# 1. Check Authority URL configuration
echo $STELLAOPS_AUTHORITY_URL
# 2. Test OIDC discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq
# 3. Check Authority service status
sudo systemctl status stellaops-authority
# 4. Verify network connectivity
curl -v ${STELLAOPS_AUTHORITY_URL}/healthz
```
**Verification:** `stella doctor --check check.services.authority.connectivity`
---
### 9.4 Security Plugin (`stellaops.doctor.security`)
#### check.auth.oidc.discovery
| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.oidc.discovery` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `quick`, `auth`, `security` |
| **What it verifies** | OIDC well-known endpoint is accessible |
| **Evidence collected** | Discovery URL, issuer, supported flows |
| **Failure modes** | Discovery endpoint unavailable, invalid response |
**Remediation:**
```bash
# 1. Test discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq
# 2. Verify issuer matches configuration
# The issuer in the response should match STELLAOPS_AUTHORITY_URL
# 3. Check Authority service logs
sudo journalctl -u stellaops-authority -n 50
# 4. Verify TLS certificate
openssl s_client -connect auth.stellaops.example.com:443 -servername auth.stellaops.example.com
```
**Verification:** `stella doctor --check check.auth.oidc.discovery`
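
Step 2 above (comparing the issuer with the configured URL) can be sketched as a small helper. This is a minimal illustration, not the check's actual implementation: `compare_issuer` is a hypothetical name, and reporting a trailing-slash difference separately is an assumption made here because OIDC clients generally expect the issuer to match exactly.

```bash
# Hypothetical helper: compare the "issuer" from the discovery document
# with the configured Authority URL.
# Prints "match", "trailing-slash-mismatch", or "mismatch".
compare_issuer() {
  local issuer=$1 configured=$2
  if [ "$issuer" = "$configured" ]; then
    echo "match"
  elif [ "${issuer%/}" = "${configured%/}" ]; then
    # Same URL apart from a trailing slash; strict clients still reject this.
    echo "trailing-slash-mismatch"
  else
    echo "mismatch"
  fi
}

# Example (not run here; requires curl and jq):
#   issuer=$(curl -s "${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration" | jq -r '.issuer')
#   compare_issuer "$issuer" "$STELLAOPS_AUTHORITY_URL"
```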
---
#### check.auth.oidc.jwks
| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.oidc.jwks` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `security` |
| **What it verifies** | JWKS endpoint returns valid signing keys |
| **Evidence collected** | JWKS URL, key count, key algorithms |
| **Failure modes** | JWKS unavailable, no keys, unsupported algorithms |
**Remediation:**
```bash
# 1. Fetch JWKS directly
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/jwks.json | jq
# 2. Verify keys are present
# Response should contain at least one key in "keys" array
# 3. If JWKS is empty, regenerate signing keys
stella authority keys rotate
# 4. Restart Authority service
sudo systemctl restart stellaops-authority
```
**Verification:** `stella doctor --check check.auth.oidc.jwks`
---
#### check.auth.ldap.bind
| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.bind` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `security`, `ldap` |
| **What it verifies** | LDAP bind credentials are valid |
| **Evidence collected** | LDAP host, bind DN (redacted), TLS status |
| **Failure modes** | Invalid credentials, connection refused, TLS failure |
**Remediation:**
```bash
# 1. Test LDAP connection with ldapsearch
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
-D "cn=bind-user,ou=service,dc=example,dc=internal" \
-w "{PASSWORD}" \
-b "ou=people,dc=example,dc=internal" "(uid=*)" dn | head -10
# 2. Check TLS certificate
openssl s_client -connect {LDAP_HOST}:636 -showcerts
# 3. Verify bind DN and password in configuration
# Check etc/authority.plugins/ldap.yaml
# 4. Test with Authority's ldap-test command
stella authority ldap-test --bind-only
```
**Verification:** `stella doctor --check check.auth.ldap.bind`
---
#### check.auth.ldap.search
| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.search` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `ldap` |
| **What it verifies** | LDAP search base is accessible and returns users |
| **Evidence collected** | Search base, user count, search time |
| **Failure modes** | Search base not found, no users returned, timeout |
**Remediation:**
```bash
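# 1. Test LDAP search and count the entries returned
# (-LLL suppresses comments; counting "dn:" lines avoids counting blank lines)
ldapsearch -x -LLL -H ldaps://{LDAP_HOST}:636 \
-D "{BIND_DN}" -w "{PASSWORD}" \
-b "{SEARCH_BASE}" "(objectClass=person)" dn | grep -c '^dn:'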
# 2. Verify search base in configuration
# Check etc/authority.plugins/ldap.yaml: connection.searchBase
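# 3. Check if search base exists
# List the naming contexts published by the root DSE; the search base must fall under one of them
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
-D "{BIND_DN}" -w "{PASSWORD}" \
-b "" -s base "(objectClass=*)" namingContexts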
# 4. Verify bind user has read permissions
# Check LDAP ACLs
```
**Verification:** `stella doctor --check check.auth.ldap.search`
---
#### check.auth.ldap.groups
| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.groups` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Warn |
| **Tags** | `auth`, `ldap` |
| **What it verifies** | LDAP group mapping is configured and working |
| **Evidence collected** | Group attribute, mapped groups, sample user groups |
| **Failure modes** | Group attribute not found, no groups mapped |
**Remediation:**
```bash
# 1. Check group attribute configuration
# etc/authority.plugins/ldap.yaml: claims.groupAttribute
# 2. Test group lookup for a sample user
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
-D "{BIND_DN}" -w "{PASSWORD}" \
-b "{SEARCH_BASE}" "(uid={TEST_USER})" memberOf
# 3. Verify group mapping in Authority
stella authority ldap-test --user {TEST_USER} --show-groups
# 4. Update group attribute if needed
# Common attributes: memberOf, member, groupMembership
```
**Verification:** `stella doctor --check check.auth.ldap.groups`
---
#### check.tls.certificates.expiry
| Property | Value |
|----------|-------|
| **CheckId** | `check.tls.certificates.expiry` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Warn (expires within 30 days), Fail (expires within 7 days or already expired) |
| **Tags** | `quick`, `security`, `tls` |
| **What it verifies** | TLS certificates are not expiring soon |
| **Evidence collected** | Certificate path, subject, expiry date, days remaining |
| **Failure modes** | Certificate expired, expiring within threshold |
**Remediation:**
```bash
# 1. Check certificate expiry
openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -enddate
# 2. Renew with certbot (if using Let's Encrypt)
sudo certbot renew --cert-name stellaops.example.com
# 3. Renew manually (if self-signed or enterprise CA)
# Generate new CSR
openssl req -new -key /etc/ssl/private/stellaops.key \
-out /tmp/stellaops.csr -subj "/CN=stellaops.example.com"
# Submit CSR to CA and install new certificate
# 4. Restart services to pick up new certificate
sudo systemctl restart stellaops-gateway
```
**Verification:** `stella doctor --check check.tls.certificates.expiry`
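
The Warn and Fail thresholds in the table can be reproduced locally with a short script. This is a sketch under stated assumptions: `classify_cert_days` and `cert_days_remaining` are hypothetical helper names, the boundaries are treated as inclusive, and `cert_days_remaining` relies on GNU `date -d`.

```bash
# Hypothetical helper: map days remaining to a Doctor-style result.
# Thresholds follow the Severity row (Warn at 30 days, Fail at 7 days);
# inclusive boundaries are an assumption.
classify_cert_days() {
  local days=$1
  if [ "$days" -le 7 ]; then
    echo "fail"
  elif [ "$days" -le 30 ]; then
    echo "warn"
  else
    echo "pass"
  fi
}

# Whole days until the certificate file in $1 expires.
# Requires openssl and GNU date (for "date -d").
cert_days_remaining() {
  local end_date end_epoch now_epoch
  end_date=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  end_epoch=$(date -d "$end_date" +%s)
  now_epoch=$(date +%s)
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Example (not run here):
#   classify_cert_days "$(cert_days_remaining /etc/ssl/certs/stellaops.crt)"
```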
---
#### check.tls.certificates.chain
| Property | Value |
|----------|-------|
| **CheckId** | `check.tls.certificates.chain` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `tls` |
| **What it verifies** | TLS certificate chain is complete and valid |
| **Evidence collected** | Certificate chain, validation errors |
| **Failure modes** | Missing intermediate, self-signed not trusted, chain broken |
**Remediation:**
```bash
# 1. Verify certificate chain
openssl verify -CAfile /etc/ssl/certs/ca-certificates.crt \
/etc/ssl/certs/stellaops.crt
# 2. Check chain with openssl
openssl s_client -connect stellaops.example.com:443 \
-servername stellaops.example.com -showcerts
# 3. Download missing intermediate certificates
# From your CA's website
# 4. Concatenate certificates in correct order
cat stellaops.crt intermediate.crt > stellaops-fullchain.crt
```
**Verification:** `stella doctor --check check.tls.certificates.chain`
---
#### check.secrets.vault.connectivity
| Property | Value |
|----------|-------|
| **CheckId** | `check.secrets.vault.connectivity` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `vault` |
| **What it verifies** | Vault service is reachable |
| **Evidence collected** | Vault address, seal status, version |
| **Failure modes** | Vault unreachable, sealed, version mismatch |
**Remediation:**
```bash
# 1. Check Vault status
vault status
# 2. If sealed, unseal Vault
vault operator unseal {UNSEAL_KEY_1}
vault operator unseal {UNSEAL_KEY_2}
vault operator unseal {UNSEAL_KEY_3}
# 3. Check network connectivity
curl -s ${VAULT_ADDR}/v1/sys/health | jq
# 4. Verify VAULT_ADDR environment variable
echo $VAULT_ADDR
```
**Verification:** `stella doctor --check check.secrets.vault.connectivity`
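
The `sys/health` call in step 3 reports Vault's state through its HTTP status code, which can be easier to script against than the JSON body. The sketch below maps Vault's default status codes to a short label; `vault_health_state` is a hypothetical helper name, and the labels are illustrative rather than values the Doctor check emits.

```bash
# Hypothetical helper: translate the HTTP status returned by
# GET /v1/sys/health (Vault's default codes) into a short label.
vault_health_state() {
  case "$1" in
    200) echo "active" ;;               # initialized, unsealed, active
    429) echo "standby" ;;              # unsealed standby node
    472) echo "dr-secondary" ;;         # disaster recovery secondary
    473) echo "performance-standby" ;;
    501) echo "not-initialized" ;;
    503) echo "sealed" ;;
    000) echo "unreachable" ;;          # curl reports 000 when no response was received
    *)   echo "unknown" ;;
  esac
}

# Example (not run here):
#   code=$(curl -s -o /dev/null -w '%{http_code}' "${VAULT_ADDR}/v1/sys/health")
#   vault_health_state "$code"
```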
---
#### check.secrets.vault.auth
| Property | Value |
|----------|-------|
| **CheckId** | `check.secrets.vault.auth` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `vault` |
| **What it verifies** | Vault authentication is successful |
| **Evidence collected** | Auth method, token TTL, policies |
| **Failure modes** | Invalid token, expired token, wrong auth method |
**Remediation:**
```bash
# 1. Check current token
vault token lookup
# 2. If token expired, authenticate again
# Token auth:
vault login {TOKEN}
# AppRole auth:
vault write auth/approle/login role_id={ROLE_ID} secret_id={SECRET_ID}
# Kubernetes auth:
vault write auth/kubernetes/login role=stellaops jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token
# 3. Verify authentication worked
vault token lookup
```
**Verification:** `stella doctor --check check.secrets.vault.auth`
---
#### check.secrets.vault.paths
| Property | Value |
|----------|-------|
| **CheckId** | `check.secrets.vault.paths` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `vault` |
| **What it verifies** | Required secret paths are accessible |
| **Evidence collected** | Checked paths, accessible paths, denied paths |
| **Failure modes** | Permission denied, path not found |
**Remediation:**
```bash
# 1. Test reading required secrets
# (with a KV v2 mount, `vault kv get` adds the data/ segment itself)
vault kv get secret/stellaops/api-keys
# 2. Check policy permissions
vault token lookup -format=json | jq '.data.policies'
# 3. Review policy rules
vault policy read stellaops
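# 4. Update policy if needed
# (KV v2: reads go through data/, list operations go through metadata/)
vault policy write stellaops - <<EOF
path "secret/data/stellaops/*" {
  capabilities = ["read", "list"]
}
path "secret/metadata/stellaops/*" {
  capabilities = ["list"]
}
EOF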
```
**Verification:** `stella doctor --check check.secrets.vault.paths`
---
#### check.security.evidence.integrity
| Property | Value |
|----------|-------|
| **CheckId** | `check.security.evidence.integrity` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `evidence`, `integrity`, `dsse`, `rekor`, `offline` |
| **What it verifies** | Evidence files have valid DSSE signatures, Rekor inclusion proofs, and consistent hashes |
| **Evidence collected** | Evidence locker path, total files, valid/invalid/skipped counts, specific issues |
| **Failure modes** | Empty DSSE payload, missing signatures, invalid base64, missing Rekor UUID, missing inclusion proof hashes, digest mismatch |
**What it checks:**
1. **DSSE Envelope Structure**: Validates `payloadType`, `payload` (base64), and `signatures` array
2. **Signature Completeness**: Each signature has `keyid` and valid base64 `sig`
3. **Payload Digest Consistency**: If `payloadDigest` field present, recomputes and compares SHA-256
4. **Evidence Bundle Structure**: Validates `bundleId`, `manifest.version`, and optional `contentDigest`
5. **Rekor Receipt Validity**: If present, validates `uuid`, `logIndex`, and `inclusionProof.hashes`
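
Item 3 (payload digest consistency) can be reproduced by hand when investigating a single envelope. The sketch below is an illustration only: it assumes the digest is computed over the base64-decoded payload bytes and that `payloadDigest` is either bare hex or prefixed with `sha256:`. Both are assumptions, as are the helper names `payload_sha256` and `verify_payload_digest`.

```bash
# Hypothetical helper: lowercase hex SHA-256 of the base64-decoded payload.
# Uses coreutils base64 and sha256sum.
payload_sha256() {
  printf '%s' "$1" | base64 -d | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: succeed (exit 0) when the recomputed digest equals
# the recorded one. Accepts "sha256:<hex>" or bare hex (assumed formats).
verify_payload_digest() {
  local payload_b64=$1 expected
  expected=$(printf '%s' "${2#sha256:}" | tr 'A-F' 'a-f')
  [ "$(payload_sha256 "$payload_b64")" = "$expected" ]
}

# Example (not run here; requires jq and an envelope file):
#   verify_payload_digest "$(jq -r '.payload' envelope.json)" \
#                         "$(jq -r '.payloadDigest' envelope.json)" && echo consistent
```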
**Remediation:**
```bash
# 1. List evidence files with issues
stella doctor --check check.security.evidence.integrity --output json \
| jq '.evidence.issues[]'
# 2. Re-sign affected evidence bundles
stella evidence resign --bundle-id {BUNDLE_ID}
# 3. Verify Rekor inclusion manually (if online)
rekor-cli get --uuid {REKOR_UUID} --format json | jq
# 4. For offline environments, verify against local ledger
stella evidence verify --offline --bundle-id {BUNDLE_ID}
# 5. Re-generate evidence pack from source
stella export evidence-pack --artifact {ARTIFACT_DIGEST} --force
```
**Configuration:**
```yaml
# etc/appsettings.yaml
EvidenceLocker:
  LocalPath: /var/lib/stellaops/evidence
  # Or use Evidence:BasePath for alternate key
```
**Verification:** `stella doctor --check check.security.evidence.integrity`
---
### 9.5 Integration Plugins - SCM (`stellaops.doctor.integration.scm.*`)
#### check.integration.scm.github.connectivity
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.github.connectivity` |
| **Plugin** | `stellaops.doctor.integration.scm.github` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `github` |
| **What it verifies** | GitHub API is reachable |
| **Evidence collected** | API endpoint, response status, latency |
| **Failure modes** | API unreachable, DNS resolution failed, TLS error |
**Remediation:**
```bash
# 1. Test GitHub API connectivity
curl -s https://api.github.com/zen
# 2. Check DNS resolution
nslookup api.github.com
# 3. Test with authentication
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/user
# 4. Check proxy settings if behind firewall
echo $HTTPS_PROXY
```
**Verification:** `stella doctor --check check.integration.scm.github.connectivity`
---
#### check.integration.scm.github.auth
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.github.auth` |
| **Plugin** | `stellaops.doctor.integration.scm.github` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `github`, `auth` |
| **What it verifies** | GitHub authentication is successful |
| **Evidence collected** | Auth type (PAT/App/OAuth), user/app info |
| **Failure modes** | Invalid token, expired token, wrong app credentials |
**Remediation:**
```bash
# For Personal Access Token:
# 1. Verify token is valid
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/user | jq '.login'
# 2. Generate new token if expired
# Visit: https://github.com/settings/tokens
# For GitHub App:
# 1. Check app installation
curl -s -H "Authorization: Bearer {JWT}" \
-H "Accept: application/vnd.github+json" \
https://api.github.com/app
# 2. Verify app is installed on repository
curl -s -H "Authorization: Bearer {INSTALLATION_TOKEN}" \
https://api.github.com/installation/repositories
```
**Verification:** `stella doctor --check check.integration.scm.github.auth`
---
#### check.integration.scm.github.permissions
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.github.permissions` |
| **Plugin** | `stellaops.doctor.integration.scm.github` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `github` |
| **What it verifies** | Token/App has required scopes/permissions |
| **Evidence collected** | Current scopes, required scopes, missing scopes |
| **Failure modes** | Missing `repo` scope, missing `write:packages` |
**Remediation:**
```bash
# 1. Check current token scopes
curl -sI -H "Authorization: Bearer {TOKEN}" https://api.github.com/user | grep -i x-oauth-scopes
# Required scopes for Stella Ops:
# - repo (full repository access)
# - read:org (organization membership)
# - write:packages (container registry)
# 2. Generate new token with correct scopes
# Visit: https://github.com/settings/tokens/new
# Select: repo, read:org, write:packages
# 3. Update token in Stella Ops
stella integrations update --id {INTEGRATION_ID} --secret {NEW_TOKEN}
```
**Verification:** `stella doctor --check check.integration.scm.github.permissions`
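
The "missing scopes" evidence can be approximated from the `x-oauth-scopes` header value. The following is a minimal sketch, not the plugin's logic: `missing_scopes` is a hypothetical helper, it compares scope names literally (so a broader scope such as `admin:org` is not treated as covering `read:org`), and the header value is assumed to be passed without a trailing carriage return.

```bash
# Required scopes as listed in the remediation steps above.
REQUIRED_SCOPES="repo read:org write:packages"

# Hypothetical helper: print the required scopes that are absent from a
# comma-separated scope list such as "repo, read:org".
missing_scopes() {
  local granted missing="" scope
  # Normalize "repo, read:org" into " repo read:org " for whole-word matching.
  granted=" $(printf '%s' "$1" | tr ',' ' ' | tr -s ' ') "
  for scope in $REQUIRED_SCOPES; do
    case "$granted" in
      *" $scope "*) ;;                      # scope is granted
      *) missing="$missing $scope" ;;       # scope is missing
    esac
  done
  echo "${missing# }"
}

# Example (not run here):
#   missing_scopes "repo, read:org"
```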
---
#### check.integration.scm.github.ratelimit
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.github.ratelimit` |
| **Plugin** | `stellaops.doctor.integration.scm.github` |
| **Category** | Integration |
| **Severity** | Warn |
| **Tags** | `integration`, `scm`, `github` |
| **What it verifies** | GitHub API rate limit is not exhausted |
| **Evidence collected** | Limit, remaining, reset time |
| **Failure modes** | Rate limit exhausted, near threshold |
**Remediation:**
```bash
# 1. Check current rate limit status
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit | jq
# 2. If exhausted, wait for reset
# The "reset" field shows Unix timestamp when limit resets
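# 3. Consider using a GitHub App instead of a PAT for higher limits
# PAT: 5,000 requests/hour
# GitHub App installation: at least 5,000 requests/hour, scaling with repositories and users
# (up to 12,500), or 15,000 requests/hour on GitHub Enterprise Cloud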
# 4. Implement request caching in your application
```
**Verification:** `stella doctor --check check.integration.scm.github.ratelimit`
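
The "exhausted" and "near threshold" failure modes can be illustrated with the `remaining`, `limit`, and `reset` fields returned by `/rate_limit`. This is a sketch only: the helper names are hypothetical and the 10% warning threshold is an assumption, since the threshold the check uses is not stated here.

```bash
# Hypothetical helper: classify the remaining quota.
# The 10% "near-threshold" cut-off is an assumption.
classify_rate_limit() {
  local remaining=$1 limit=$2
  if [ "$remaining" -le 0 ]; then
    echo "exhausted"
  elif [ $(( remaining * 10 )) -le "$limit" ]; then
    echo "near-threshold"
  else
    echo "ok"
  fi
}

# Hypothetical helper: seconds until the limit resets, given the Unix
# timestamp from the "reset" field. The optional second argument overrides
# the current time.
seconds_until_reset() {
  local reset=$1 now=${2:-$(date +%s)} delta
  delta=$(( reset - now ))
  if [ "$delta" -lt 0 ]; then
    delta=0
  fi
  echo "$delta"
}

# Example (not run here; requires curl and jq):
#   core=$(curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit | jq '.resources.core')
#   classify_rate_limit "$(echo "$core" | jq '.remaining')" "$(echo "$core" | jq '.limit')"
```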
---
#### check.integration.scm.gitlab.connectivity
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.gitlab.connectivity` |
| **Plugin** | `stellaops.doctor.integration.scm.gitlab` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `gitlab` |
| **What it verifies** | GitLab API is reachable |
| **Evidence collected** | API endpoint, response status, version |
| **Failure modes** | API unreachable, self-hosted instance down |
**Remediation:**
```bash
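# 1. Test GitLab API connectivity
# (the version endpoint requires authentication; a 401 response still confirms the API is reachable)
curl -s https://{GITLAB_HOST}/api/v4/version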
# 2. For self-hosted GitLab, check service status
sudo gitlab-ctl status
# 3. Check firewall/proxy
curl -v https://{GITLAB_HOST}/api/v4/version
# 4. Verify URL configuration
stella integrations show --id {INTEGRATION_ID}
```
**Verification:** `stella doctor --check check.integration.scm.gitlab.connectivity`
---
#### check.integration.scm.gitlab.auth
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.gitlab.auth` |
| **Plugin** | `stellaops.doctor.integration.scm.gitlab` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `gitlab`, `auth` |
| **What it verifies** | GitLab authentication is successful |
| **Evidence collected** | Auth type, user info, token expiry |
| **Failure modes** | Invalid token, expired token, revoked access |
**Remediation:**
```bash
# 1. Test token authentication
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/user | jq '.username'
# 2. Check token expiry
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/personal_access_tokens/self | jq '.expires_at'
# 3. Generate new token if expired
# Visit: https://{GITLAB_HOST}/-/profile/personal_access_tokens
# 4. Update token in Stella Ops
stella integrations update --id {INTEGRATION_ID} --secret {NEW_TOKEN}
```
**Verification:** `stella doctor --check check.integration.scm.gitlab.auth`
---
### 9.6 Integration Plugins - Registry (`stellaops.doctor.integration.registry.*`)
#### check.integration.registry.harbor.connectivity
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.harbor.connectivity` |
| **Plugin** | `stellaops.doctor.integration.registry.harbor` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `harbor` |
| **What it verifies** | Harbor registry is reachable |
| **Evidence collected** | Registry URL, health status, version |
| **Failure modes** | Registry unreachable, components unhealthy |
**Remediation:**
```bash
# 1. Check Harbor health endpoint
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq
# 2. Check individual components
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq '.components'
# 3. For self-hosted Harbor, check services
docker compose -f /opt/harbor/docker-compose.yml ps
# 4. Check Harbor logs
docker compose -f /opt/harbor/docker-compose.yml logs --tail=50 core
```
**Verification:** `stella doctor --check check.integration.registry.harbor.connectivity`
---
#### check.integration.registry.harbor.auth
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.harbor.auth` |
| **Plugin** | `stellaops.doctor.integration.registry.harbor` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `harbor`, `auth` |
| **What it verifies** | Harbor authentication is successful |
| **Evidence collected** | Auth type, user info, project access |
| **Failure modes** | Invalid credentials, LDAP sync issue |
**Remediation:**
```bash
# 1. Test Docker login
printf '%s' '{PASSWORD}' | docker login {HARBOR_HOST} -u {USERNAME} --password-stdin
# 2. Test API authentication
curl -s -u {USERNAME}:{PASSWORD} https://{HARBOR_HOST}/api/v2.0/users/current | jq
# 3. Check if user exists
curl -s -u admin:{ADMIN_PASSWORD} "https://{HARBOR_HOST}/api/v2.0/users?username={USERNAME}" | jq
# 4. Reset password if needed
# Via Harbor UI: https://{HARBOR_HOST}/harbor/users
```
**Verification:** `stella doctor --check check.integration.registry.harbor.auth`
---
#### check.integration.registry.harbor.pull
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.harbor.pull` |
| **Plugin** | `stellaops.doctor.integration.registry.harbor` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `harbor` |
| **What it verifies** | Can pull images from configured repositories |
| **Evidence collected** | Test image, pull result, error message |
| **Failure modes** | Permission denied, repository not found |
**Remediation:**
```bash
# 1. Test image pull
docker pull {HARBOR_HOST}/{PROJECT}/{IMAGE}:{TAG}
# 2. Check project membership
curl -s -u {USERNAME}:{PASSWORD} \
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/members | jq
# 3. Add user to project if needed
curl -X POST -u admin:{ADMIN_PASSWORD} \
-H "Content-Type: application/json" \
-d '{"role_id": 2, "member_user": {"username": "{USERNAME}"}}' \
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/members
# 4. Verify repository exists
curl -s -u {USERNAME}:{PASSWORD} \
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/repositories | jq
```
**Verification:** `stella doctor --check check.integration.registry.harbor.pull`
---
#### check.integration.registry.ecr.connectivity
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.ecr.connectivity` |
| **Plugin** | `stellaops.doctor.integration.registry.ecr` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `ecr`, `aws` |
| **What it verifies** | AWS ECR is reachable |
| **Evidence collected** | Registry URL, AWS region, endpoint status |
| **Failure modes** | AWS credentials invalid, region mismatch |
**Remediation:**
```bash
# 1. Verify AWS credentials
aws sts get-caller-identity
# 2. Test ECR describe repositories
aws ecr describe-repositories --region {REGION}
# 3. Get ECR login token
aws ecr get-login-password --region {REGION} | docker login --username AWS --password-stdin {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com
# 4. Check AWS credentials configuration (shows the active profile, region, and masked keys)
aws configure list
```
**Verification:** `stella doctor --check check.integration.registry.ecr.connectivity`
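
Because a region mismatch is one of the listed failure modes, it can help to derive the registry hostname from its parts and to read the region back out of a configured URL. The helpers below are hypothetical and cover only the standard `amazonaws.com` hostname pattern used in step 3; other partitions (for example, the China regions) use a different domain suffix and are not handled.

```bash
# Hypothetical helper: build the ECR registry hostname from a 12-digit
# account ID and a region.
ecr_registry_host() {
  local account_id=$1 region=$2
  case "$account_id" in
    ""|*[!0-9]*) echo "invalid account id: $account_id" >&2; return 1 ;;
  esac
  if [ "${#account_id}" -ne 12 ]; then
    echo "invalid account id: $account_id" >&2
    return 1
  fi
  echo "${account_id}.dkr.ecr.${region}.amazonaws.com"
}

# Hypothetical helper: extract the region from an ECR registry hostname.
ecr_region_from_host() {
  local rest=${1#*.dkr.ecr.}
  echo "${rest%%.amazonaws.com*}"
}

# Example (not run here): compare the configured region with the CLI default.
#   [ "$(ecr_region_from_host "$REGISTRY_HOST")" = "$(aws configure get region)" ] || echo "region mismatch"
```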
---
#### check.integration.registry.ecr.pull
| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.ecr.pull` |
| **Plugin** | `stellaops.doctor.integration.registry.ecr` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `ecr`, `aws` |
| **What it verifies** | Can pull images from ECR repositories |
| **Evidence collected** | Repository, IAM permissions, error |
| **Failure modes** | ecr:GetAuthorizationToken denied, ecr:BatchGetImage denied |
**Remediation:**
```bash
# 1. Check IAM permissions
aws iam simulate-principal-policy \
--policy-source-arn {ROLE_ARN} \
--action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer
# 2. Add required IAM policy
aws iam put-role-policy --role-name {ROLE_NAME} --policy-name ECRPullAccess --policy-document '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage"
    ],
    "Resource": "*"
  }]
}'
# 3. Test pull
docker pull {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{REPO}:{TAG}
```
**Verification:** `stella doctor --check check.integration.registry.ecr.pull`
---
### 9.7 Observability Plugin (`stellaops.doctor.observability`)
#### check.telemetry.otlp.endpoint
| Property | Value |
|----------|-------|
| **CheckId** | `check.telemetry.otlp.endpoint` |
| **Plugin** | `stellaops.doctor.observability` |
| **Category** | Observability |
| **Severity** | Warn |
| **Tags** | `observability`, `telemetry` |
| **What it verifies** | OTLP collector endpoint is reachable |
| **Evidence collected** | Endpoint URL, response status, protocol |
| **Failure modes** | Collector unreachable, wrong protocol (gRPC vs HTTP) |
**Remediation:**
```bash
# 1. Check OTLP endpoint configuration
echo $OTEL_EXPORTER_OTLP_ENDPOINT
# 2. Test HTTP endpoint
curl -v ${OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces
# 3. Test gRPC endpoint
grpcurl -plaintext {COLLECTOR_HOST}:4317 list
# 4. Check collector is running
# If using OpenTelemetry Collector:
docker logs otel-collector
# 5. Verify collector configuration
cat /etc/otel-collector/config.yaml
```
**Verification:** `stella doctor --check check.telemetry.otlp.endpoint`
---
#### check.logs.directory.writable
| Property | Value |
|----------|-------|
| **CheckId** | `check.logs.directory.writable` |
| **Plugin** | `stellaops.doctor.observability` |
| **Category** | Observability |
| **Severity** | Fail |
| **Tags** | `quick`, `observability`, `logs` |
| **What it verifies** | Log directory is writable |
| **Evidence collected** | Log path, permissions, owner |
| **Failure modes** | Directory not writable, disk full |
**Remediation:**
```bash
# 1. Check log directory permissions
ls -la /var/log/stellaops
# 2. Fix ownership
sudo chown -R stellaops:stellaops /var/log/stellaops
# 3. Fix permissions
sudo chmod 755 /var/log/stellaops
# 4. Check disk space
df -h /var/log/stellaops
```
**Verification:** `stella doctor --check check.logs.directory.writable`
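
Permissions shown by `ls -la` do not always predict whether a write succeeds (read-only mounts and full disks are common exceptions), so a direct write probe is a useful complement. The sketch below uses a hypothetical `dir_writable` helper that creates and removes a temporary file. Run it as the service account for a meaningful result; the `stellaops` user name in the example is taken from the ownership step above.

```bash
# Hypothetical helper: probe a directory by creating and removing a temp file.
# Prints "writable", "not-writable", or "missing"; returns non-zero on failure.
dir_writable() {
  local dir=$1 probe
  if [ ! -d "$dir" ]; then
    echo "missing"
    return 1
  fi
  if ! probe=$(mktemp "$dir/.doctor-probe.XXXXXX" 2>/dev/null); then
    echo "not-writable"
    return 1
  fi
  rm -f "$probe"
  echo "writable"
}

# Example (not run here; assumes dir_writable is saved in ./dir_writable.sh):
#   sudo -u stellaops bash -c '. ./dir_writable.sh; dir_writable /var/log/stellaops'
```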
---
#### check.logs.rotation.configured
| Property | Value |
|----------|-------|
| **CheckId** | `check.logs.rotation.configured` |
| **Plugin** | `stellaops.doctor.observability` |
| **Category** | Observability |
| **Severity** | Warn |
| **Tags** | `observability`, `logs` |
| **What it verifies** | Log rotation is configured |
| **Evidence collected** | Rotation config path, settings |
| **Failure modes** | No rotation configured, invalid config |
**Remediation:**
```bash
# 1. Check if logrotate config exists
ls -la /etc/logrotate.d/stellaops
# 2. Create logrotate configuration
# (sudo does not apply to shell redirection, so write the file through tee)
sudo tee /etc/logrotate.d/stellaops > /dev/null << 'EOF'
/var/log/stellaops/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 640 stellaops stellaops
    postrotate
        systemctl reload stellaops-gateway > /dev/null 2>&1 || true
    endscript
}
EOF
# 3. Test logrotate configuration
sudo logrotate -d /etc/logrotate.d/stellaops
```
**Verification:** `stella doctor --check check.logs.rotation.configured`
---
#### check.metrics.prometheus.scrape
| Property | Value |
|----------|-------|
| **CheckId** | `check.metrics.prometheus.scrape` |
| **Plugin** | `stellaops.doctor.observability` |
| **Category** | Observability |
| **Severity** | Warn |
| **Tags** | `observability`, `metrics` |
| **What it verifies** | Prometheus metrics endpoint is accessible |
| **Evidence collected** | Metrics endpoint, sample metrics count |
| **Failure modes** | Endpoint not exposed, auth required |
**Remediation:**
```bash
# 1. Check metrics endpoint
curl -s http://localhost:{PORT}/metrics | head -20
# 2. Verify metrics are being scraped
curl -s http://{PROMETHEUS_HOST}:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job == "stellaops")'
# 3. Add Prometheus scrape config
# In prometheus.yml:
#   scrape_configs:
#     - job_name: 'stellaops'
#       static_configs:
#         - targets: ['stellaops-gateway:8080', 'stellaops-concelier:8081']
# 4. Reload Prometheus (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://{PROMETHEUS_HOST}:9090/-/reload
```
**Verification:** `stella doctor --check check.metrics.prometheus.scrape`
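
The "sample metrics count" evidence can be approximated by counting sample lines in the exposition output. This is a sketch with a hypothetical helper name; it counts every non-comment, non-blank line, which may differ from how the check itself counts.

```bash
# Hypothetical helper: count sample lines in Prometheus text exposition
# format read from stdin (skips "# HELP"/"# TYPE" comments and blank lines).
count_metric_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Example (not run here):
#   curl -s http://localhost:{PORT}/metrics | count_metric_samples
```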
---
### 9.8 Release Orchestrator Plugin (`stellaops.doctor.releaseorch`)
#### check.releaseorch.environments.configured
| Property | Value |
|----------|-------|
| **CheckId** | `check.releaseorch.environments.configured` |
| **Plugin** | `stellaops.doctor.releaseorch` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `release`, `environments` |
| **What it verifies** | At least one environment is configured |
| **Evidence collected** | Environment count, environment names |
| **Failure modes** | No environments configured |
**Remediation:**
```bash
# 1. List current environments
stella environments list
# 2. Create development environment
stella environments create \
--name development \
--type development \
--promotion-target staging
# 3. Create staging environment
stella environments create \
--name staging \
--type staging \
--promotion-target production \
--requires-approval
# 4. Create production environment
stella environments create \
--name production \
--type production \
--requires-approval
```
**Verification:** `stella doctor --check check.releaseorch.environments.configured`
---
#### check.releaseorch.deployments.targets
| Property | Value |
|----------|-------|
| **CheckId** | `check.releaseorch.deployments.targets` |
| **Plugin** | `stellaops.doctor.releaseorch` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `release`, `deployments` |
| **What it verifies** | Deployment targets are reachable |
| **Evidence collected** | Target type, connectivity status, last heartbeat |
| **Failure modes** | Agent offline, target unreachable |
**Remediation:**
```bash
# 1. List deployment targets
stella deployments targets list
# 2. Check agent status
stella deployments targets health --target {TARGET_ID}
# 3. Restart agent if needed
# On target host:
sudo systemctl restart stellaops-agent
# 4. Re-register target if agent was reinstalled
stella deployments targets register \
--name {TARGET_NAME} \
--type docker-compose \
--endpoint ssh://user@host
```
**Verification:** `stella doctor --check check.releaseorch.deployments.targets`
---
## 10. Plugin Implementation Details
### 10.1 Core Platform Plugin
**Location:** `src/__Libraries/StellaOps.Doctor/Plugins/Core/`
Provides foundational checks for configuration, runtime, and platform health.
**Checks Provided:**
- `check.config.required`
- `check.config.syntax`
- `check.config.deprecated`
- `check.runtime.dotnet`
- `check.runtime.memory`
- `check.runtime.disk.space`
- `check.runtime.disk.permissions`
- `check.time.sync`
- `check.crypto.profiles`
**Dependencies:** None (core plugin)
---
### 10.2 Database & Migrations Plugin
**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Database/`
Provides database connectivity and migration state checks.
**References:**
- `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationRunner.cs`
- `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationStatusService.cs`
**Checks Provided:**
- `check.database.connectivity`
- `check.database.version`
- `check.database.migrations.pending`
- `check.database.migrations.checksum`
- `check.database.migrations.lock`
- `check.database.schema.{schema}` (dynamic per schema)
- `check.database.connections.pool`
**Configuration:**
```yaml
Doctor:
  Plugins:
    Database:
      Enabled: true
      ConnectionTimeout: 10s
      Schemas:
        - auth
        - vuln
        - scanner
        - orchestrator
```
---
### 10.3 Service Graph Plugin
**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ServiceGraph/`
Validates inter-service connectivity via Gateway and Router.
**References:**
- `src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs`
- `src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs`
**Checks Provided:**
- `check.services.gateway.running`
- `check.services.gateway.routing`
- `check.services.{service}.health` (dynamic per service)
- `check.services.{service}.connectivity` (dynamic per service)
- `check.services.authority.connectivity`
**Configuration:**
```yaml
Doctor:
  Plugins:
    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s
      Services:
        - name: concelier
          port: 8081
        - name: scanner
          port: 8082
        - name: attestor
          port: 8083
```
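
The dynamic `{service}` checks expand to one health check and one connectivity check per configured service. As a rough illustration, the sketch below generates the concrete check IDs for the three services in the configuration above. The `SERVICES` variable and `expand_service_checks` function are hypothetical, and the real plugin presumably reads the list from configuration rather than from a shell variable.

```bash
# Services as listed in the configuration above (name:port).
SERVICES="concelier:8081 scanner:8082 attestor:8083"

# Hypothetical helper: print the concrete check IDs for each service.
expand_service_checks() {
  local entry name
  for entry in $SERVICES; do
    name=${entry%%:*}   # drop the ":port" suffix
    echo "check.services.${name}.health"
    echo "check.services.${name}.connectivity"
  done
}

# Example (not run here): run every expanded check in turn.
#   expand_service_checks | while read -r id; do stella doctor --check "$id"; done
```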
---
### 10.4 Security Plugin
**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Security/`
Validates authentication, authorization, TLS, and secrets management.
**References:**
- `src/Authority/StellaOps.Authority/StellaOps.Authority.Plugin.Ldap/`
- `src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/HashiCorpVaultConnector.cs`
**Checks Provided:**
- `check.auth.oidc.discovery`
- `check.auth.oidc.jwks`
- `check.auth.ldap.bind`
- `check.auth.ldap.search`
- `check.auth.ldap.groups`
- `check.tls.certificates.expiry`
- `check.tls.certificates.chain`
- `check.secrets.vault.connectivity`
- `check.secrets.vault.auth`
- `check.secrets.vault.paths`
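
Of the checks above, `check.tls.certificates.expiry` is the only one listed with a two-level `Warn/Fail` severity in Appendix A. One plausible reading is that severity escalates as the expiry date approaches. The sketch below illustrates that reading; the 30-day and 7-day thresholds are hypothetical, and the helper that reads the certificate requires `openssl` and GNU `date`.

```bash
# Hypothetical thresholds; the specification does not define them.
WARN_DAYS=30
FAIL_DAYS=7

# Map days-until-expiry to a Doctor-style severity.
classify_expiry() {
  days_left=$1
  if [ "$days_left" -le "$FAIL_DAYS" ]; then
    echo "Fail"
  elif [ "$days_left" -le "$WARN_DAYS" ]; then
    echo "Warn"
  else
    echo "Pass"
  fi
}

# Whole days until the certificate in a PEM file expires.
# openssl prints a line such as "notAfter=Jan 12 00:00:00 2027 GMT".
days_until_expiry() {
  end_date=$(openssl x509 -enddate -noout -in "$1" | cut -d= -f2)
  end_epoch=$(date -d "$end_date" +%s)
  now_epoch=$(date +%s)
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Example: classify_expiry "$(days_until_expiry /path/to/cert.pem)"
```
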
---
### 10.5 SCM Integration Plugins
**GitHub Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitHub/`
**GitLab Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitLab/`
**References:**
- `src/Integrations/__Plugins/StellaOps.Integrations.Plugin.GitHubApp/`
- `etc/scm-connectors/github.yaml`
**GitHub Checks:**
- `check.integration.scm.github.connectivity`
- `check.integration.scm.github.auth`
- `check.integration.scm.github.permissions`
- `check.integration.scm.github.ratelimit`
**GitLab Checks:**
- `check.integration.scm.gitlab.connectivity`
- `check.integration.scm.gitlab.auth`
- `check.integration.scm.gitlab.permissions`
---
### 10.6 Registry Integration Plugins
**Harbor Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.Harbor/`
**ECR Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.ECR/`
**References:**
- `src/Integrations/__Plugins/StellaOps.Integrations.Plugin.Harbor/`
**Harbor Checks:**
- `check.integration.registry.harbor.connectivity`
- `check.integration.registry.harbor.auth`
- `check.integration.registry.harbor.pull`
**ECR Checks:**
- `check.integration.registry.ecr.connectivity`
- `check.integration.registry.ecr.pull`
---
### 10.7 Observability Plugin
**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Observability/`
**References:**
- `devops/telemetry/otel-collector.yaml`
**Checks Provided:**
- `check.telemetry.otlp.endpoint`
- `check.logs.directory.writable`
- `check.logs.rotation.configured`
- `check.metrics.prometheus.scrape`
---
### 10.8 Release Orchestrator Plugin
**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ReleaseOrch/`
**References:**
- `src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/`
**Checks Provided:**
- `check.releaseorch.environments.configured`
- `check.releaseorch.deployments.targets`
---
## Appendix A: Complete Check ID Reference
| CheckId | Plugin | Category | Default Severity |
|---------|--------|----------|------------------|
| `check.config.required` | core | Core | Fail |
| `check.config.syntax` | core | Core | Fail |
| `check.config.deprecated` | core | Core | Warn |
| `check.runtime.dotnet` | core | Core | Fail |
| `check.runtime.memory` | core | Core | Warn |
| `check.runtime.disk.space` | core | Core | Warn |
| `check.runtime.disk.permissions` | core | Core | Fail |
| `check.time.sync` | core | Core | Warn |
| `check.crypto.profiles` | core | Core | Fail |
| `check.database.connectivity` | database | Database | Fail |
| `check.database.version` | database | Database | Warn |
| `check.database.migrations.pending` | database | Database | Fail |
| `check.database.migrations.checksum` | database | Database | Fail |
| `check.database.migrations.lock` | database | Database | Warn |
| `check.database.schema.{schema}` | database | Database | Fail |
| `check.database.connections.pool` | database | Database | Warn |
| `check.services.gateway.running` | servicegraph | ServiceGraph | Fail |
| `check.services.gateway.routing` | servicegraph | ServiceGraph | Fail |
| `check.services.{service}.health` | servicegraph | ServiceGraph | Fail |
| `check.services.{service}.connectivity` | servicegraph | ServiceGraph | Fail |
| `check.services.authority.connectivity` | servicegraph | ServiceGraph | Fail |
| `check.auth.oidc.discovery` | security | Security | Fail |
| `check.auth.oidc.jwks` | security | Security | Fail |
| `check.auth.ldap.bind` | security | Security | Fail |
| `check.auth.ldap.search` | security | Security | Fail |
| `check.auth.ldap.groups` | security | Security | Warn |
| `check.tls.certificates.expiry` | security | Security | Warn/Fail |
| `check.tls.certificates.chain` | security | Security | Fail |
| `check.secrets.vault.connectivity` | security | Security | Fail |
| `check.secrets.vault.auth` | security | Security | Fail |
| `check.secrets.vault.paths` | security | Security | Fail |
| `check.integration.scm.github.connectivity` | scm.github | Integration | Fail |
| `check.integration.scm.github.auth` | scm.github | Integration | Fail |
| `check.integration.scm.github.permissions` | scm.github | Integration | Fail |
| `check.integration.scm.github.ratelimit` | scm.github | Integration | Warn |
| `check.integration.scm.gitlab.connectivity` | scm.gitlab | Integration | Fail |
| `check.integration.scm.gitlab.auth` | scm.gitlab | Integration | Fail |
| `check.integration.scm.gitlab.permissions` | scm.gitlab | Integration | Fail |
| `check.integration.registry.harbor.connectivity` | registry.harbor | Integration | Fail |
| `check.integration.registry.harbor.auth` | registry.harbor | Integration | Fail |
| `check.integration.registry.harbor.pull` | registry.harbor | Integration | Fail |
| `check.integration.registry.ecr.connectivity` | registry.ecr | Integration | Fail |
| `check.integration.registry.ecr.pull` | registry.ecr | Integration | Fail |
| `check.telemetry.otlp.endpoint` | observability | Observability | Warn |
| `check.logs.directory.writable` | observability | Observability | Fail |
| `check.logs.rotation.configured` | observability | Observability | Warn |
| `check.metrics.prometheus.scrape` | observability | Observability | Warn |
| `check.releaseorch.environments.configured` | releaseorch | Integration | Fail |
| `check.releaseorch.deployments.targets` | releaseorch | Integration | Fail |
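
The table implies a prefix convention: the owning plugin can be read off the leading segments of a check ID. The helper below is a sketch of that mapping, derived only from the rows above. It is a hypothetical convenience for scripts, not a documented Doctor feature.

```bash
# Resolve the plugin that owns a check ID, based on the Appendix A table.
# Prints the plugin name; returns 1 for IDs that match no known prefix.
plugin_for_check() {
  case "$1" in
    check.config.*|check.runtime.*|check.time.*|check.crypto.*) echo core ;;
    check.database.*) echo database ;;
    check.services.*) echo servicegraph ;;
    check.auth.*|check.tls.*|check.secrets.*) echo security ;;
    check.integration.scm.github.*) echo scm.github ;;
    check.integration.scm.gitlab.*) echo scm.gitlab ;;
    check.integration.registry.harbor.*) echo registry.harbor ;;
    check.integration.registry.ecr.*) echo registry.ecr ;;
    check.telemetry.*|check.logs.*|check.metrics.*) echo observability ;;
    check.releaseorch.*) echo releaseorch ;;
    *) echo unknown; return 1 ;;
  esac
}

# Example: plugin_for_check check.services.scanner.health
```
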
---
## Appendix B: Quick Reference - Common Issues
### Database Issues
```bash
# Connection refused
sudo systemctl start postgresql
stella doctor --check check.database.connectivity
# Pending migrations
stella system migrations-run --category release
stella doctor --check check.database.migrations.pending
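# Migration lock stuck
# pg_advisory_unlock_all() only releases locks held by the session that calls it,
# so find the backend holding the advisory lock and terminate that backend instead
psql -d stellaops -c "SELECT pid FROM pg_locks WHERE locktype = 'advisory' AND granted;"
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"
stella doctor --check check.database.migrations.lock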
```
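
The database fixes above pair a remediation command with a re-run of the matching check. Services often need a few seconds to become ready, so the verification step may need retries. The helper below is a minimal sketch of that loop; it assumes `stella doctor --check` exits non-zero while the check is still failing, which this document does not confirm.

```bash
# Re-run a verification command until it succeeds or attempts run out.
# Usage: retry_until_ok <attempts> <delay_seconds> <command...>
retry_until_ok() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    [ "$n" -lt "$attempts" ] && sleep "$delay"
    n=$((n + 1))
  done
  return 1
}

# Example: after starting PostgreSQL, wait for the connectivity check to pass.
# retry_until_ok 5 3 stella doctor --check check.database.connectivity
```

The helper takes the verification command as arguments rather than hard-coding it, so the same loop works for any check ID.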
### Authentication Issues
```bash
# OIDC discovery fails
curl -s "${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration"
sudo systemctl restart stellaops-authority
# LDAP bind fails
ldapsearch -x -H ldaps://{HOST}:636 -D "{BIND_DN}" -w "{PASSWORD}" -b "" -s base
```
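
A discovery endpoint that responds is not necessarily usable: OpenID Connect Discovery requires the document to carry, among other fields, `issuer` and `jwks_uri`, which is what `check.auth.oidc.discovery` and `check.auth.oidc.jwks` depend on. The sketch below validates those two fields with `sed` only, so it runs without extra tools. It is deliberately simple and assumes plain string values without escaped quotes; a JSON-aware tool such as `jq` would be more robust.

```bash
# Extract a string field from a JSON document on stdin.
# Simplification: matches "field": "value" pairs without escaped quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Validate a discovery document read from stdin.
# Prints the jwks_uri on success; returns 1 if a required field is missing.
check_discovery_document() {
  doc=$(cat)
  issuer=$(printf '%s\n' "$doc" | json_string_field issuer)
  jwks_uri=$(printf '%s\n' "$doc" | json_string_field jwks_uri)
  if [ -z "$issuer" ] || [ -z "$jwks_uri" ]; then
    echo "discovery document is missing issuer or jwks_uri" >&2
    return 1
  fi
  printf '%s\n' "$jwks_uri"
}

# Example against a live Authority:
# curl -fsS "${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration" | check_discovery_document
```

The printed `jwks_uri` can be fetched with `curl` to confirm that the signing keys are reachable as well.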
### Integration Issues
```bash
# GitHub rate limit
curl -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit
# Harbor connectivity
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq .
```
---
*Document generated: 2026-01-12*
*Stella Ops Doctor Capability Specification v1.0.0-draft*