Stella Ops Doctor Capability Specification
Status: Planning / Capability Design
Version: 1.0.0-draft
Last Updated: 2026-01-12
Table of Contents
- Executive Summary
- Current State Analysis
- Doctor Architecture
- Plugin System Specification
- CLI Surface
- UI Surface
- API Surface
- Remediation Command Patterns
- Doctor Check Catalog
- Plugin Implementation Details
1. Executive Summary
1.1 Purpose
The Doctor capability provides comprehensive self-service diagnostics for Stella Ops deployments. It enables operators, DevOps engineers, and developers to:
- Diagnose what is working and what is not
- Understand why failures occur with collected evidence
- Remediate issues with copy/paste commands
- Verify fixes with re-runnable checks
1.2 Target Users
| User Type | Primary Use Case |
|---|---|
| Operators | Pre-deployment validation, incident triage, routine health checks |
| DevOps Engineers | Integration setup, migration management, environment troubleshooting |
| Developers | Local development environment validation, API connectivity testing |
| Support Engineers | Remote diagnostics, evidence collection for escalation |
1.3 Key Principles
- Plugin-First Architecture - All checks implemented via extensible plugins
- Actionable Remediation - Every failure includes copy/paste fix commands
- Zero Docs Familiarity - Users can diagnose and fix without reading documentation
- Evidence-Based Diagnostics - All checks collect and report evidence
- Multi-Surface Consistency - Same check engine powers CLI, UI, and API
- Non-Destructive Fixes - Doctor never executes destructive actions; fix commands must be safe and idempotent
1.4 Surfaces
| Surface | Entry Point | Primary Use |
|---|---|---|
| CLI | stella doctor | Automation, CI/CD gates, SSH troubleshooting |
| UI | /ops/doctor | Interactive diagnosis, team collaboration |
| API | POST /api/v1/doctor/run | Programmatic integration, monitoring systems |
2. Current State Analysis
2.1 CLI - Current State
Location: src/Cli/StellaOps.Cli/
What Exists Today
| Component | File Path | Description |
|---|---|---|
| Entry Point | src/Cli/StellaOps.Cli/Program.cs | Main CLI bootstrap using System.CommandLine |
| Command Factory | src/Cli/StellaOps.Cli/Commands/CommandFactory.cs | Registers 88+ command groups |
| Config Bootstrap | src/Cli/StellaOps.Cli/Configuration/CliBootstrapper.cs | Environment + YAML/JSON config loading |
| Exit Codes | src/Cli/StellaOps.Cli/CliExitCodes.cs | Standardized exit codes (0-99) |
| Crypto Validator | src/Cli/StellaOps.Cli/Services/CryptoProfileValidator.cs | Startup validation for crypto profiles |
| Migration Commands | src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs | migrations-run, migrations-status, migrations-verify |
Existing Validation Patterns
```csharp
// CryptoProfileValidator.cs - Startup validation pattern
public sealed record ValidationResult
{
    public bool IsValid { get; init; }
    public bool HasWarnings { get; init; }
    public bool HasErrors { get; init; }
    public List<string> Errors { get; init; }
    public List<string> Warnings { get; init; }
    public string ActiveProfile { get; init; }
    public List<string> AvailableProviders { get; init; }
}
```
Gaps
- No unified stella doctor command
- Output formatting is ad-hoc per command (no centralized formatter)
- No remediation command generation
- Validation only for crypto profiles, not comprehensive system state
Proposed Capability
```bash
# Quick system health check
stella doctor

# Full diagnostic with all checks
stella doctor --full

# Check specific category
stella doctor --category database
stella doctor --category integrations

# Check specific plugin
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text

# Export report
stella doctor --export report.json
stella doctor --export report.md

# Filter by severity
stella doctor --severity fail,warn
```
2.2 Health Infrastructure - Current State
Pattern: Extensive health endpoints across 20+ services
What Exists Today
| Component | File Path | Description |
|---|---|---|
| Health Status Enum | src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthStatus.cs | Unknown, Healthy, Degraded, Unhealthy |
| Health Check Result | src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs | Rich result with factory methods |
| Gateway Health | src/Gateway/StellaOps.Gateway.WebService/Middleware/HealthCheckMiddleware.cs | /health/live, /health/ready, /health/startup |
| Scanner Health | src/Scanner/StellaOps.Scanner.WebService/Endpoints/HealthEndpoints.cs | /healthz, /readyz |
| Orchestrator Health | src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Endpoints/HealthEndpoints.cs | /health/details |
| Platform Health | src/Platform/__Libraries/StellaOps.Platform.Health/PlatformHealthService.cs | Cross-service aggregation |
| Health Contract | devops/docker/health-endpoints.md | Formal endpoint specification |
Health Check Result Model
```csharp
// From src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs
public sealed record HealthCheckResult(
    HealthStatus Status,
    string? Message,
    IReadOnlyDictionary<string, string>? Details,
    DateTimeOffset CheckedAt,
    TimeSpan Duration)
{
    public static HealthCheckResult Healthy(string? message = null) => ...
    public static HealthCheckResult Degraded(string message) => ...
    public static HealthCheckResult Unhealthy(string message, Exception? ex = null) => ...
}
```
Gaps
- Health endpoints check liveness/readiness, not comprehensive diagnostics
- No remediation guidance in health responses
- No aggregated cross-service diagnostic view
- Health checks don't verify configuration validity
2.3 Doctor Service - Current State (ReleaseOrchestrator)
Location: src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/
What Exists Today
| Component | File Path | Description |
|---|---|---|
| Doctor Service | Doctor/DoctorService.cs | Runs IDoctorCheck implementations |
| Doctor Report | Doctor/DoctorReport.cs | Aggregated results with counts |
| Check Result | Doctor/CheckResult.cs | Individual check outcome |
| IDoctorCheck | Doctor/IDoctorCheck.cs | Plugin interface for checks |
IDoctorCheck Interface
```csharp
// Existing interface (simplified)
public interface IDoctorCheck
{
    string Name { get; }
    string Category { get; }
    Task<CheckResult> RunAsync(CancellationToken ct);
}

public sealed record CheckResult(
    string Name,
    HealthStatus Status,
    string? Message,
    TimeSpan Duration);

public sealed record DoctorReport(
    int PassCount,
    int WarningCount,
    int FailCount,
    int SkippedCount,
    HealthStatus OverallStatus,
    TimeSpan TotalDuration,
    IReadOnlyList<CheckResult> Results);
```
Gaps
- Only available in ReleaseOrchestrator, not CLI or other modules
- No remediation commands in output
- No evidence collection
- Limited to integration checks only
- No plugin discovery mechanism
2.4 Integration Plugins - Current State
Location: src/Integrations/
What Exists Today
| Component | File Path | Description |
|---|---|---|
| Plugin Contract | __Libraries/StellaOps.Integrations.Contracts/IIntegrationConnectorPlugin.cs | Core plugin interface |
| Integration Types | __Libraries/StellaOps.Integrations.Contracts/IntegrationType.cs | Registry, SCM, CI/CD, etc. |
| GitHub Plugin | __Plugins/StellaOps.Integrations.Plugin.GitHubApp/GitHubAppConnectorPlugin.cs | GitHub App integration |
| Harbor Plugin | __Plugins/StellaOps.Integrations.Plugin.Harbor/HarborConnectorPlugin.cs | Harbor registry |
| Plugin Loader | StellaOps.Integrations.WebService/IntegrationPluginLoader.cs | Assembly-based discovery |
| Vault Connectors | src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/ | HashiCorp Vault, Azure Key Vault |
IIntegrationConnectorPlugin Interface
```csharp
public interface IIntegrationConnectorPlugin : IAvailabilityPlugin
{
    IntegrationType Type { get; }
    IntegrationProvider Provider { get; }
    string Name { get; }

    Task<TestConnectionResult> TestConnectionAsync(
        IntegrationConfig config,
        CancellationToken ct);

    Task<HealthCheckResult> CheckHealthAsync(
        IntegrationConfig config,
        CancellationToken ct);
}
```
Supported Integration Types
```csharp
public enum IntegrationType
{
    Registry = 1,    // Harbor, ECR, GCR, ACR, Docker Hub, Quay, Artifactory
    Scm = 2,         // GitHub, GitLab, Bitbucket, Gitea, Azure DevOps
    CiCd = 3,        // GitHub Actions, GitLab CI, Jenkins, CircleCI
    RepoSource = 4,  // npm, PyPI, Maven, NuGet, Crates.io
    RuntimeHost = 5, // eBPF, ETW, dyld agents
    FeedMirror = 6   // NVD, OSV, StellaOps mirrors
}
```
Gaps
- TestConnectionAsync exists but is not surfaced through the CLI doctor command
- No standardized remediation output
- Health checks don't report required permissions/scopes
- No validation of webhook/event delivery configuration
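One way to address the first gap is to wrap the existing CheckHealthAsync result in a Doctor check. The sketch below maps the HealthStatus values listed in section 2.2 onto the DoctorSeverity scale proposed in section 3.3. The mapping itself (in particular, Unknown becoming Skip) and the HealthStatusMapping name are assumptions. Both enums are restated so the snippet compiles on its own.

```csharp
using System;

// Restated from sections 2.2 and 3.3 so the sketch is self-contained.
public enum HealthStatus { Unknown, Healthy, Degraded, Unhealthy }
public enum DoctorSeverity { Pass = 0, Info = 1, Warn = 2, Fail = 3, Skip = 4 }

// Hypothetical adapter from an integration health probe to a Doctor outcome.
public static class HealthStatusMapping
{
    // Unknown means the probe could not decide, so the check is reported
    // as skipped rather than as passed or failed (an assumption).
    public static DoctorSeverity ToSeverity(HealthStatus status) => status switch
    {
        HealthStatus.Healthy => DoctorSeverity.Pass,
        HealthStatus.Degraded => DoctorSeverity.Warn,
        HealthStatus.Unhealthy => DoctorSeverity.Fail,
        HealthStatus.Unknown => DoctorSeverity.Skip,
        _ => throw new ArgumentOutOfRangeException(nameof(status), status, null)
    };

    // Reuses the probe's message as the Doctor diagnosis when one is present.
    public static (DoctorSeverity Severity, string Diagnosis) FromHealth(
        HealthStatus status, string? message) =>
        (ToSeverity(status), message ?? $"Health probe reported {status}");
}
```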
2.5 Authority Plugins - Current State
Location: src/Authority/StellaOps.Authority/
What Exists Today
| Component | File Path | Description |
|---|---|---|
| Plugin Abstractions | StellaOps.Authority.Plugins.Abstractions/ | Plugin registration interface |
| LDAP Plugin | StellaOps.Authority.Plugin.Ldap/ | LDAP/AD integration |
| OIDC Plugin | StellaOps.Authority.Plugin.Oidc/ | OpenID Connect |
| SAML Plugin | StellaOps.Authority.Plugin.Saml/ | SAML 2.0 |
| Plugin Registry | StellaOps.Authority/AuthorityPluginRegistry.cs | Manages named plugins |
| LDAP Config | etc/authority.plugins/ldap.yaml | Sample configuration |
LDAP Plugin Capabilities
```yaml
# From etc/authority.plugins/ldap.yaml
connection:
  host: "ldaps://ldap.example.internal"
  port: 636
  searchBase: "ou=people,dc=example,dc=internal"
  bindDn: "cn=bind-user,ou=service,dc=example,dc=internal"
  bindPasswordSecret: "file:/etc/secrets/ldap-bind.txt"
security:
  requireTls: true
claims:
  groupAttribute: "memberOf"
cache:
  enabled: true
  ttlSeconds: 600
```
Gaps
- No CLI command to validate LDAP configuration
- Health checks exist but don't provide remediation
- No validation of group mapping correctness
- TLS certificate validation not exposed as diagnostic
2.6 Database & Migrations - Current State
Location: src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/
What Exists Today
| Component | File Path | Description |
|---|---|---|
| Migration Runner | Migrations/MigrationRunner.cs | Executes SQL migrations with advisory locks |
| Migration Category | Migrations/MigrationCategory.cs | Startup, Release, Seed, Data |
| Status Service | Migrations/MigrationStatusService.cs | Query migration state |
| CLI Commands | src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs | migrations-run/status/verify |
| Strategy Docs | docs/db/MIGRATION_STRATEGY.md | Migration process documentation |
Migration Categories
| Prefix | Category | Automatic | Breaking |
|---|---|---|---|
| 001-099 | Startup | Yes | No |
| 100-199 | Release | No (CLI) | Yes |
| S001-S999 | Seed | Yes | No |
| DM001-DM999 | Data | Background | Varies |
Schema Tracking
```sql
CREATE TABLE {schema}.schema_migrations (
    migration_name TEXT PRIMARY KEY,
    category       TEXT NOT NULL DEFAULT 'startup',
    checksum       TEXT NOT NULL,
    applied_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    applied_by     TEXT,
    duration_ms    INT
);
```
Gaps
- Migration status not integrated with doctor
- No checksum mismatch diagnostics with remediation
- Lock contention not diagnosed
- No cross-schema migration state view
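The pending-migration and checksum-mismatch gaps could be closed by comparing the migration files shipped with a module against the rows in {schema}.schema_migrations. The sketch below assumes both sides are already loaded into memory as name/checksum pairs that use the same naming scheme. It classifies release migrations with the 100-199 prefix rule from the category table. MigrationFile, AppliedMigration, MigrationDiagnosis, and MigrationDiff are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory views of a migration file and a schema_migrations row.
public sealed record MigrationFile(string Name, string Checksum);
public sealed record AppliedMigration(string MigrationName, string Checksum);

public sealed record MigrationDiagnosis(
    IReadOnlyList<string> PendingRelease,
    IReadOnlyList<string> PendingOther,
    IReadOnlyList<string> ChecksumMismatches);

public static class MigrationDiff
{
    // Release migrations use three-digit prefixes 100-199 (e.g. "100_add_tenant_quotas").
    public static bool IsRelease(string name)
    {
        var prefix = name.Split('_')[0];
        return prefix.Length == 3 && int.TryParse(prefix, out var n) && n is >= 100 and <= 199;
    }

    public static MigrationDiagnosis Diagnose(
        IEnumerable<MigrationFile> files, IEnumerable<AppliedMigration> applied)
    {
        var appliedByName = applied.ToDictionary(a => a.MigrationName, StringComparer.Ordinal);
        var pending = new List<string>();
        var mismatches = new List<string>();

        // Ordinal sort gives a stable, prefix-ordered list in the report.
        foreach (var file in files.OrderBy(f => f.Name, StringComparer.Ordinal))
        {
            if (!appliedByName.TryGetValue(file.Name, out var row))
                pending.Add(file.Name);
            else if (!string.Equals(row.Checksum, file.Checksum, StringComparison.Ordinal))
                mismatches.Add(file.Name); // file content changed after it was applied
        }

        return new MigrationDiagnosis(
            pending.Where(IsRelease).ToList(),
            pending.Where(n => !IsRelease(n)).ToList(),
            mismatches);
    }
}
```

Separating pending release migrations from the rest matters because, according to the category table, only release migrations need a manual CLI step. Startup and seed migrations run automatically, and data migrations run in the background.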
2.7 UI - Current State
Location: src/Web/StellaOps.Web/
What Exists Today
| Component | File Path | Description |
|---|---|---|
| Routes | src/app/app.routes.ts | Angular Router configuration |
| Platform Health | src/app/features/platform-health/ | Health dashboard at /ops/health |
| Health Client | src/app/core/api/platform-health.client.ts | API client for health endpoints |
| Console Status | src/app/features/console/console-status.component.ts | Queue/run status |
Platform Health Dashboard Features
- Real-time KPI strip (services, latency, error rate, incidents)
- Service health grid with grouping (healthy/degraded/unhealthy)
- Dependency graph visualization
- Incident timeline (last 24h)
- Auto-refresh every 10 seconds
Gaps
- No diagnostic check execution from UI
- No remediation command display
- No evidence collection/export
- Health dashboard shows status, not actionable diagnostics
2.8 Service Connectivity - Current State
Location: src/Gateway/, src/Router/
What Exists Today
| Component | File Path | Description |
|---|---|---|
| Gateway Routing | src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs | HTTP to microservice routing |
| Connection Manager | src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs | HELLO handshake, heartbeats |
| Routing State | src/Router/__Libraries/StellaOps.Router.Common/Abstractions/IGlobalRoutingState.cs | Live service connections |
| Claims Propagation | src/Gateway/StellaOps.Gateway.WebService/Middleware/ClaimsPropagationMiddleware.cs | OAuth claims forwarding |
Service Registration Flow
- Service connects to Gateway via Router transport (TCP/TLS/Valkey)
- HELLO handshake with endpoint/schema declarations
- Periodic heartbeats with health/latency metrics
- Gateway maintains ConnectionState for routing decisions
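A Doctor check for the heartbeat step could flag connections whose last heartbeat is too old. This is a hypothetical sketch. It assumes the Gateway can expose a last-heartbeat timestamp per service, and that a connection counts as stale after three missed heartbeat intervals. Neither the ServiceConnection record nor the threshold comes from the existing ConnectionManager. The now argument would normally come from TimeProvider.GetUtcNow().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical snapshot of one service connection as seen by the Gateway.
public sealed record ServiceConnection(string ServiceName, DateTimeOffset LastHeartbeat);

public static class HeartbeatStaleness
{
    // Returns one evidence line per stale connection, sorted for stable output.
    public static IReadOnlyList<string> FindStale(
        IEnumerable<ServiceConnection> connections,
        TimeSpan heartbeatInterval,
        DateTimeOffset now,
        int missedBeatsAllowed = 3)
    {
        var threshold = heartbeatInterval * missedBeatsAllowed;
        return connections
            .Where(c => now - c.LastHeartbeat > threshold)
            .Select(c => $"{c.ServiceName}: last heartbeat {(int)(now - c.LastHeartbeat).TotalSeconds}s ago")
            .OrderBy(s => s, StringComparer.Ordinal)
            .ToList();
    }
}
```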
Gaps
- No CLI command to verify service graph health
- Routing failures not diagnosed with remediation
- No validation of claims propagation configuration
- Transport connectivity not exposed as diagnostic
3. Doctor Architecture
3.1 High-Level Architecture
```text
+------------------+     +------------------+     +------------------+
|       CLI        |     |        UI        |     |     External     |
|  stella doctor   |     |   /ops/doctor    |     |    Monitoring    |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         v                        v                        v
+--------------------------------------------------------------------+
|                          Doctor API Layer                          |
|  POST /api/v1/doctor/run           GET /api/v1/doctor/checks       |
|  GET /api/v1/doctor/report         WebSocket /api/v1/doctor/stream |
+--------------------------------------------------------------------+
                                  |
                                  v
+--------------------------------------------------------------------+
|                        Doctor Engine (Core)                        |
| +------------------+   +------------------+   +------------------+ |
| | Check Registry   |   | Check Executor   |   | Report Generator | |
| | - Discovery      |   | - Parallel exec  |   | - JSON/MD/Text   | |
| | - Filtering      |   | - Timeout mgmt   |   | - Remediation    | |
| +------------------+   +------------------+   +------------------+ |
+--------------------------------------------------------------------+
                                  |
                                  v
+--------------------------------------------------------------------+
|                           Plugin System                            |
+--------------------------------------------------------------------+
    |         |         |         |         |         |         |
    v         v         v         v         v         v         v
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
| Core  | | DB &  | |Service| |  SCM  | |Regis- | | Vault | |Author-|
|       | |Migra- | | Graph | |       | |  try  | |       | |  ity  |
|Plugin | | tions | |Plugin | |Plugin | |Plugin | |Plugin | |Plugin |
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
```
3.2 Core Components
Doctor Engine
Proposed Location: src/__Libraries/StellaOps.Doctor/
```text
StellaOps.Doctor/
├── Engine/
│   ├── DoctorEngine.cs              # Main orchestrator
│   ├── CheckExecutor.cs             # Parallel check execution
│   └── CheckRegistry.cs             # Plugin discovery & filtering
├── Models/
│   ├── DoctorCheckResult.cs         # Extended check result with evidence
│   ├── DoctorReport.cs              # Full report model
│   ├── Remediation.cs               # Fix command model
│   └── Evidence.cs                  # Collected evidence model
├── Plugins/
│   ├── IDoctorPlugin.cs             # Plugin interface
│   ├── IDoctorCheck.cs              # Check interface (extended)
│   └── DoctorPluginContext.cs       # Plugin execution context
├── Output/
│   ├── JsonReportFormatter.cs       # JSON output
│   ├── MarkdownReportFormatter.cs   # Markdown output
│   └── TextReportFormatter.cs       # Console text output
└── DoctorServiceExtensions.cs       # DI registration
```
Check Execution Model
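```csharp
using System.Collections.Concurrent;
using Microsoft.Extensions.Logging;

public sealed class CheckExecutor
{
    private readonly IEnumerable<IDoctorPlugin> _plugins;
    private readonly TimeProvider _timeProvider;
    private readonly ILogger<CheckExecutor> _logger;

    // Constructor omitted

    public async Task<DoctorReport> RunAsync(
        DoctorRunOptions options,
        CancellationToken ct)
    {
        var checks = GetFilteredChecks(options);
        // Results arrive in completion order, so GenerateReport should sort them
        // (e.g. by CheckId) to keep reports deterministic
        var results = new ConcurrentBag<DoctorCheckResult>();

        // Parallel execution with configurable concurrency
        await Parallel.ForEachAsync(
            checks,
            new ParallelOptions
            {
                MaxDegreeOfParallelism = options.Parallelism,
                CancellationToken = ct
            },
            async (check, token) =>
            {
                // ExecuteCheckAsync should turn exceptions and timeouts into Fail results;
                // an unhandled exception here cancels the remaining checks
                var result = await ExecuteCheckAsync(check, options, token);
                results.Add(result);
            });

        return GenerateReport(results, options);
    }

    // GetFilteredChecks, ExecuteCheckAsync and GenerateReport omitted
}
```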
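ExecuteCheckAsync is the natural place for per-check timeouts and failure isolation. The sketch below does both with a linked CancellationTokenSource. It also writes each result into a slot indexed by the check's position, so report order does not depend on completion order (unlike the ConcurrentBag above). CheckSpec, CheckOutcome, Outcome, and TimeoutExecutor are trimmed, hypothetical stand-ins for the section 3.3 models. A timeout only takes effect if the check honors its cancellation token.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Simplified, hypothetical stand-ins for the section 3.3 models.
public enum Outcome { Pass, Warn, Fail }
public sealed record CheckOutcome(string CheckId, Outcome Outcome, string Diagnosis);
public sealed record CheckSpec(string CheckId, Func<CancellationToken, Task<CheckOutcome>> RunAsync);

public static class TimeoutExecutor
{
    public static async Task<IReadOnlyList<CheckOutcome>> RunAllAsync(
        IReadOnlyList<CheckSpec> checks, int parallelism, TimeSpan perCheckTimeout, CancellationToken ct)
    {
        // Slot i always holds the result of checks[i].
        var results = new CheckOutcome[checks.Count];
        await Parallel.ForEachAsync(
            Enumerable.Range(0, checks.Count),
            new ParallelOptions { MaxDegreeOfParallelism = parallelism, CancellationToken = ct },
            async (i, token) =>
            {
                results[i] = await RunOneAsync(checks[i], perCheckTimeout, token);
            });
        return results;
    }

    private static async Task<CheckOutcome> RunOneAsync(CheckSpec check, TimeSpan timeout, CancellationToken ct)
    {
        // Linked source: cancels on the caller's token OR when this check's timer fires.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);
        try
        {
            return await check.RunAsync(cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only this check's timer fired; the run as a whole continues.
            return new CheckOutcome(check.CheckId, Outcome.Fail, $"Timed out after {timeout.TotalSeconds:0.#}s");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // A crashing check becomes a Fail result instead of aborting the run.
            return new CheckOutcome(check.CheckId, Outcome.Fail, $"Check threw {ex.GetType().Name}: {ex.Message}");
        }
    }
}
```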
3.3 Result Model
```csharp
public sealed record DoctorCheckResult
{
    // Identity
    public required string CheckId { get; init; }
    public required string PluginId { get; init; }
    public required string Category { get; init; }

    // Outcome
    public required DoctorSeverity Severity { get; init; } // Pass, Info, Warn, Fail, Skip
    public required string Diagnosis { get; init; }

    // Evidence
    public required Evidence Evidence { get; init; }

    // Remediation
    public IReadOnlyList<string>? LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }

    // Metadata
    public required TimeSpan Duration { get; init; }
    public required DateTimeOffset ExecutedAt { get; init; }
}

public enum DoctorSeverity
{
    Pass = 0,
    Info = 1,
    Warn = 2,
    Fail = 3,
    Skip = 4
}

public sealed record Evidence
{
    public required string Description { get; init; }
    public required IReadOnlyDictionary<string, string> Data { get; init; }
    public IReadOnlyList<string>? SensitiveKeys { get; init; } // Keys to redact in output
}

public sealed record Remediation
{
    public required IReadOnlyList<RemediationStep> Steps { get; init; }
    public string? SafetyNote { get; init; }
    public bool RequiresBackup { get; init; }
}

public sealed record RemediationStep
{
    public required int Order { get; init; }
    public required string Description { get; init; }
    public required string Command { get; init; }
    public CommandType CommandType { get; init; } // Shell, SQL, API, FileEdit, Manual
    public IReadOnlyDictionary<string, string>? Placeholders { get; init; }
}

public enum CommandType
{
    Shell,    // Bash/PowerShell command
    SQL,      // SQL statement
    API,      // API call (curl/stella CLI)
    FileEdit, // File modification
    Manual    // Manual step (no command)
}
```
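Evidence.SensitiveKeys names values that must never reach any output, so a formatter could apply it at render time. The sketch below is one way to do that. The EvidenceRenderer name, the indentation, and the ordinal key ordering (chosen so repeated runs produce identical output) are assumptions. The Evidence record is restated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Restated from section 3.3.
public sealed record Evidence
{
    public required string Description { get; init; }
    public required IReadOnlyDictionary<string, string> Data { get; init; }
    public IReadOnlyList<string>? SensitiveKeys { get; init; }
}

// Hypothetical text renderer that redacts at output time.
public static class EvidenceRenderer
{
    public static string Render(Evidence evidence)
    {
        var sensitive = new HashSet<string>(
            evidence.SensitiveKeys ?? Array.Empty<string>(), StringComparer.Ordinal);
        var sb = new StringBuilder();
        sb.AppendLine($"Evidence: {evidence.Description}");

        // Ordinal key order keeps output byte-identical between runs.
        foreach (var (key, value) in evidence.Data.OrderBy(kv => kv.Key, StringComparer.Ordinal))
        {
            var shown = sensitive.Contains(key) ? "***REDACTED***" : value;
            sb.Append("  ").Append(key).Append(": ").AppendLine(shown);
        }
        return sb.ToString();
    }
}
```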
4. Plugin System Specification
4.1 Plugin Interface
```csharp
/// <summary>
/// Base interface for Doctor plugins.
/// Plugins group related checks and share configuration context.
/// </summary>
public interface IDoctorPlugin
{
    /// <summary>Unique plugin identifier (e.g., "stellaops.doctor.database")</summary>
    string PluginId { get; }

    /// <summary>Human-readable name</summary>
    string DisplayName { get; }

    /// <summary>Plugin category for filtering</summary>
    DoctorCategory Category { get; }

    /// <summary>Plugin version for compatibility</summary>
    Version Version { get; }

    /// <summary>Minimum Doctor engine version required</summary>
    Version MinEngineVersion { get; }

    /// <summary>Check if plugin is available in current environment</summary>
    bool IsAvailable(IServiceProvider services);

    /// <summary>Get all checks provided by this plugin</summary>
    IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context);

    /// <summary>Initialize plugin with configuration</summary>
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public enum DoctorCategory
{
    Core,          // Platform, config, runtime
    Database,      // Schema, migrations, connectivity
    ServiceGraph,  // Inter-service communication
    Integration,   // External system integrations
    Security,      // Auth, TLS, secrets
    Observability  // Logs, metrics, traces
}
```
4.2 Check Interface
```csharp
/// <summary>
/// Individual diagnostic check.
/// </summary>
public interface IDoctorCheck
{
    /// <summary>Unique check identifier (e.g., "check.database.migrations.pending")</summary>
    string CheckId { get; }

    /// <summary>Human-readable name</summary>
    string Name { get; }

    /// <summary>What this check verifies</summary>
    string Description { get; }

    /// <summary>Default severity if check fails</summary>
    DoctorSeverity DefaultSeverity { get; }

    /// <summary>Tags for filtering (e.g., ["quick", "security", "migration"])</summary>
    IReadOnlyList<string> Tags { get; }

    /// <summary>Estimated execution time</summary>
    TimeSpan EstimatedDuration { get; }

    /// <summary>Check if this check can run in current context</summary>
    bool CanRun(DoctorPluginContext context);

    /// <summary>Execute the check</summary>
    Task<DoctorCheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}
```
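As a concrete illustration of the interface, the sketch below shows the decision logic a check such as check.tls.certificates.expiry could run inside RunAsync. The 30-day warn and 7-day fail windows are assumptions, not values from this spec. The caller would supply notAfter (for example, from X509Certificate2.NotAfter converted to UTC) and now (for example, from TimeProvider.GetUtcNow()). With the dates from the section 5.5 sample, it yields Warn with the diagnosis "TLS certificate expires in 14 days".

```csharp
using System;

// Restated from section 3.3.
public enum DoctorSeverity { Pass = 0, Info = 1, Warn = 2, Fail = 3, Skip = 4 }

// Hypothetical rule; a real check would read the windows from plugin config.
public static class CertificateExpiryRule
{
    public static readonly TimeSpan WarnWindow = TimeSpan.FromDays(30);
    public static readonly TimeSpan FailWindow = TimeSpan.FromDays(7);

    public static (DoctorSeverity Severity, string Diagnosis) Evaluate(
        DateTimeOffset notAfter, DateTimeOffset now)
    {
        var remaining = notAfter - now;
        if (remaining <= TimeSpan.Zero)
            return (DoctorSeverity.Fail, $"TLS certificate expired {(int)(-remaining.TotalDays)} days ago");

        // Whole days remaining, truncated toward zero.
        var days = (int)remaining.TotalDays;
        if (remaining <= FailWindow)
            return (DoctorSeverity.Fail, $"TLS certificate expires in {days} days");
        if (remaining <= WarnWindow)
            return (DoctorSeverity.Warn, $"TLS certificate expires in {days} days");
        return (DoctorSeverity.Pass, $"TLS certificate valid for {days} more days");
    }
}
```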
4.3 Plugin Context
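```csharp
using System;
using System.Data.Common;
using System.Text.Json;

public sealed class DoctorPluginContext
{
    public required IServiceProvider Services { get; init; }
    public required IConfiguration Configuration { get; init; }
    public required TimeProvider TimeProvider { get; init; }
    public required ILogger Logger { get; init; }
    // Runtime info
    public required string EnvironmentName { get; init; } // Development, Staging, Production
    public required string? TenantId { get; init; }
    // Plugin configuration
    public required JsonElement PluginConfig { get; init; }
    // Evidence helpers
    public EvidenceBuilder CreateEvidence() => new();
    public RemediationBuilder CreateRemediation() => new();
    // Secret redaction
    public string Redact(string value) => "***REDACTED***";
    public string RedactConnectionString(string cs)
    {
        // URL-style strings (postgres://user:pass@host/db) are masked entirely
        if (cs.Contains("://", StringComparison.Ordinal)) return Redact(cs);
        var builder = new DbConnectionStringBuilder();
        try { builder.ConnectionString = cs; }
        catch (ArgumentException) { return Redact(cs); } // unparseable: mask everything
        // Key=value strings: replace only the password value (lookup is case-insensitive)
        foreach (var key in new[] { "Password", "Pwd" })
            if (builder.ContainsKey(key)) builder[key] = "***REDACTED***";
        return builder.ConnectionString;
    }
}
```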
4.4 Plugin Discovery
Static Discovery (Build-time)
Plugins register via DI at startup:
```csharp
// In Program.cs or startup
services.AddDoctorPlugin<CoreDoctorPlugin>();
services.AddDoctorPlugin<DatabaseDoctorPlugin>();
services.AddDoctorPlugin<ServiceGraphDoctorPlugin>();
services.AddDoctorPlugin<ScmGitHubDoctorPlugin>();
// ...
```
Dynamic Discovery (Runtime)
Plugins can be loaded from assemblies:
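```csharp
using System.Reflection;

// In DoctorPluginLoader.cs
public class DoctorPluginLoader
{
    public IEnumerable<IDoctorPlugin> LoadFromDirectory(string path)
    {
        // Sort so plugins load in the same order on every machine
        var dlls = Directory.GetFiles(path, "StellaOps.Doctor.Plugin.*.dll")
            .OrderBy(f => f, StringComparer.Ordinal);
        foreach (var dll in dlls)
        {
            var assembly = Assembly.LoadFrom(dll);
            Type[] types;
            try { types = assembly.GetTypes(); }
            catch (ReflectionTypeLoadException ex)
            {
                // Keep the types that loaded; skip those with missing dependencies
                types = ex.Types.OfType<Type>().ToArray();
            }
            // Only concrete classes with a public parameterless constructor can be activated
            foreach (var type in types.Where(t =>
                t.IsClass && !t.IsAbstract &&
                typeof(IDoctorPlugin).IsAssignableFrom(t) &&
                t.GetConstructor(Type.EmptyTypes) is not null))
            {
                yield return (IDoctorPlugin)Activator.CreateInstance(type)!;
            }
        }
    }
}
```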
4.5 Declarative Doctor Packs (YAML)
Doctor packs provide declarative checks that wrap CLI commands and parsing rules.
They complement compiled plugins and are loaded from plugins/doctor/*.yaml (plus optional override directories).
Short example:
```yaml
apiVersion: stella.ops/doctor.v1
kind: DoctorPlugin
metadata:
  name: doctor-release-orchestrator-gitlab
spec:
  discovery:
    when:
      - env: GITLAB_URL
```
Full sample: docs/benchmarks/doctor/doctor-plugin-release-orchestrator-gitlab.yaml
Key fields:
- spec.discovery.when: env/file existence gates.
- checks[].run.exec: command to execute (must be deterministic).
- checks[].parse.expect or checks[].parse.expectJson: pass/fail rules.
- checks[].how_to_fix.commands[]: exact fix commands, printed verbatim.
4.6 Plugin Directory Structure
```text
src/
├── __Libraries/
│   └── StellaOps.Doctor/                  # Core doctor engine
│       └── Plugins/
│           └── Core/                      # Built-in core plugin
├── Doctor/
│   └── __Plugins/
│       ├── StellaOps.Doctor.Plugin.Database/
│       ├── StellaOps.Doctor.Plugin.ServiceGraph/
│       ├── StellaOps.Doctor.Plugin.Scm.GitHub/
│       ├── StellaOps.Doctor.Plugin.Scm.GitLab/
│       ├── StellaOps.Doctor.Plugin.Registry.Harbor/
│       ├── StellaOps.Doctor.Plugin.Registry.ECR/
│       ├── StellaOps.Doctor.Plugin.Vault/
│       ├── StellaOps.Doctor.Plugin.Authority/
│       └── StellaOps.Doctor.Plugin.Observability/
```
4.7 Plugin Configuration
Plugins read configuration from the standard config hierarchy:
```yaml
# In stellaops.yaml or environment-specific config
Doctor:
  Enabled: true
  DefaultTimeout: 30s
  Parallelism: 4
  Plugins:
    Database:
      Enabled: true
      ConnectionTimeout: 10s
    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s
    Scm:
      GitHub:
        Enabled: true
        RateLimitThreshold: 100
    Registry:
      Harbor:
        Enabled: true
        SkipTlsVerify: false
    Vault:
      Enabled: true
      SecretsToValidate:
        - "secret/data/stellaops/api-keys"
        - "secret/data/stellaops/certificates"
```
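The duration values above (30s, 10s, 5s) are not in a format TimeSpan.Parse accepts. Binding them straight to TimeSpan properties through the standard .NET configuration binder would therefore fail. A small parser, such as the hypothetical DoctorDuration below, could back both these settings and the CLI --timeout option. The set of supported suffixes is an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical duration parser for Doctor settings and CLI flags.
public static class DoctorDuration
{
    // Accepts "500ms", "30s", "5m", "1h", or a standard TimeSpan string such as "00:00:30".
    public static TimeSpan Parse(string text)
    {
        ArgumentNullException.ThrowIfNull(text);
        var s = text.Trim();

        // Check "ms" before the single-letter suffixes so "500ms" is not read as minutes.
        if (s.EndsWith("ms", StringComparison.OrdinalIgnoreCase))
            return TimeSpan.FromMilliseconds(Number(s[..^2]));

        if (s.Length > 1)
        {
            switch (char.ToLowerInvariant(s[^1]))
            {
                case 's': return TimeSpan.FromSeconds(Number(s[..^1]));
                case 'm': return TimeSpan.FromMinutes(Number(s[..^1]));
                case 'h': return TimeSpan.FromHours(Number(s[..^1]));
            }
        }

        // Fallback: the format the configuration binder already understands.
        // Note that TimeSpan.Parse reads a bare number such as "30" as days.
        return TimeSpan.Parse(s, CultureInfo.InvariantCulture);
    }

    private static double Number(string value) =>
        double.Parse(value, NumberStyles.Float, CultureInfo.InvariantCulture);
}
```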
4.8 Security Model
Secret Redaction
All evidence output is sanitized:
```csharp
public sealed class EvidenceBuilder
{
    private readonly Dictionary<string, string> _data = new();
    private readonly List<string> _sensitiveKeys = new();

    public EvidenceBuilder Add(string key, string value)
    {
        _data[key] = value;
        return this;
    }

    // Stores the raw value; formatters redact keys listed in SensitiveKeys at output time
    public EvidenceBuilder AddSensitive(string key, string value)
    {
        _data[key] = value;
        _sensitiveKeys.Add(key);
        return this;
    }

    public EvidenceBuilder AddConnectionString(string key, string connectionString)
    {
        // Parse and redact the password before the value is stored
        var redacted = RedactConnectionStringPassword(connectionString);
        _data[key] = redacted;
        return this;
    }

    // RedactConnectionStringPassword and the method that produces the Evidence record are omitted
}
```
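RedactConnectionStringPassword is not spelled out above. One possible body is the hypothetical ConnectionStringRedactor below, which masks only the password. It handles both URL-style strings (like the postgres:// connection shown in the section 5.5 evidence) and key=value strings. It is regex-based, assumes '@' appears only as the userinfo separator, and assumes quoted values do not contain ';'.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; one way EvidenceBuilder.RedactConnectionStringPassword could work.
public static class ConnectionStringRedactor
{
    private const string Mask = "***REDACTED***";

    // URL form: scheme://user:password@host... (password is the text between ':' and '@')
    private static readonly Regex UrlPassword = new(
        @"^(?<prefix>[a-zA-Z][a-zA-Z0-9+.-]*://[^:/@]*:)(?<value>[^@]*)(?=@)",
        RegexOptions.CultureInvariant);

    // Key=value form: ...;Password=secret;... or Pwd=secret (value runs to the next ';')
    private static readonly Regex KeyValuePassword = new(
        @"(?<key>\b(?:password|pwd)\s*=\s*)(?<value>[^;]*)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string Redact(string connectionString)
    {
        if (string.IsNullOrEmpty(connectionString))
        {
            return connectionString;
        }

        // Each pattern replaces only the password value and leaves the rest untouched.
        var result = UrlPassword.Replace(connectionString, m => m.Groups["prefix"].Value + Mask);
        return KeyValuePassword.Replace(result, m => m.Groups["key"].Value + Mask);
    }
}
```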
RBAC Permissions
Doctor checks require specific scopes:
| Scope | Description |
|---|---|
| doctor:run | Execute doctor checks |
| doctor:run:full | Execute all checks, including sensitive ones |
| doctor:export | Export diagnostic reports |
| admin:system | Access system-level checks |
4.9 Versioning Strategy
- Engine version: Semantic versioning (e.g., 1.0.0)
- Plugin version: Independent semantic versioning
- Compatibility: Plugins declare MinEngineVersion
- Check IDs: Stable across versions (never renamed)
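```csharp
// Version compatibility check, applied to each discovered plugin
foreach (var plugin in plugins)
{
    if (plugin.MinEngineVersion > DoctorEngine.Version)
    {
        _logger.LogWarning(
            "Plugin {PluginId} requires engine {Required}, current is {Current}. Skipping.",
            plugin.PluginId, plugin.MinEngineVersion, DoctorEngine.Version);
        continue;
    }

    // ... initialize and register the compatible plugin
}
```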
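Comparing System.Version values directly has a pitfall: missing components are stored as -1, so 1.0 compares as lower than 1.0.0 and a compatible plugin would be skipped. The sketch below pads both sides before comparing. EngineCompatibility is a hypothetical name. System.Version also has no prerelease component, so a string such as 1.0.0-draft cannot be parsed into it; it would need stripping first, or a semantic-versioning library instead.

```csharp
using System;

// Hypothetical helper for the MinEngineVersion check.
public static class EngineCompatibility
{
    // System.Version reports unset Build/Revision as -1; pad them to 0 so that
    // 1.0, 1.0.0 and 1.0.0.0 all compare as equal.
    private static Version Normalize(Version v) =>
        new(v.Major, v.Minor, Math.Max(v.Build, 0), Math.Max(v.Revision, 0));

    public static bool IsCompatible(Version engineVersion, Version minEngineVersion) =>
        Normalize(engineVersion) >= Normalize(minEngineVersion);
}
```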
5. CLI Surface
5.1 Command Structure
Proposed Location: src/Cli/StellaOps.Cli/Commands/DoctorCommandGroup.cs
```bash
stella doctor run [options]
stella doctor list [options]
stella doctor fix --from report.json [--apply]
```
Note: stella doctor remains shorthand for stella doctor run for compatibility.
stella doctor fix executes only non-destructive commands. Any destructive step
must be presented as manual guidance and is not eligible for --apply.
5.2 Options and Flags
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --format | -f | enum | text | Output format: text, table, json, markdown |
| --quick | -q | flag | false | Run only quick checks (tagged quick) |
| --full | | flag | false | Run all checks, including slow/intensive ones |
| --pack | | string[] | all | Filter by pack name (manifest grouping) |
| --category | -c | string[] | all | Filter by category: core, database, service-graph, integration, security, observability |
| --plugin | -p | string[] | all | Filter by plugin ID (e.g., scm.github) |
| --check | | string | | Run a single check by ID |
| --severity | -s | enum[] | all | Filter output by severity: pass, info, warn, fail |
| --export | -e | path | | Export report to file |
| --timeout | -t | duration | 30s | Per-check timeout |
| --parallel | | int | 4 | Max parallel check execution |
| --no-remediation | | flag | false | Skip remediation command generation |
| --verbose | -v | flag | false | Include detailed evidence in output |
| --tenant | | string | | Tenant context for multi-tenant checks |
Fix Options
| Option | Type | Default | Description |
|---|---|---|---|
| --from | path | required | Path to JSON report with how_to_fix commands |
| --apply | flag | false | Execute fixes (default is dry-run preview) |
Only commands marked safe and non-destructive are eligible for --apply.
Destructive changes must be printed as manual steps and executed by the operator outside Doctor.
5.3 Exit Codes
| Code | Meaning |
|---|---|
| 0 | All checks passed |
| 1 | One or more warnings |
| 2 | One or more failures |
| 3 | Doctor engine error |
| 4 | Invalid arguments |
| 5 | Timeout exceeded |
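Exit codes 0 to 2 can be derived from the check results alone. The sketch below assumes Info and Skip results leave the exit code at 0, which the table does not specify. Codes 3 to 5 come from engine, argument, and timeout errors and are outside its scope. DoctorSeverity is restated from section 3.3. Because Skip (4) is numerically greater than Fail (3), taking the maximum enum value would misreport a run in which the only non-passing results were skips.

```csharp
using System;
using System.Collections.Generic;

// Restated from section 3.3.
public enum DoctorSeverity { Pass = 0, Info = 1, Warn = 2, Fail = 3, Skip = 4 }

// Hypothetical mapping from results to the exit codes in section 5.3.
public static class DoctorExitCodes
{
    public const int Success = 0;
    public const int Warnings = 1;
    public const int Failures = 2;

    // Any Fail wins; otherwise any Warn gives 1; Pass, Info and Skip give 0.
    public static int FromSeverities(IEnumerable<DoctorSeverity> severities)
    {
        var hasWarn = false;
        foreach (var s in severities)
        {
            if (s == DoctorSeverity.Fail) return Failures;
            if (s == DoctorSeverity.Warn) hasWarn = true;
        }
        return hasWarn ? Warnings : Success;
    }
}
```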
5.4 Usage Examples
```bash
# Quick health check (alias for "stella doctor run")
stella doctor

# Same check set, using the explicit subcommand
stella doctor run

# Full diagnostic
stella doctor --full

# Check only database category
stella doctor --category database

# Check specific integration
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# JSON output for CI/CD
stella doctor --format json --severity fail,warn

# Run orchestrator pack with table output
stella doctor run --pack orchestrator --format table

# Apply fixes from a JSON report (omit --apply for a dry-run preview)
stella doctor fix --from out.json --apply

# Export markdown report
stella doctor --full --format markdown --export doctor-report.md

# Verbose with all evidence
stella doctor --verbose --full

# Quick check with 60s timeout
stella doctor --quick --timeout 60s
```
5.5 Text Output Format
```text
Stella Ops Doctor
=================

Running 47 checks across 8 plugins...

[PASS] check.config.required
       All required configuration values are present

[PASS] check.database.connectivity
       PostgreSQL connection successful (latency: 12ms)

[WARN] check.tls.certificates.expiry
       Diagnosis: TLS certificate expires in 14 days

       Evidence:
         Certificate: /etc/ssl/certs/stellaops.crt
         Subject: CN=stellaops.example.com
         Expires: 2026-01-26T00:00:00Z
         Days remaining: 14

       Likely Causes:
         1. Certificate renewal not scheduled
         2. ACME/Let's Encrypt automation not configured

       Fix Steps:
         # 1. Check current certificate
         openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates

         # 2. Renew certificate (if using certbot)
         sudo certbot renew --cert-name stellaops.example.com

         # 3. Restart services to pick up new certificate
         sudo systemctl restart stellaops-gateway

       Verification:
         stella doctor --check check.tls.certificates.expiry

[FAIL] check.database.migrations.pending
       Diagnosis: 3 pending release migrations detected in schema 'auth'

       Evidence:
         Schema: auth
         Current version: 099_add_dpop_thumbprints
         Pending migrations:
           - 100_add_tenant_quotas
           - 101_add_audit_retention
           - 102_add_session_revocation
         Connection: postgres://localhost:5432/stellaops (user: stella_app)

       Likely Causes:
         1. Release migrations not applied before deployment
         2. Migration files added after last deployment

       Fix Steps:
         # 1. Backup database first (RECOMMENDED)
         pg_dump -h localhost -U stella_admin -d stellaops -F c \
           -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

         # 2. Apply pending release migrations
         stella system migrations-run --module Authority --category release

         # 3. Verify migrations applied
         stella system migrations-status --module Authority

       Verification:
         stella doctor --check check.database.migrations.pending

────────────────────────────────────────────────────────────────
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
────────────────────────────────────────────────────────────────
```
6. UI Surface
6.1 Route and Location
Route: /ops/doctor
Location: src/Web/StellaOps.Web/src/app/features/doctor/
6.2 Component Structure
```text
src/app/features/doctor/
├── doctor.routes.ts
├── doctor-dashboard.component.ts            # Main page
├── doctor-dashboard.component.html
├── doctor-dashboard.component.scss
├── components/
│   ├── check-list/
│   │   ├── check-list.component.ts          # Filterable check list
│   │   └── check-list.component.html
│   ├── check-result/
│   │   ├── check-result.component.ts        # Single check display
│   │   └── check-result.component.html
│   ├── remediation-panel/
│   │   ├── remediation-panel.component.ts   # Fix commands display
│   │   └── remediation-panel.component.html
│   ├── evidence-viewer/
│   │   ├── evidence-viewer.component.ts     # Collected evidence
│   │   └── evidence-viewer.component.html
│   └── export-dialog/
│       ├── export-dialog.component.ts       # Export options
│       └── export-dialog.component.html
└── services/
    ├── doctor.client.ts                     # API client
    ├── doctor.service.ts                    # Business logic
    └── doctor.store.ts                      # Signal-based state
```
6.3 Dashboard Layout
```text
+------------------------------------------------------------------+
| Doctor Diagnostics                        [Run Quick] [Run Full] |
+------------------------------------------------------------------+
| Filters: [Category v] [Plugin v] [Severity v]    [Export Report] |
+------------------------------------------------------------------+
|                                                                  |
| Summary Strip                                                    |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
| |    44    | |    2     | |    1     | |    0     | |   8.3s   | |
| |  Passed  | | Warnings | |  Failed  | | Skipped  | | Duration | |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
|                                                                  |
+------------------------------------------------------------------+
| Check Results                                                    |
| +--------------------------------------------------------------+ |
| | [FAIL] check.database.migrations.pending            [Expand] | |
| |        3 pending release migrations in schema 'auth'        | |
| +--------------------------------------------------------------+ |
| | [WARN] check.tls.certificates.expiry                [Expand] | |
| |        TLS certificate expires in 14 days                    | |
| +--------------------------------------------------------------+ |
| | [PASS] check.database.connectivity                  [Expand] | |
| |        PostgreSQL connection successful (12ms)               | |
| +--------------------------------------------------------------+ |
| |        ... more checks ...                                   | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
```
6.4 Expanded Check View
+------------------------------------------------------------------+
| [FAIL] check.database.migrations.pending |
+------------------------------------------------------------------+
| Diagnosis |
| 3 pending release migrations detected in schema 'auth' |
+------------------------------------------------------------------+
| Evidence |
| +--------------------------------------------------------------+ |
| | Schema | auth | |
| | Current version | 099_add_dpop_thumbprints | |
| | Pending | 100_add_tenant_quotas | |
| | | 101_add_audit_retention | |
| | | 102_add_session_revocation | |
| | Connection | postgres://localhost:5432/stellaops | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Likely Causes |
| 1. Release migrations not applied before deployment |
| 2. Migration files added after last deployment |
+------------------------------------------------------------------+
| Fix Steps [Copy All] |
| +--------------------------------------------------------------+ |
| | Step 1: Backup database first (RECOMMENDED) [Copy] | |
| | pg_dump -h localhost -U stella_admin -d stellaops -F c \ | |
| | -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump | |
| +--------------------------------------------------------------+ |
| | Step 2: Apply pending release migrations [Copy] | |
| | stella system migrations-run --module Authority \ | |
| | --category release | |
| +--------------------------------------------------------------+ |
| | Step 3: Verify migrations applied [Copy] | |
| | stella system migrations-status --module Authority | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Verification [Copy] |
| stella doctor --check check.database.migrations.pending |
+------------------------------------------------------------------+
| [Re-run Check] [Mark Resolved] |
+------------------------------------------------------------------+
6.5 Pack Navigation and Fix Actions
- Navigation hierarchy: packs -> plugins -> checks.
- Each check shows its status, evidence, a Copy Fix Commands action, and a Run Fix action (disabled unless doctor.fix.enabled=true).
- Export actions: Download JSON and Download DSSE summary.
6.6 Real-Time Updates
- Polling: Auto-refresh option (every 30s/60s/5m)
- SSE: Live check progress during execution
- WebSocket: Optional for high-frequency updates
7. API Surface
7.1 Endpoints
Base Path: /api/v1/doctor
| Method | Path | Description |
|---|---|---|
| GET | /checks | List available checks with metadata |
| GET | /plugins | List available plugins |
| POST | /run | Execute doctor checks |
| GET | /run/{runId} | Get run status/results |
| GET | /run/{runId}/stream | SSE stream for live progress |
| GET | /reports | List historical reports |
| GET | /reports/{reportId} | Get specific report |
| DELETE | /reports/{reportId} | Delete report |
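The sketch below shows one way to call these endpoints from TypeScript. It starts a run with POST /run and then polls GET /run/{runId} until the run leaves the running state. The DoctorClient class, the injected FetchLike function, and the "full" mode value are assumptions made for illustration. The request fields mirror the examples in 7.2. This is not the actual doctor.client.ts.

```typescript
// Hypothetical Doctor API client. fetch is injected so the sketch can run against a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface RunRequest {
  mode: "quick" | "full"; // "full" is assumed from the UI's [Run Full] button
  categories?: string[];
  plugins?: string[];
  checkIds?: string[];
  timeoutMs?: number;
  parallelism?: number;
  includeRemediation?: boolean;
}

interface RunStatus {
  runId: string;
  status: string; // "running" and "completed" appear in the examples
  checksTotal?: number;
  checksCompleted?: number;
}

class DoctorClient {
  private readonly baseUrl: string;
  private readonly fetchFn: FetchLike;

  constructor(baseUrl: string, fetchFn: FetchLike) {
    this.baseUrl = baseUrl;
    this.fetchFn = fetchFn;
  }

  // Sends one request under /api/v1/doctor and parses the JSON body.
  private async call<T>(method: string, path: string, body?: unknown): Promise<T> {
    const res = await this.fetchFn(`${this.baseUrl}/api/v1/doctor${path}`, {
      method,
      headers: body === undefined ? {} : { "Content-Type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return (await res.json()) as T;
  }

  startRun(request: RunRequest): Promise<RunStatus> {
    return this.call<RunStatus>("POST", "/run", request);
  }

  getRun(runId: string): Promise<RunStatus> {
    return this.call<RunStatus>("GET", `/run/${encodeURIComponent(runId)}`);
  }

  // Polls GET /run/{runId} until the run is no longer "running".
  async waitForRun(runId: string, intervalMs = 1000): Promise<RunStatus> {
    for (;;) {
      const run = await this.getRun(runId);
      if (run.status !== "running") return run;
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}
```

Injecting fetch keeps the sketch testable. For live progress, the SSE stream endpoint avoids polling altogether.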
7.2 Request/Response Models
List Checks
GET /api/v1/doctor/checks?category=database&tags=quick
{
"checks": [
{
"checkId": "check.database.connectivity",
"name": "Database Connectivity",
"description": "Verify PostgreSQL connection",
"pluginId": "stellaops.doctor.database",
"category": "database",
"defaultSeverity": "fail",
"tags": ["quick", "database"],
"estimatedDurationMs": 500
}
],
"total": 47
}
Run Checks
POST /api/v1/doctor/run
Content-Type: application/json
{
"mode": "quick",
"categories": ["database", "integration"],
"plugins": [],
"checkIds": [],
"timeoutMs": 30000,
"parallelism": 4,
"includeRemediation": true
}
Response:
{
"runId": "dr_20260112_143052_abc123",
"status": "running",
"startedAt": "2026-01-12T14:30:52Z",
"checksTotal": 12,
"checksCompleted": 0
}
Get Run Results
GET /api/v1/doctor/run/dr_20260112_143052_abc123
{
"runId": "dr_20260112_143052_abc123",
"status": "completed",
"startedAt": "2026-01-12T14:30:52Z",
"completedAt": "2026-01-12T14:31:00Z",
"durationMs": 8300,
"summary": {
"passed": 44,
"warnings": 2,
"failed": 1,
"skipped": 0,
"total": 47
},
"overallSeverity": "fail",
"results": [
{
"checkId": "check.database.migrations.pending",
"pluginId": "stellaops.doctor.database",
"category": "database",
"severity": "fail",
"diagnosis": "3 pending release migrations detected in schema 'auth'",
"evidence": {
"description": "Migration state for auth schema",
"data": {
"schema": "auth",
"currentVersion": "099_add_dpop_thumbprints",
"pendingMigrations": "100_add_tenant_quotas, 101_add_audit_retention, 102_add_session_revocation",
"connection": "postgres://localhost:5432/stellaops"
}
},
"likelyCauses": [
"Release migrations not applied before deployment",
"Migration files added after last deployment"
],
"remediation": {
"requiresBackup": true,
"safetyNote": "Always backup before running migrations",
"steps": [
{
"order": 1,
"description": "Backup database first (RECOMMENDED)",
"command": "pg_dump -h localhost -U stella_admin -d stellaops -F c -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump",
"commandType": "shell",
"placeholders": {}
},
{
"order": 2,
"description": "Apply pending release migrations",
"command": "stella system migrations-run --module Authority --category release",
"commandType": "shell",
"placeholders": {}
},
{
"order": 3,
"description": "Verify migrations applied",
"command": "stella system migrations-status --module Authority",
"commandType": "shell",
"placeholders": {}
}
]
},
"verificationCommand": "stella doctor --check check.database.migrations.pending",
"durationMs": 234,
"executedAt": "2026-01-12T14:30:54Z"
}
]
}
Each result also exposes a how_to_fix object for automation (not shown in the example above). It is a simplified alias of the richer remediation model: its commands[] array lists the remediation commands verbatim, in step order.
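The summary counts and overallSeverity in a run result can be derived from the per-check severities. Below is a minimal sketch. It assumes the severity strings "pass", "warn", "fail", and "skip"; only "pass" and "fail" appear in the example. It also assumes skipped checks never raise the overall severity.

```typescript
// Severity strings: "pass" and "fail" appear in the API examples;
// "warn" and "skip" are assumed spellings.
type Severity = "pass" | "warn" | "fail" | "skip";

interface CheckResult {
  checkId: string;
  severity: Severity;
}

interface RunSummary {
  passed: number;
  warnings: number;
  failed: number;
  skipped: number;
  total: number;
}

// Higher rank = worse outcome. "skip" ranks lowest so it never raises the overall severity.
const SEVERITY_RANK: Record<Severity, number> = { skip: 0, pass: 1, warn: 2, fail: 3 };

function summarize(results: CheckResult[]): { summary: RunSummary; overallSeverity: Severity } {
  const summary: RunSummary = { passed: 0, warnings: 0, failed: 0, skipped: 0, total: results.length };
  let overallSeverity: Severity = "pass"; // an empty or all-skipped run reports "pass"
  for (const r of results) {
    if (r.severity === "pass") summary.passed++;
    else if (r.severity === "warn") summary.warnings++;
    else if (r.severity === "fail") summary.failed++;
    else summary.skipped++;
    if (SEVERITY_RANK[r.severity] > SEVERITY_RANK[overallSeverity]) overallSeverity = r.severity;
  }
  return { summary, overallSeverity };
}
```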
7.3 SSE Stream
GET /api/v1/doctor/run/dr_20260112_143052_abc123/stream
Accept: text/event-stream
event: check-started
data: {"checkId":"check.database.connectivity","startedAt":"2026-01-12T14:30:52Z"}
event: check-completed
data: {"checkId":"check.database.connectivity","severity":"pass","durationMs":45}
event: check-started
data: {"checkId":"check.database.migrations.pending","startedAt":"2026-01-12T14:30:52Z"}
event: check-completed
data: {"checkId":"check.database.migrations.pending","severity":"fail","durationMs":234}
event: run-completed
data: {"runId":"dr_20260112_143052_abc123","summary":{"passed":44,"warnings":2,"failed":1}}
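On the wire, each SSE event is terminated by a blank line. The sketch below parses a buffered text/event-stream payload into events. It handles only the event: and data: fields used above. It is illustrative and is not the UI's implementation; in a browser, the built-in EventSource API does this parsing.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Parses a buffered text/event-stream payload. Only "event" and "data" fields are
// handled; "id", "retry", and comment lines (starting with ':') are ignored.
// Each event's data is assumed to be JSON, as in the Doctor stream examples.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName = "message"; // SSE default event type
  let dataLines: string[] = [];

  // A blank line dispatches the pending event (if it has data) and resets state.
  const dispatch = (): void => {
    if (dataLines.length > 0) {
      events.push({ event: eventName, data: JSON.parse(dataLines.join("\n")) });
    }
    eventName = "message";
    dataLines = [];
  };

  for (const line of text.split(/\r?\n/)) {
    if (line === "") {
      dispatch();
      continue;
    }
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is stripped
    if (field === "event") eventName = value;
    else if (field === "data") dataLines.push(value);
  }
  // Lenient: a strict SSE parser would drop a final event with no trailing blank line.
  dispatch();
  return events;
}
```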
7.4 Evidence Logs and Attestations
Doctor runs emit a JSONL evidence log and optional DSSE summary for audit trails. By default, JSONL is local only and deterministic; outbound telemetry is opt-in.
- JSONL path: artifacts/doctor/doctor-run-<runId>.ndjson (configurable).
- DSSE summary: artifacts/doctor/doctor-run-<runId>.dsse.json (optional).
- Evidence records include doctor_command to capture the operator-invoked command. DSSE summaries assume operator execution and must include the same command note.
Example JSONL line:
{"runId":"dr_20260112_143052_abc123","doctor_command":"stella doctor run --format json","checkId":"check.database.connectivity","severity":"pass","executedAt":"2026-01-12T14:30:52Z","how_to_fix":{"commands":[]}}
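The sketch below reads the evidence log back, for example in an audit script. Each non-empty line is one JSON record. Records that lack doctor_command are rejected, because the spec requires that field. The EvidenceRecord shape is taken from the example line. readEvidenceLog and countBySeverity are hypothetical helpers.

```typescript
interface EvidenceRecord {
  runId: string;
  doctor_command: string;
  checkId: string;
  severity: string;
  executedAt: string;
  how_to_fix: { commands: string[] };
}

// Parses NDJSON (one JSON object per line) and enforces the doctor_command field.
function readEvidenceLog(ndjson: string): EvidenceRecord[] {
  const records: EvidenceRecord[] = [];
  ndjson.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // tolerate a trailing newline or blank lines
    const record = JSON.parse(line) as EvidenceRecord;
    if (typeof record.doctor_command !== "string" || record.doctor_command === "") {
      throw new Error(`line ${index + 1}: missing doctor_command`);
    }
    records.push(record);
  });
  return records;
}

// Counts records per severity, e.g. to cross-check against the run summary.
function countBySeverity(records: EvidenceRecord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const r of records) counts[r.severity] = (counts[r.severity] ?? 0) + 1;
  return counts;
}
```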
8. Remediation Command Patterns
Remediation should favor the best operator experience: short, copy/paste friendly commands with minimal steps and clear verification guidance.
8.1 Standard Output Format
Every failed check produces remediation in this structure:
[{SEVERITY}] {check.id}
Diagnosis: {one-line summary}
Evidence:
{key}: {value}
{key}: {value}
...
Likely Causes:
1. {most likely cause}
2. {second most likely cause}
...
Fix Steps:
# {step number}. {description}
{command}
# {step number}. {description}
{command}
...
Verification:
{command to re-run this specific check}
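The template can be rendered from a result object. Below is a minimal sketch. The RenderableResult shape flattens evidence.data into key/value pairs, and the two-space indentation is an assumption. renderResult is a hypothetical helper and is not the CLI's formatter.

```typescript
interface RenderStep {
  order: number;
  description: string;
  command: string;
}

interface RenderableResult {
  checkId: string;
  severity: string;
  diagnosis: string;
  evidence: Record<string, string>; // flattened evidence.data
  likelyCauses: string[];
  steps: RenderStep[];
  verificationCommand: string;
}

// Renders one result in the standard text layout (two-space indentation assumed).
function renderResult(r: RenderableResult): string {
  const lines: string[] = [`[${r.severity.toUpperCase()}] ${r.checkId}`, `Diagnosis: ${r.diagnosis}`, "Evidence:"];
  for (const [key, value] of Object.entries(r.evidence)) lines.push(`  ${key}: ${value}`);
  lines.push("Likely Causes:");
  r.likelyCauses.forEach((cause, i) => lines.push(`  ${i + 1}. ${cause}`));
  lines.push("Fix Steps:");
  // Copy before sorting so the caller's array is not reordered.
  for (const step of [...r.steps].sort((a, b) => a.order - b.order)) {
    lines.push(`  # ${step.order}. ${step.description}`, `  ${step.command}`);
  }
  lines.push("Verification:", `  ${r.verificationCommand}`);
  return lines.join("\n");
}
```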
8.1.1 JSON Remediation Structure
The JSON output MUST include a how_to_fix object for agent consumption. It can be
derived from remediation.steps and preserves command order.
"how_to_fix": {
"summary": "Apply baseline branch policy",
"commands": [
"stella orchestrator scm apply-branch-policy --preset strict"
]
}
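Because how_to_fix is derived from remediation.steps, the mapping is a stable sort by order followed by an unchanged copy of each command. Below is a minimal sketch. toHowToFix is a hypothetical helper. The remediation model has no summary field, so the caller supplies the summary text.

```typescript
interface RemediationStep {
  order: number;
  description: string;
  command: string;
  commandType: string;
}

interface Remediation {
  steps: RemediationStep[];
}

interface HowToFix {
  summary: string;
  commands: string[];
}

// Orders steps by their "order" field and copies commands verbatim
// (no trimming, no placeholder substitution).
function toHowToFix(summary: string, remediation: Remediation): HowToFix {
  const commands = [...remediation.steps]
    .sort((a, b) => a.order - b.order)
    .map((step) => step.command);
  return { summary, commands };
}
```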
8.2 Placeholder Conventions
When commands require user-specific values:
| Placeholder | Meaning | Example |
|---|---|---|
| {HOSTNAME} | Target hostname | ldap.example.com |
| {PORT} | Port number | 636 |
| {USERNAME} | Username | admin |
| {PASSWORD} | Password (never shown) | *** |
| {DATABASE} | Database name | stellaops |
| {SCHEMA} | Schema name | auth |
| {FILE_PATH} | File path | /etc/ssl/certs/ca.crt |
| {TOKEN} | API token (never shown) | *** |
| {URL} | Full URL | https://api.github.com |
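Below is a sketch of placeholder substitution that honors the "never shown" rule. Secret values are masked as *** when a command is rendered for display and are substituted only when producing the command to execute. Shell expansions such as ${STELLAOPS_AUTHORITY_URL} are left untouched. fillPlaceholders is a hypothetical helper. The secret set is taken from the table above.

```typescript
// Placeholders the table marks as "never shown".
const SECRET_PLACEHOLDERS = new Set(["PASSWORD", "TOKEN"]);

// Replaces {NAME} placeholders with supplied values. Shell expansions such as
// ${STELLAOPS_AUTHORITY_URL} are left alone, as are placeholders with no value.
function fillPlaceholders(command: string, values: Record<string, string>, forDisplay: boolean): string {
  return command.replace(/(\$?)\{([A-Z_]+)\}/g, (match: string, dollar: string, name: string) => {
    if (dollar === "$") return match; // shell variable, not a Doctor placeholder
    if (!Object.prototype.hasOwnProperty.call(values, name)) return match;
    if (forDisplay && SECRET_PLACEHOLDERS.has(name)) return "***";
    return values[name];
  });
}
```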
8.3 Safety Notes
Doctor fix executes only non-destructive commands. If a fix requires a change that modifies data, Doctor must present it as manual guidance with explicit safety notes and never execute it.
Manual Steps (not executed by Doctor):
# SAFETY: This operation modifies the database. Create a backup first.
# 1. Backup database (REQUIRED before proceeding)
pg_dump -h {HOSTNAME} -U {USERNAME} -d {DATABASE} -F c \
-f backup_$(date +%Y%m%d_%H%M%S).dump
# 2. Apply the fix
stella system migrations-run --module Authority --category release
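A fix runner can enforce this rule by splitting a remediation into steps Doctor may run and steps it may only print. In the minimal sketch below, every step is treated as manual when requiresBackup is true; that field comes from the 7.2 model. Otherwise the sketch relies on a per-step modifiesData flag, which is hypothetical and not part of the documented model. Manual steps are rendered with the safety header shown above.

```typescript
interface FixStep {
  order: number;
  description: string;
  command: string;
  modifiesData?: boolean; // hypothetical flag; not part of the documented model
}

interface FixRemediation {
  requiresBackup: boolean;
  safetyNote?: string;
  steps: FixStep[];
}

interface FixPlan {
  executable: FixStep[]; // candidates for Run Fix when doctor.fix.enabled=true
  manual: FixStep[]; // printed only, never executed
}

function planFix(remediation: FixRemediation): FixPlan {
  const ordered = [...remediation.steps].sort((a, b) => a.order - b.order);
  // A remediation that needs a backup is data-modifying as a whole.
  if (remediation.requiresBackup) return { executable: [], manual: ordered };
  return {
    executable: ordered.filter((s) => s.modifiesData !== true),
    manual: ordered.filter((s) => s.modifiesData === true),
  };
}

// Renders manual steps in the "Manual Steps (not executed by Doctor)" layout.
function renderManual(plan: FixPlan, safetyNote: string): string {
  const lines = ["Manual Steps (not executed by Doctor):", `# SAFETY: ${safetyNote}`];
  for (const step of plan.manual) {
    lines.push(`# ${step.order}. ${step.description}`, step.command);
  }
  return lines.join("\n");
}
```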
8.4 Multi-Platform Commands
Where applicable, provide commands for different platforms:
Fix Steps:
# 1. Restart the service
# Linux (systemd):
sudo systemctl restart stellaops-gateway
# Docker (standalone container):
docker restart stellaops-gateway
# Docker Compose:
docker compose restart gateway
# Kubernetes:
kubectl rollout restart deployment/stellaops-gateway -n stellaops
9. Doctor Check Catalog
This section documents all diagnostic checks organized by plugin/category.
9.1 Core Platform Plugin (stellaops.doctor.core)
check.config.required
| Property | Value |
|---|---|
| CheckId | check.config.required |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Fail |
| Tags | quick, config, startup |
| What it verifies | All required configuration values are present |
| Evidence collected | Missing keys, config sources checked, environment |
| Failure modes | Missing STELLAOPS_BACKEND_URL, missing database connection string, missing Authority URL |
Remediation:
# 1. Check which configuration values are missing
stella config list --show-missing
# 2. Set missing environment variables
export STELLAOPS_BACKEND_URL="https://api.stellaops.example.com"
export STELLAOPS_POSTGRES_CONNECTION="Host=localhost;Database=stellaops;Username=stella_app;Password={PASSWORD}"
export STELLAOPS_AUTHORITY_URL="https://auth.stellaops.example.com"
# 3. Or update configuration file
# Edit: /etc/stellaops/stellaops.yaml
Verification: stella doctor --check check.config.required
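At its core, this check looks up each required key across the configuration sources it inspects and reports the keys that no source provides. The sketch below makes several assumptions. The three key names come from the failure modes and remediation above. Sources are modeled as already-flattened key/value maps. Empty or whitespace-only values count as missing. findMissingConfig is a hypothetical helper.

```typescript
interface ConfigSource {
  name: string; // e.g. "environment" or a config file path
  values: Record<string, string | undefined>;
}

interface ConfigCheckResult {
  severity: "pass" | "fail";
  missing: string[];
  sourcesChecked: string[];
}

const REQUIRED_KEYS = ["STELLAOPS_BACKEND_URL", "STELLAOPS_POSTGRES_CONNECTION", "STELLAOPS_AUTHORITY_URL"];

// A key is present if at least one source has a non-blank value for it.
function findMissingConfig(sources: ConfigSource[], required: string[] = REQUIRED_KEYS): ConfigCheckResult {
  const missing = required.filter(
    (key) => !sources.some((source) => (source.values[key] ?? "").trim() !== ""),
  );
  return {
    severity: missing.length === 0 ? "pass" : "fail",
    missing,
    sourcesChecked: sources.map((source) => source.name),
  };
}
```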
check.config.syntax
| Property | Value |
|---|---|
| CheckId | check.config.syntax |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Fail |
| Tags | quick, config |
| What it verifies | Configuration files have valid YAML/JSON syntax |
| Evidence collected | File path, line number, parse error message |
| Failure modes | Invalid YAML indentation, JSON syntax error, encoding issues |
Remediation:
# 1. Validate YAML syntax
yamllint /etc/stellaops/stellaops.yaml
# 2. Check for encoding issues (should be UTF-8)
file /etc/stellaops/stellaops.yaml
# 3. Fix common YAML issues
# - Use spaces, not tabs
# - Check string quoting
# - Verify indentation (2 spaces per level)
Verification: stella doctor --check check.config.syntax
check.config.deprecated
| Property | Value |
|---|---|
| CheckId | check.config.deprecated |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Warn |
| Tags | config |
| What it verifies | No deprecated configuration keys are in use |
| Evidence collected | Deprecated keys found, replacement keys |
| Failure modes | Using old key names, removed options |
Remediation:
# 1. Review deprecated keys and their replacements
stella config migrate --dry-run
# 2. Update configuration file with new key names
stella config migrate --apply
# 3. Verify configuration after migration
stella config validate
Verification: stella doctor --check check.config.deprecated
check.runtime.dotnet
| Property | Value |
|---|---|
| CheckId | check.runtime.dotnet |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Fail |
| Tags | quick, runtime |
| What it verifies | .NET runtime version meets minimum requirements |
| Evidence collected | Installed version, required version, runtime path |
| Failure modes | Outdated .NET version, missing runtime |
Remediation:
# 1. Check installed .NET runtimes (dotnet --version reports the SDK, not the runtime)
dotnet --list-runtimes
# 2. Install the required .NET runtime (Linux/macOS install script)
wget https://dot.net/v1/dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 10.0 --runtime aspnetcore
# 3. Verify installation
dotnet --list-runtimes
Verification: stella doctor --check check.runtime.dotnet
check.runtime.memory
| Property | Value |
|---|---|
| CheckId | check.runtime.memory |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Warn |
| Tags | runtime, resources |
| What it verifies | Sufficient memory available for operation |
| Evidence collected | Total memory, available memory, GC memory info |
| Failure modes | Low available memory (<1GB), high GC pressure |
Remediation:
# 1. Check current memory usage
free -h
# 2. Identify memory-heavy processes
ps aux --sort=-%mem | head -20
# 3. Adjust container memory limits if applicable
# Docker:
docker update --memory 4g stellaops-gateway
# Kubernetes:
kubectl patch deployment stellaops-gateway -p '{"spec":{"template":{"spec":{"containers":[{"name":"gateway","resources":{"limits":{"memory":"4Gi"}}}]}}}}'
Verification: stella doctor --check check.runtime.memory
check.runtime.disk.space
| Property | Value |
|---|---|
| CheckId | check.runtime.disk.space |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Warn |
| Tags | runtime, resources |
| What it verifies | Sufficient disk space on required paths |
| Evidence collected | Path, total space, available space, usage percentage |
| Failure modes | Data directory >90% full, log directory full |
Remediation:
# 1. Check disk usage
df -h /var/lib/stellaops
# 2. Find large files
du -sh /var/lib/stellaops/* | sort -hr | head -20
# 3. Clean up old logs
find /var/log/stellaops -name "*.log" -mtime +30 -delete
# 4. Clean up old exports
stella export cleanup --older-than 30d
Verification: stella doctor --check check.runtime.disk.space
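The threshold logic reduces to a usage percentage per path. The sketch below assumes total and available byte counts have already been collected for each path. It applies the >90% threshold from the failure modes. evaluateDiskSpace is a hypothetical helper.

```typescript
interface DiskSample {
  path: string;
  totalBytes: number;
  availableBytes: number;
}

interface DiskEvidence {
  path: string;
  usagePercent: number;
  severity: "pass" | "warn";
}

const WARN_USAGE_PERCENT = 90; // "Data directory >90% full"

function evaluateDiskSpace(samples: DiskSample[]): DiskEvidence[] {
  return samples.map((s): DiskEvidence => {
    const used = s.totalBytes - s.availableBytes;
    // A zero-sized filesystem is treated as full.
    const usage = s.totalBytes > 0 ? (used / s.totalBytes) * 100 : 100;
    return {
      path: s.path,
      usagePercent: Math.round(usage * 10) / 10, // one decimal for readable evidence
      severity: usage > WARN_USAGE_PERCENT ? "warn" : "pass", // compare the unrounded value
    };
  });
}
```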
check.runtime.disk.permissions
| Property | Value |
|---|---|
| CheckId | check.runtime.disk.permissions |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Fail |
| Tags | quick, runtime, security |
| What it verifies | Write permissions on required directories |
| Evidence collected | Path, expected permissions, actual permissions, owner |
| Failure modes | Cannot write to data directory, log directory not writable |
Remediation:
# 1. Check current permissions
ls -la /var/lib/stellaops
# 2. Fix ownership
sudo chown -R stellaops:stellaops /var/lib/stellaops
# 3. Fix permissions
sudo chmod 755 /var/lib/stellaops
sudo chmod 755 /var/log/stellaops
# 4. Verify write access
sudo -u stellaops sh -c 'touch /var/lib/stellaops/.write-test && rm /var/lib/stellaops/.write-test'
Verification: stella doctor --check check.runtime.disk.permissions
check.time.sync
| Property | Value |
|---|---|
| CheckId | check.time.sync |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Warn |
| Tags | quick, runtime |
| What it verifies | System clock is synchronized (NTP) |
| Evidence collected | NTP status, clock offset, sync source |
| Failure modes | Clock drift >5s, NTP not running, no sync source |
Remediation:
# 1. Check NTP status
timedatectl status
# 2. Enable NTP synchronization
sudo timedatectl set-ntp true
# 3. Force immediate sync
sudo systemctl restart systemd-timesyncd
# 4. Verify sync status
timedatectl timesync-status
Verification: stella doctor --check check.time.sync
check.crypto.profiles
| Property | Value |
|---|---|
| CheckId | check.crypto.profiles |
| Plugin | stellaops.doctor.core |
| Category | Core |
| Severity | Fail |
| Tags | quick, security, crypto |
| What it verifies | Crypto profile is valid and providers are available |
| Evidence collected | Active profile, available providers, missing providers |
| Failure modes | Invalid profile, required provider not available |
Remediation:
# 1. List available crypto profiles
stella crypto profiles list
# 2. Validate current profile
stella crypto profiles validate
# 3. Switch to a different profile if needed
stella crypto profiles set --profile default
# 4. Install missing providers (if GOST required)
# See docs/crypto/gost-setup.md
Verification: stella doctor --check check.crypto.profiles
9.2 Database Plugin (stellaops.doctor.database)
check.database.connectivity
| Property | Value |
|---|---|
| CheckId | check.database.connectivity |
| Plugin | stellaops.doctor.database |
| Category | Database |
| Severity | Fail |
| Tags | quick, database |
| What it verifies | PostgreSQL connection is successful |
| Evidence collected | Connection string (redacted), latency, server version |
| Failure modes | Connection refused, authentication failed, timeout |
Remediation:
# 1. Test connection manually
psql "host=localhost dbname=stellaops user=stella_app" -c "SELECT 1"
# 2. Check PostgreSQL is running
sudo systemctl status postgresql
# 3. Check connection settings
# Verify pg_hba.conf allows connections
sudo cat /etc/postgresql/16/main/pg_hba.conf | grep stellaops
# 4. Check firewall
sudo ufw status | grep 5432
Verification: stella doctor --check check.database.connectivity
check.database.version
| Property | Value |
|---|---|
| CheckId | check.database.version |
| Plugin | stellaops.doctor.database |
| Category | Database |
| Severity | Warn |
| Tags | database |
| What it verifies | PostgreSQL version meets minimum requirements (>=16) |
| Evidence collected | Current version, required version |
| Failure modes | PostgreSQL <16, unsupported version |
Remediation:
# 1. Check current version
psql -c "SELECT version();"
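# 2. Install PostgreSQL 16 (Ubuntu/Debian; example assumes an existing 14 cluster)
sudo apt install postgresql-16
# 3. Drop the empty default 16/main cluster created by the package, then migrate data
sudo pg_dropcluster 16 main --stop
sudo pg_upgradecluster 14 main
# 4. After verifying the upgrade, drop the old cluster and remove the old version
sudo pg_dropcluster 14 main
sudo apt remove postgresql-14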
Verification: stella doctor --check check.database.version
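For version comparison, SHOW server_version_num is easier to work with than SELECT version(). It returns an integer; for PostgreSQL 10 and later the value is major × 10000 + minor, e.g. 160004 for 16.4. SELECT version() returns a free-form string instead. Below is a sketch of the comparison, assuming the integer has already been fetched, for example with psql -At -c "SHOW server_version_num;". pgMajorVersion and evaluatePgVersion are hypothetical helpers.

```typescript
const MIN_PG_MAJOR = 16;

// server_version_num encodes major*10000 + minor for PostgreSQL 10 and later.
// For 9.x servers this yields 9, which is still correctly below the minimum.
function pgMajorVersion(serverVersionNum: number): number {
  return Math.floor(serverVersionNum / 10000);
}

function evaluatePgVersion(serverVersionNum: number): { severity: "pass" | "warn"; current: number; required: number } {
  const current = pgMajorVersion(serverVersionNum);
  return { severity: current >= MIN_PG_MAJOR ? "pass" : "warn", current, required: MIN_PG_MAJOR };
}
```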
check.database.migrations.pending
| Property | Value |
|---|---|
| CheckId | check.database.migrations.pending |
| Plugin | stellaops.doctor.database |
| Category | Database |
| Severity | Fail |
| Tags | database, migrations |
| What it verifies | No pending release migrations exist |
| Evidence collected | Schema, current version, pending migrations list |
| Failure modes | Release migrations not applied before deployment |
Remediation:
# 1. Backup database first (RECOMMENDED)
pg_dump -h localhost -U stella_admin -d stellaops -F c \
-f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump
# 2. Check migration status for all modules
stella system migrations-status
# 3. Apply pending release migrations
stella system migrations-run --category release
# 4. Verify all migrations applied
stella system migrations-status --verify
Verification: stella doctor --check check.database.migrations.pending
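The pending list in the evidence is the set difference between the migrations shipped for a module and the migrations recorded as applied. The sketch below assumes migration names sort lexically by their zero-padded numeric prefix, as with 099_add_dpop_thumbprints and 100_add_tenant_quotas. findPendingMigrations is a hypothetical helper and is not the stella system implementation.

```typescript
interface MigrationState {
  currentVersion: string | null; // latest applied migration, if any
  pending: string[]; // shipped but not applied, in apply order
}

function findPendingMigrations(available: string[], applied: string[]): MigrationState {
  const appliedSet = new Set(applied);
  // Default sort is lexical, which matches apply order for zero-padded prefixes.
  const pending = [...available].sort().filter((name) => !appliedSet.has(name));
  const appliedSorted = [...applied].sort();
  return {
    currentVersion: appliedSorted.length > 0 ? appliedSorted[appliedSorted.length - 1] : null,
    pending,
  };
}
```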
check.database.migrations.checksum
| Property | Value |
|---|---|
| CheckId | check.database.migrations.checksum |
| Plugin | stellaops.doctor.database |
| Category | Database |
| Severity | Fail |
| Tags | database, migrations, security |
| What it verifies | Applied migration checksums match source files |
| Evidence collected | Mismatched migrations, expected vs actual checksum |
| Failure modes | Migration file modified after application, corruption |
Remediation:
# CRITICAL: Checksum mismatch indicates potential data integrity issue
# 1. Identify mismatched migrations
stella system migrations-verify --detailed
# 2. If migrations were legitimately modified (rare):
# WARNING: Only proceed if you understand the implications
stella system migrations-repair --migration {MIGRATION_NAME} --force
# 3. If data corruption suspected:
# Restore from backup and reapply migrations
pg_restore -h localhost -U stella_admin -d stellaops --clean --if-exists stellaops_backup.dump
stella system migrations-run --all
Verification: stella doctor --check check.database.migrations.checksum
check.database.migrations.lock
| Property | Value |
|---|---|
| CheckId | check.database.migrations.lock |
| Plugin | stellaops.doctor.database |
| Category | Database |
| Severity | Warn |
| Tags | database, migrations |
| What it verifies | No stale migration locks exist |
| Evidence collected | Lock holder, lock duration, schema |
| Failure modes | Abandoned lock from crashed process |
Remediation:
# 1. Check for active locks
psql -d stellaops -c "SELECT * FROM pg_locks WHERE locktype = 'advisory';"
# 2. Identify lock holder process
psql -d stellaops -c "SELECT pid, query, state FROM pg_stat_activity WHERE pid IN (SELECT pid FROM pg_locks WHERE locktype = 'advisory');"
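# 3. If the holding process is dead or hung, terminate its backend session.
#    Advisory locks are released when the session ends. pg_advisory_unlock_all()
#    only releases locks held by the calling session, so it cannot clear this lock.
# WARNING: Only if you are certain no migration is running
# Replace {PID} with the pid from step 2
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"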
# 4. Retry migration
stella system migrations-run --category release
Verification: stella doctor --check check.database.migrations.lock
check.database.schema.{schema}
| Property | Value |
|---|---|
| CheckId | check.database.schema.{schema} (e.g., check.database.schema.auth) |
| Plugin | stellaops.doctor.database |
| Category | Database |
| Severity | Fail |
| Tags | database |
| What it verifies | Schema exists and has expected tables |
| Evidence collected | Schema name, expected tables, missing tables |
| Failure modes | Schema not created, tables dropped |
Remediation:
# 1. Check if schema exists
psql -d stellaops -c "SELECT schema_name FROM information_schema.schemata WHERE schema_name = '{SCHEMA}';"
# 2. If schema missing, run startup migrations
stella system migrations-run --module {MODULE} --category startup
# 3. Verify schema tables
psql -d stellaops -c "SELECT table_name FROM information_schema.tables WHERE table_schema = '{SCHEMA}';"
Verification: stella doctor --check check.database.schema.{schema}
check.database.connections.pool
| Property | Value |
|---|---|
| CheckId | check.database.connections.pool |
| Plugin | stellaops.doctor.database |
| Category | Database |
| Severity | Warn |
| Tags | database, performance |
| What it verifies | Connection pool is healthy, not exhausted |
| Evidence collected | Active connections, idle connections, max connections |
| Failure modes | Pool exhausted, connection leak |
Remediation:
# 1. Check current connections
psql -d stellaops -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'stellaops';"
# 2. Check max connections
psql -d stellaops -c "SHOW max_connections;"
# 3. Identify long-running queries
psql -d stellaops -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 10;"
# 4. Increase max connections if needed (requires a server restart; reload is not enough)
# Edit postgresql.conf: max_connections = 200
sudo systemctl restart postgresql
Verification: stella doctor --check check.database.connections.pool
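The pool check compares the number of connections in use with the server limit. The sketch below assumes the two counts come from the queries in steps 1 and 2. It uses an 80% warning threshold, which is an assumption because the catalog does not state one. max_connections is a server-wide limit, so connections to other databases on the same server also count against it. evaluatePool is a hypothetical helper.

```typescript
interface PoolSample {
  connections: number; // e.g. count(*) from pg_stat_activity
  maxConnections: number; // SHOW max_connections
}

const WARN_RATIO = 0.8; // assumed threshold, not specified in the catalog

function evaluatePool(sample: PoolSample): { severity: "pass" | "warn"; utilization: number } {
  // A zero limit is treated as fully used.
  const utilization = sample.maxConnections > 0 ? sample.connections / sample.maxConnections : 1;
  return { severity: utilization >= WARN_RATIO ? "warn" : "pass", utilization };
}
```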
9.3 Service Graph Plugin (stellaops.doctor.servicegraph)
check.services.gateway.running
| Property | Value |
|---|---|
| CheckId | check.services.gateway.running |
| Plugin | stellaops.doctor.servicegraph |
| Category | ServiceGraph |
| Severity | Fail |
| Tags | quick, services |
| What it verifies | Gateway service is running and accepting connections |
| Evidence collected | Service status, PID, uptime, port binding |
| Failure modes | Service not running, port already in use |
Remediation:
# 1. Check service status
sudo systemctl status stellaops-gateway
# 2. Check logs for errors
sudo journalctl -u stellaops-gateway -n 50
# 3. Check port binding
sudo ss -tlnp | grep 443
# 4. Start/restart service
sudo systemctl restart stellaops-gateway
Verification: stella doctor --check check.services.gateway.running
check.services.gateway.routing
| Property | Value |
|---|---|
| CheckId | check.services.gateway.routing |
| Plugin | stellaops.doctor.servicegraph |
| Category | ServiceGraph |
| Severity | Fail |
| Tags | services, routing |
| What it verifies | Gateway can route requests to backend services |
| Evidence collected | Registered services, routing table, disconnected services |
| Failure modes | No services registered, all services disconnected |
Remediation:
# 1. Check registered services
curl -s http://localhost:8080/health/routing | jq
# 2. Verify backend services are running
stella services status
# 3. Check Router transport connectivity
stella services connectivity-test
# 4. Restart disconnected services
sudo systemctl restart stellaops-concelier
sudo systemctl restart stellaops-scanner
Verification: stella doctor --check check.services.gateway.routing
check.services.{service}.health
| Property | Value |
|---|---|
| CheckId | check.services.{service}.health (e.g., check.services.concelier.health) |
| Plugin | stellaops.doctor.servicegraph |
| Category | ServiceGraph |
| Severity | Fail |
| Tags | services |
| What it verifies | Service health endpoint returns healthy |
| Evidence collected | Health status, dependencies, latency |
| Failure modes | Service unhealthy, degraded dependencies |
Remediation:
# 1. Check service health directly
curl -s http://localhost:{PORT}/healthz | jq
# 2. Check detailed health
curl -s http://localhost:{PORT}/health/details | jq
# 3. Check service logs
sudo journalctl -u stellaops-{SERVICE} -n 100
# 4. Restart service if needed
sudo systemctl restart stellaops-{SERVICE}
Verification: stella doctor --check check.services.{service}.health
check.services.{service}.connectivity
| Property | Value |
|---|---|
| CheckId | check.services.{service}.connectivity |
| Plugin | stellaops.doctor.servicegraph |
| Category | ServiceGraph |
| Severity | Fail |
| Tags | services, routing |
| What it verifies | Service is reachable from Gateway via Router |
| Evidence collected | Transport type, connection state, last heartbeat |
| Failure modes | Connection refused, heartbeat timeout |
Remediation:
# 1. Check Router connection status
stella services connection-status --service {SERVICE}
# 2. Test network connectivity
nc -zv {SERVICE_HOST} {SERVICE_PORT}
# 3. Check firewall rules
sudo ufw status | grep {SERVICE_PORT}
# 4. Verify Router configuration in service
# Check stellaops.yaml for correct Router endpoints
Verification: stella doctor --check check.services.{service}.connectivity
check.services.authority.connectivity
| Property | Value |
|---|---|
| CheckId | check.services.authority.connectivity |
| Plugin | stellaops.doctor.servicegraph |
| Category | ServiceGraph |
| Severity | Fail |
| Tags | quick, services, auth |
| What it verifies | Authority service is reachable |
| Evidence collected | Authority URL, response status, latency |
| Failure modes | Authority unreachable, OIDC discovery failed |
Remediation:
# 1. Check Authority URL configuration
echo $STELLAOPS_AUTHORITY_URL
# 2. Test OIDC discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq
# 3. Check Authority service status
sudo systemctl status stellaops-authority
# 4. Verify network connectivity
curl -v ${STELLAOPS_AUTHORITY_URL}/healthz
Verification: stella doctor --check check.services.authority.connectivity
9.4 Security Plugin (stellaops.doctor.security)
check.auth.oidc.discovery
| Property | Value |
|---|---|
| CheckId | check.auth.oidc.discovery |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | quick, auth, security |
| What it verifies | OIDC well-known endpoint is accessible |
| Evidence collected | Discovery URL, issuer, supported flows |
| Failure modes | Discovery endpoint unavailable, invalid response |
Remediation:
# 1. Test discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq
# 2. Verify issuer matches configuration
# The issuer in the response should match STELLAOPS_AUTHORITY_URL
# 3. Check Authority service logs
sudo journalctl -u stellaops-authority -n 50
# 4. Verify TLS certificate
openssl s_client -connect auth.stellaops.example.com:443 -servername auth.stellaops.example.com
Verification: stella doctor --check check.auth.oidc.discovery
check.auth.oidc.jwks
| Property | Value |
|---|---|
| CheckId | check.auth.oidc.jwks |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | auth, security |
| What it verifies | JWKS endpoint returns valid signing keys |
| Evidence collected | JWKS URL, key count, key algorithms |
| Failure modes | JWKS unavailable, no keys, unsupported algorithms |
Remediation:
# 1. Fetch the JWKS from the jwks_uri advertised in the discovery document
curl -s "$(curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq -r .jwks_uri)" | jq
# 2. Verify keys are present
# Response should contain at least one key in "keys" array
# 3. If JWKS is empty, regenerate signing keys
stella authority keys rotate
# 4. Restart Authority service
sudo systemctl restart stellaops-authority
Verification: stella doctor --check check.auth.oidc.jwks
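The JWKS check can be expressed as validation of the parsed JSON document. There must be at least one signing key, and each signing key's alg must be one the platform accepts. In the sketch below, the accepted algorithm list is an assumption for illustration. Keys without an alg member are not judged, because alg is optional in RFC 7517. evaluateJwks is a hypothetical helper.

```typescript
interface Jwk {
  kty: string;
  kid?: string;
  alg?: string;
  use?: string;
}

interface Jwks {
  keys?: Jwk[];
}

interface JwksEvidence {
  severity: "pass" | "fail";
  keyCount: number;
  algorithms: string[];
  problems: string[];
}

// Assumed allow-list; adjust to the algorithms your crypto profile permits.
const ACCEPTED_ALGS = new Set(["RS256", "PS256", "ES256", "ES384", "EdDSA"]);

function evaluateJwks(doc: Jwks): JwksEvidence {
  const keys = doc.keys ?? [];
  // Keys with no "use" member are treated as potential signing keys.
  const signing = keys.filter((k) => k.use === undefined || k.use === "sig");
  const problems: string[] = [];
  if (signing.length === 0) problems.push("no signing keys in JWKS");
  for (const key of signing) {
    if (key.alg !== undefined && !ACCEPTED_ALGS.has(key.alg)) {
      problems.push(`key ${key.kid ?? "(no kid)"} uses unsupported algorithm ${key.alg}`);
    }
  }
  return {
    severity: problems.length === 0 ? "pass" : "fail",
    keyCount: keys.length,
    algorithms: signing.map((k) => k.alg ?? "(unspecified)"),
    problems,
  };
}
```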
check.auth.ldap.bind
| Property | Value |
|---|---|
| CheckId | check.auth.ldap.bind |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | auth, security, ldap |
| What it verifies | LDAP bind credentials are valid |
| Evidence collected | LDAP host, bind DN (redacted), TLS status |
| Failure modes | Invalid credentials, connection refused, TLS failure |
Remediation:
# 1. Test LDAP connection with ldapsearch
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
-D "cn=bind-user,ou=service,dc=example,dc=internal" \
-w "{PASSWORD}" \
-b "ou=people,dc=example,dc=internal" "(uid=*)" dn | head -10
# 2. Check TLS certificate
openssl s_client -connect {LDAP_HOST}:636 -showcerts
# 3. Verify bind DN and password in configuration
# Check etc/authority.plugins/ldap.yaml
# 4. Test with Authority's ldap-test command
stella authority ldap-test --bind-only
Verification: stella doctor --check check.auth.ldap.bind
check.auth.ldap.search
| Property | Value |
|---|---|
| CheckId | check.auth.ldap.search |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | auth, ldap |
| What it verifies | LDAP search base is accessible and returns users |
| Evidence collected | Search base, user count, search time |
| Failure modes | Search base not found, no users returned, timeout |
Remediation:
# 1. Test LDAP search
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
-D "{BIND_DN}" -w "{PASSWORD}" \
-b "{SEARCH_BASE}" "(objectClass=person)" dn | grep -c '^dn:'
# 2. Verify search base in configuration
# Check etc/authority.plugins/ldap.yaml: connection.searchBase
# 3. Check that the search base entry exists (fails with "No such object" if not)
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
-D "{BIND_DN}" -w "{PASSWORD}" \
-b "{SEARCH_BASE}" -s base "(objectClass=*)" dn
# 4. Verify bind user has read permissions
# Check LDAP ACLs
Verification: stella doctor --check check.auth.ldap.search
check.auth.ldap.groups
| Property | Value |
|---|---|
| CheckId | check.auth.ldap.groups |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Warn |
| Tags | auth, ldap |
| What it verifies | LDAP group mapping is configured and working |
| Evidence collected | Group attribute, mapped groups, sample user groups |
| Failure modes | Group attribute not found, no groups mapped |
Remediation:
# 1. Check group attribute configuration
# etc/authority.plugins/ldap.yaml: claims.groupAttribute
# 2. Test group lookup for a sample user
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
-D "{BIND_DN}" -w "{PASSWORD}" \
-b "{SEARCH_BASE}" "(uid={TEST_USER})" memberOf
# 3. Verify group mapping in Authority
stella authority ldap-test --user {TEST_USER} --show-groups
# 4. Update group attribute if needed
# Common attributes: memberOf, member, groupMembership
Verification: stella doctor --check check.auth.ldap.groups
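The group lookup in step 2 returns LDIF, and LDIF can fold long DNs across lines. The sketch below shows how the group names could be pulled out of that output. It assumes groups appear as plain-text memberOf values whose first RDN is cn=; base64-encoded memberOf:: values are skipped. Both function names are hypothetical.

```shell
# Hypothetical helpers: extract group CNs from ldapsearch LDIF output read on stdin.

# Join folded LDIF lines. Per RFC 2849, a line that starts with a single space
# continues the previous line, and that leading space is dropped.
ldif_unfold() {
  awk 'NR > 1 && /^ / { buf = buf substr($0, 2); next }
       { if (NR > 1) print buf; buf = $0 }
       END { if (NR > 0) print buf }'
}

# Print the CN from the first RDN of each plain-text memberOf value.
ldif_group_cns() {
  ldif_unfold | sed -n 's/^memberOf: *[cC][nN]=\([^,]*\),.*/\1/p'
}
```

Usage: pipe the step 2 command into ldif_group_cns. OpenLDAP's ldapsearch also accepts -o ldif-wrap=no, which disables folding entirely.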
check.tls.certificates.expiry
| Property | Value |
|---|---|
| CheckId | check.tls.certificates.expiry |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Warn (30d), Fail (7d) |
| Tags | quick, security, tls |
| What it verifies | TLS certificates are not expiring soon |
| Evidence collected | Certificate path, subject, expiry date, days remaining |
| Failure modes | Certificate expired, expiring within threshold |
Remediation:
# 1. Check certificate expiry
openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -enddate
# 2. Renew with certbot (if using Let's Encrypt)
sudo certbot renew --cert-name stellaops.example.com
# 3. Renew manually (if self-signed or enterprise CA)
# Generate new CSR
openssl req -new -key /etc/ssl/private/stellaops.key \
-out /tmp/stellaops.csr -subj "/CN=stellaops.example.com"
# Submit CSR to CA and install new certificate
# 4. Restart services to pick up new certificate
sudo systemctl restart stellaops-gateway
Verification: stella doctor --check check.tls.certificates.expiry
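The "Warn (30d), Fail (7d)" severity can be expressed as a small mapping. This sketch reads "within N days" as fewer than N days remaining, which is an assumption. The Pass/Warn/Fail labels and the function names are also hypothetical.

```shell
# Hypothetical severity mapping for check.tls.certificates.expiry.
WARN_DAYS=30
FAIL_DAYS=7

# Print whole days until the certificate's notAfter date (zero or negative once expired).
# Requires openssl and GNU date (date -d).
cert_days_remaining() {
  not_after=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  [ -n "$not_after" ] || return 1          # unreadable certificate: do not guess
  end_epoch=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (end_epoch - $(date -u +%s)) / 86400 ))
}

# Map days remaining to a result label.
expiry_severity() {
  if [ "$1" -lt "$FAIL_DAYS" ]; then
    echo "Fail"
  elif [ "$1" -lt "$WARN_DAYS" ]; then
    echo "Warn"
  else
    echo "Pass"
  fi
}
```

Example: expiry_severity "$(cert_days_remaining /etc/ssl/certs/stellaops.crt)". As a quick alternative, openssl x509 -checkend 604800 exits non-zero if the certificate expires within the next seven days.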
check.tls.certificates.chain
| Property | Value |
|---|---|
| CheckId | check.tls.certificates.chain |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, tls |
| What it verifies | TLS certificate chain is complete and valid |
| Evidence collected | Certificate chain, validation errors |
| Failure modes | Missing intermediate, self-signed not trusted, chain broken |
Remediation:
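# 1. Verify the certificate against the system CA bundle (add -untrusted intermediate.crt to supply intermediates)
openssl verify -CAfile /etc/ssl/certs/ca-certificates.crt \
/etc/ssl/certs/stellaops.crt
# 2. Inspect the chain the server actually presents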
openssl s_client -connect stellaops.example.com:443 \
-servername stellaops.example.com -showcerts
# 3. Download missing intermediate certificates
# From your CA's website
# 4. Concatenate certificates in correct order
cat stellaops.crt intermediate.crt > stellaops-fullchain.crt
Verification: stella doctor --check check.tls.certificates.chain
check.secrets.vault.connectivity
| Property | Value |
|---|---|
| CheckId | check.secrets.vault.connectivity |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, vault |
| What it verifies | Vault service is reachable |
| Evidence collected | Vault address, seal status, version |
| Failure modes | Vault unreachable, sealed, version mismatch |
Remediation:
# 1. Check Vault status
vault status
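# 2. If sealed, unseal Vault: repeat with distinct key shares until the unseal threshold is met (default 3)
#    Omitting the key argument makes Vault prompt for it, which keeps key shares out of shell history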
vault operator unseal {UNSEAL_KEY_1}
vault operator unseal {UNSEAL_KEY_2}
vault operator unseal {UNSEAL_KEY_3}
# 3. Check network connectivity
curl -s ${VAULT_ADDR}/v1/sys/health | jq
# 4. Verify VAULT_ADDR environment variable
echo $VAULT_ADDR
Verification: stella doctor --check check.secrets.vault.connectivity
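Vault's /v1/sys/health endpoint reports node state through its HTTP status code, so the check can separate "sealed" from "unreachable" without parsing the body. The sketch below uses Vault's default status codes, which can be overridden with query parameters such as sealedcode. The function name is hypothetical.

```shell
# Classify the HTTP status returned by Vault's /v1/sys/health (default codes).
vault_health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;              # unsealed standby node: healthy
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "not-initialized" ;;
    503) echo "sealed" ;;
    000) echo "unreachable" ;;          # curl -w '%{http_code}' prints 000 when no HTTP response arrived
    *)   echo "unknown" ;;
  esac
}
# Usage (requires curl and VAULT_ADDR):
#   code=$(curl -s -o /dev/null -w '%{http_code}' "$VAULT_ADDR/v1/sys/health")
#   vault_health_state "$code"
```

Standby nodes answer 429 even though they are healthy. A connectivity check would therefore treat 200, 429 and 473 as reachable.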
check.secrets.vault.auth
| Property | Value |
|---|---|
| CheckId | check.secrets.vault.auth |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, vault |
| What it verifies | Vault authentication is successful |
| Evidence collected | Auth method, token TTL, policies |
| Failure modes | Invalid token, expired token, wrong auth method |
Remediation:
# 1. Check current token
vault token lookup
# 2. If token expired, authenticate again
# Token auth:
vault login {TOKEN}
# AppRole auth:
vault write auth/approle/login role_id={ROLE_ID} secret_id={SECRET_ID}
# Kubernetes auth:
vault write auth/kubernetes/login role=stellaops jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token
# 3. Verify authentication worked
vault token lookup
Verification: stella doctor --check check.secrets.vault.auth
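The "token TTL" evidence can be turned into a status from the .data.ttl field of vault token lookup -format=json. Vault reports a TTL of 0 for tokens with no expiry, such as root tokens. The one-hour warning threshold and the function name below are assumptions, not documented Stella Ops defaults.

```shell
# Hypothetical TTL evaluation for check.secrets.vault.auth.
TTL_WARN_SECONDS=3600   # assumed threshold

token_ttl_status() {
  if [ "$1" -eq 0 ]; then
    echo "non-expiring"     # e.g. root tokens
  elif [ "$1" -lt "$TTL_WARN_SECONDS" ]; then
    echo "expiring-soon"
  else
    echo "ok"
  fi
}
# Usage: token_ttl_status "$(vault token lookup -format=json | jq -r '.data.ttl')"
```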
check.secrets.vault.paths
| Property | Value |
|---|---|
| CheckId | check.secrets.vault.paths |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, vault |
| What it verifies | Required secret paths are accessible |
| Evidence collected | Checked paths, accessible paths, denied paths |
| Failure modes | Permission denied, path not found |
Remediation:
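# 1. Test reading required secrets (vault kv get takes the logical path; the CLI inserts /data/ for KV v2)
vault kv get secret/stellaops/api-keys
# 2. Check policy permissions
vault token lookup -format=json | jq '.data.policies'
# 3. Review policy rules
vault policy read stellaops
# 4. Update policy if needed (KV v2 reads go through data/, listing goes through metadata/)
vault policy write stellaops - <<EOF
path "secret/data/stellaops/*" {
  capabilities = ["read"]
}
path "secret/metadata/stellaops/*" {
  capabilities = ["list"]
}
EOF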
Verification: stella doctor --check check.secrets.vault.paths
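On a KV v2 engine, the logical path used with vault kv get is not the path that Vault policies match. Reads are authorized on data/ and listing on metadata/. This hypothetical helper prints both API paths for a logical path. It assumes the engine is mounted at the first path segment, for example "secret".

```shell
# Map a logical KV v2 path to the API paths a policy must grant.
kv2_policy_paths() {
  mount=${1%%/*}   # first segment: the KV v2 mount (assumed single-segment)
  rest=${1#*/}     # remainder: the secret path inside the mount
  printf '%s/data/%s\n%s/metadata/%s\n' "$mount" "$rest" "$mount" "$rest"
}
```

Example: kv2_policy_paths secret/stellaops/api-keys prints secret/data/stellaops/api-keys and secret/metadata/stellaops/api-keys.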
check.security.evidence.integrity
| Property | Value |
|---|---|
| CheckId | check.security.evidence.integrity |
| Plugin | stellaops.doctor.security |
| Category | Security |
| Severity | Fail |
| Tags | security, evidence, integrity, dsse, rekor, offline |
| What it verifies | Evidence files have valid DSSE signatures, Rekor inclusion proofs, and consistent hashes |
| Evidence collected | Evidence locker path, total files, valid/invalid/skipped counts, specific issues |
| Failure modes | Empty DSSE payload, missing signatures, invalid base64, missing Rekor UUID, missing inclusion proof hashes, digest mismatch |
What it checks:
- DSSE Envelope Structure: Validates payloadType, payload (base64), and the signatures array
- Signature Completeness: Each signature has a keyid and a valid base64 sig
- Payload Digest Consistency: If a payloadDigest field is present, recomputes the SHA-256 digest and compares it
- Evidence Bundle Structure: Validates bundleId, manifest.version, and the optional contentDigest
- Rekor Receipt Validity: If present, validates uuid, logIndex, and inclusionProof.hashes
Remediation:
# 1. List evidence files with issues
stella doctor --check check.security.evidence.integrity --output json \
| jq '.evidence.issues[]'
# 2. Re-sign affected evidence bundles
stella evidence resign --bundle-id {BUNDLE_ID}
# 3. Verify Rekor inclusion manually (if online)
rekor-cli get --uuid {REKOR_UUID} --format json | jq
# 4. For offline environments, verify against local ledger
stella evidence verify --offline --bundle-id {BUNDLE_ID}
# 5. Re-generate evidence pack from source
stella export evidence-pack --artifact {ARTIFACT_DIGEST} --force
Configuration:
# etc/appsettings.yaml
EvidenceLocker:
  LocalPath: /var/lib/stellaops/evidence
# Alternatively, the Evidence:BasePath key can be used instead
Verification: stella doctor --check check.security.evidence.integrity
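The "Payload Digest Consistency" step can be reproduced by hand for one envelope. The sketch assumes payloadDigest is the hex SHA-256 of the decoded payload bytes, optionally prefixed with "sha256:", and it requires GNU coreutils. It only tests digest consistency. Verifying the signatures themselves would additionally need the DSSE pre-authentication encoding (PAE) and the signer's public key. The function name is hypothetical.

```shell
# Hypothetical payload digest re-computation for one DSSE envelope.
payload_digest_matches() {
  [ -n "$1" ] || return 1                  # empty payload is reported as its own failure
  expected=${2#sha256:}                    # tolerate an optional "sha256:" prefix
  actual=$(printf '%s' "$1" | base64 -d 2>/dev/null | sha256sum | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}
# Usage with jq (field locations assumed):
#   payload_digest_matches "$(jq -r .payload env.json)" "$(jq -r .payloadDigest env.json)"
```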
9.5 Integration Plugins - SCM (stellaops.doctor.integration.scm.*)
check.integration.scm.github.connectivity
| Property | Value |
|---|---|
| CheckId | check.integration.scm.github.connectivity |
| Plugin | stellaops.doctor.integration.scm.github |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, github |
| What it verifies | GitHub API is reachable |
| Evidence collected | API endpoint, response status, latency |
| Failure modes | API unreachable, DNS resolution failed, TLS error |
Remediation:
# 1. Test GitHub API connectivity
curl -s https://api.github.com/zen
# 2. Check DNS resolution
nslookup api.github.com
# 3. Test with authentication
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/user
# 4. Check proxy settings if behind firewall
echo $HTTPS_PROXY
Verification: stella doctor --check check.integration.scm.github.connectivity
check.integration.scm.github.auth
| Property | Value |
|---|---|
| CheckId | check.integration.scm.github.auth |
| Plugin | stellaops.doctor.integration.scm.github |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, github, auth |
| What it verifies | GitHub authentication is successful |
| Evidence collected | Auth type (PAT/App/OAuth), user/app info |
| Failure modes | Invalid token, expired token, wrong app credentials |
Remediation:
# For Personal Access Token:
# 1. Verify token is valid
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/user | jq '.login'
# 2. Generate new token if expired
# Visit: https://github.com/settings/tokens
# For GitHub App:
# 1. Verify the app's JWT (GET /app returns the authenticated GitHub App)
curl -s -H "Authorization: Bearer {JWT}" \
-H "Accept: application/vnd.github+json" \
https://api.github.com/app
# 2. Verify app is installed on repository
curl -s -H "Authorization: Bearer {INSTALLATION_TOKEN}" \
https://api.github.com/installation/repositories
Verification: stella doctor --check check.integration.scm.github.auth
check.integration.scm.github.permissions
| Property | Value |
|---|---|
| CheckId | check.integration.scm.github.permissions |
| Plugin | stellaops.doctor.integration.scm.github |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, github |
| What it verifies | Token/App has required scopes/permissions |
| Evidence collected | Current scopes, required scopes, missing scopes |
| Failure modes | Missing repo scope, missing write:packages |
Remediation:
# 1. Check current token scopes (classic PATs and OAuth tokens only; fine-grained tokens do not return this header)
curl -sI -H "Authorization: Bearer {TOKEN}" https://api.github.com/user | grep -i x-oauth-scopes
# Required scopes for Stella Ops:
# - repo (full repository access)
# - read:org (organization membership)
# - write:packages (container registry)
# 2. Generate new token with correct scopes
# Visit: https://github.com/settings/tokens/new
# Select: repo, read:org, write:packages
# 3. Update token in Stella Ops
stella integrations update --id {INTEGRATION_ID} --secret {NEW_TOKEN}
Verification: stella doctor --check check.integration.scm.github.permissions
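The "missing scopes" evidence amounts to a set difference between the required scopes listed above and the x-oauth-scopes header. This hypothetical helper uses exact matching only. It does not expand implied scopes, so admin:org is not treated as covering read:org.

```shell
# Hypothetical scope comparison for check.integration.scm.github.permissions.
REQUIRED_SCOPES="repo read:org write:packages"

# $1: value of the x-oauth-scopes header, e.g. "repo, read:org"
missing_scopes() {
  granted=$(printf '%s' "$1" | tr -d ' \r' | tr ',' ' ')
  for need in $REQUIRED_SCOPES; do
    found=no
    for have in $granted; do
      if [ "$have" = "$need" ]; then found=yes; fi
    done
    if [ "$found" = no ]; then printf '%s\n' "$need"; fi
  done
}
# Usage:
#   scopes=$(curl -sI -H "Authorization: Bearer {TOKEN}" https://api.github.com/user \
#     | grep -i '^x-oauth-scopes:' | cut -d: -f2-)
#   missing_scopes "$scopes"
```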
check.integration.scm.github.ratelimit
| Property | Value |
|---|---|
| CheckId | check.integration.scm.github.ratelimit |
| Plugin | stellaops.doctor.integration.scm.github |
| Category | Integration |
| Severity | Warn |
| Tags | integration, scm, github |
| What it verifies | GitHub API rate limit is not exhausted |
| Evidence collected | Limit, remaining, reset time |
| Failure modes | Rate limit exhausted, near threshold |
Remediation:
# 1. Check current rate limit status
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit | jq
# 2. If exhausted, wait for reset
# The "reset" field shows Unix timestamp when limit resets
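# 3. Consider using a GitHub App instead of a PAT for higher limits
# PAT: 5,000 requests/hour
# GitHub App installation: at least 5,000 requests/hour, scaling with repositories and users up to 12,500
#   (15,000 requests/hour for installations in GitHub Enterprise Cloud organizations)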
# 4. Implement request caching in your application
Verification: stella doctor --check check.integration.scm.github.ratelimit
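The limit, remaining and reset evidence can be combined into a single status line. The 10% warning threshold and the function name below are assumptions. The optional fourth argument fixes "now" so the logic can be tested.

```shell
# Hypothetical evaluation for check.integration.scm.github.ratelimit.
WARN_PERCENT=10   # assumed: warn when fewer than 10% of requests remain

# Args: remaining limit reset_epoch [now_epoch]
ratelimit_status() {
  remaining=$1 limit=$2 reset=$3 now=${4:-$(date +%s)}
  secs=$(( reset > now ? reset - now : 0 ))
  if [ "$remaining" -eq 0 ]; then
    echo "exhausted (resets in ${secs}s)"
  elif [ $(( remaining * 100 )) -lt $(( limit * WARN_PERCENT )) ]; then
    echo "near-threshold (${remaining}/${limit}, resets in ${secs}s)"
  else
    echo "ok (${remaining}/${limit})"
  fi
}
# Usage:
#   set -- $(curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit \
#     | jq -r '.resources.core | "\(.remaining) \(.limit) \(.reset)"')
#   ratelimit_status "$1" "$2" "$3"
```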
check.integration.scm.gitlab.connectivity
| Property | Value |
|---|---|
| CheckId | check.integration.scm.gitlab.connectivity |
| Plugin | stellaops.doctor.integration.scm.gitlab |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, gitlab |
| What it verifies | GitLab API is reachable |
| Evidence collected | API endpoint, response status, version |
| Failure modes | API unreachable, self-hosted instance down |
Remediation:
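# 1. Test GitLab API connectivity (the version endpoint requires a token; a 401 still proves the API is reachable)
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/version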
# 2. For self-hosted GitLab, check service status
sudo gitlab-ctl status
# 3. Check firewall/proxy
curl -v https://{GITLAB_HOST}/api/v4/version
# 4. Verify URL configuration
stella integrations show --id {INTEGRATION_ID}
Verification: stella doctor --check check.integration.scm.gitlab.connectivity
check.integration.scm.gitlab.auth
| Property | Value |
|---|---|
| CheckId | check.integration.scm.gitlab.auth |
| Plugin | stellaops.doctor.integration.scm.gitlab |
| Category | Integration |
| Severity | Fail |
| Tags | integration, scm, gitlab, auth |
| What it verifies | GitLab authentication is successful |
| Evidence collected | Auth type, user info, token expiry |
| Failure modes | Invalid token, expired token, revoked access |
Remediation:
# 1. Test token authentication
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/user | jq '.username'
# 2. Check token expiry
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/personal_access_tokens/self | jq '.expires_at'
# 3. Generate new token if expired
# Visit: https://{GITLAB_HOST}/-/profile/personal_access_tokens
# 4. Update token in Stella Ops
stella integrations update --id {INTEGRATION_ID} --secret {NEW_TOKEN}
Verification: stella doctor --check check.integration.scm.gitlab.auth
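The "token expiry" evidence comes from the expires_at field of /personal_access_tokens/self, which is a YYYY-MM-DD date or null. This sketch converts it into days remaining. It requires GNU date, and the function name is hypothetical. The optional second argument fixes "today" for testing.

```shell
# Hypothetical expiry evaluation for check.integration.scm.gitlab.auth.
days_until_expiry() {
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "never"
    return 0
  fi
  today=${2:-$(date -u +%F)}
  echo $(( ($(date -u -d "$1" +%s) - $(date -u -d "$today" +%s)) / 86400 ))
}
# Usage:
#   days_until_expiry "$(curl -s -H "PRIVATE-TOKEN: {TOKEN}" \
#     https://{GITLAB_HOST}/api/v4/personal_access_tokens/self | jq -r '.expires_at')"
```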
9.6 Integration Plugins - Registry (stellaops.doctor.integration.registry.*)
check.integration.registry.harbor.connectivity
| Property | Value |
|---|---|
| CheckId | check.integration.registry.harbor.connectivity |
| Plugin | stellaops.doctor.integration.registry.harbor |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, harbor |
| What it verifies | Harbor registry is reachable |
| Evidence collected | Registry URL, health status, version |
| Failure modes | Registry unreachable, components unhealthy |
Remediation:
# 1. Check Harbor health endpoint
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq
# 2. Check individual components
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq '.components'
# 3. For self-hosted Harbor, check services
docker compose -f /opt/harbor/docker-compose.yml ps
# 4. Check Harbor logs
docker compose -f /opt/harbor/docker-compose.yml logs --tail=50 core
Verification: stella doctor --check check.integration.registry.harbor.connectivity
check.integration.registry.harbor.auth
| Property | Value |
|---|---|
| CheckId | check.integration.registry.harbor.auth |
| Plugin | stellaops.doctor.integration.registry.harbor |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, harbor, auth |
| What it verifies | Harbor authentication is successful |
| Evidence collected | Auth type, user info, project access |
| Failure modes | Invalid credentials, LDAP sync issue |
Remediation:
# 1. Test Docker login (--password-stdin keeps the password out of the process list)
printf '%s' '{PASSWORD}' | docker login {HARBOR_HOST} -u {USERNAME} --password-stdin
# 2. Test API authentication
curl -s -u {USERNAME}:{PASSWORD} https://{HARBOR_HOST}/api/v2.0/users/current | jq
# 3. Check if user exists
curl -s -u admin:{ADMIN_PASSWORD} "https://{HARBOR_HOST}/api/v2.0/users/search?username={USERNAME}" | jq
# 4. Reset password if needed
# Via Harbor UI: https://{HARBOR_HOST}/harbor/users
Verification: stella doctor --check check.integration.registry.harbor.auth
check.integration.registry.harbor.pull
| Property | Value |
|---|---|
| CheckId | check.integration.registry.harbor.pull |
| Plugin | stellaops.doctor.integration.registry.harbor |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, harbor |
| What it verifies | Can pull images from configured repositories |
| Evidence collected | Test image, pull result, error message |
| Failure modes | Permission denied, repository not found |
Remediation:
# 1. Test image pull
docker pull {HARBOR_HOST}/{PROJECT}/{IMAGE}:{TAG}
# 2. Check project membership
curl -s -u {USERNAME}:{PASSWORD} \
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/members | jq
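# 3. Add user to project if needed (role_id: 1 = Project Admin, 2 = Developer, 3 = Guest, 4 = Maintainer; Guest is sufficient for pull)
curl -X POST -u admin:{ADMIN_PASSWORD} \
-H "Content-Type: application/json" \
-d '{"role_id": 3, "member_user": {"username": "{USERNAME}"}}' \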
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/members
# 4. Verify repository exists
curl -s -u {USERNAME}:{PASSWORD} \
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/repositories | jq
Verification: stella doctor --check check.integration.registry.harbor.pull
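When a pull fails, the project whose membership needs checking comes from the image reference. The hypothetical helpers below split {HARBOR_HOST}/{PROJECT}/{REPOSITORY}[:{TAG}|@{DIGEST}]. Harbor's v2.0 API expects a repository name containing "/" to be URL-encoded twice (a/b becomes a%252Fb), so a helper for that is included.

```shell
# Hypothetical image-reference helpers for check.integration.registry.harbor.pull.
harbor_project() {
  path=${1#*/}          # drop the registry host (and port)
  echo "${path%%/*}"    # first path segment is the Harbor project
}

harbor_repository() {
  path=${1#*/}
  repo=${path#*/}       # everything after the project
  repo=${repo%@*}       # drop a digest such as @sha256:...
  case "$repo" in
    *:*) repo=${repo%:*} ;;   # drop the tag
  esac
  echo "$repo"
}

# Double URL-encode "/" for Harbor API path parameters.
harbor_repo_param() {
  printf '%s' "$1" | sed 's#/#%252F#g'
}
```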
check.integration.registry.ecr.connectivity
| Property | Value |
|---|---|
| CheckId | check.integration.registry.ecr.connectivity |
| Plugin | stellaops.doctor.integration.registry.ecr |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, ecr, aws |
| What it verifies | AWS ECR is reachable |
| Evidence collected | Registry URL, AWS region, endpoint status |
| Failure modes | AWS credentials invalid, region mismatch |
Remediation:
# 1. Verify AWS credentials
aws sts get-caller-identity
# 2. Test ECR describe repositories
aws ecr describe-repositories --region {REGION}
# 3. Get ECR login token
aws ecr get-login-password --region {REGION} | docker login --username AWS --password-stdin {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com
# 4. Check which credentials and region the AWS CLI resolves (without printing secret keys)
aws configure list
Verification: stella doctor --check check.integration.registry.ecr.connectivity
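For the "region mismatch" failure mode, the region in the configured registry host can be compared with the region the AWS CLI resolves. The sketch below extracts the region from a host of the form {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com. The function name is hypothetical.

```shell
# Hypothetical helper: print the region embedded in an ECR registry host (empty if not ECR).
ecr_region_from_registry() {
  printf '%s\n' "$1" | sed -n 's#^[0-9]\{12\}\.dkr\.ecr\.\([a-z0-9-]*\)\.amazonaws\.com.*#\1#p'
}
# Usage:
#   reg_region=$(ecr_region_from_registry "{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com")
#   [ "$reg_region" = "$(aws configure get region)" ] || echo "region mismatch"
```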
check.integration.registry.ecr.pull
| Property | Value |
|---|---|
| CheckId | check.integration.registry.ecr.pull |
| Plugin | stellaops.doctor.integration.registry.ecr |
| Category | Integration |
| Severity | Fail |
| Tags | integration, registry, ecr, aws |
| What it verifies | Can pull images from ECR repositories |
| Evidence collected | Repository, IAM permissions, error |
| Failure modes | ecr:GetAuthorizationToken denied, ecr:BatchGetImage denied |
Remediation:
# 1. Check IAM permissions
aws iam simulate-principal-policy \
--policy-source-arn {ROLE_ARN} \
--action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer
# 2. Add required IAM policy
aws iam put-role-policy --role-name {ROLE_NAME} --policy-name ECRPullAccess --policy-document '{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage"
],
"Resource": "*"
}]
}'
# 3. Test pull
docker pull {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{REPO}:{TAG}
Verification: stella doctor --check check.integration.registry.ecr.pull
9.7 Observability Plugin (stellaops.doctor.observability)
check.telemetry.otlp.endpoint
| Property | Value |
|---|---|
| CheckId | check.telemetry.otlp.endpoint |
| Plugin | stellaops.doctor.observability |
| Category | Observability |
| Severity | Warn |
| Tags | observability, telemetry |
| What it verifies | OTLP collector endpoint is reachable |
| Evidence collected | Endpoint URL, response status, protocol |
| Failure modes | Collector unreachable, wrong protocol (gRPC vs HTTP) |
Remediation:
# 1. Check OTLP endpoint configuration
echo $OTEL_EXPORTER_OTLP_ENDPOINT
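# 2. Test HTTP endpoint (OTLP/HTTP defaults to port 4318; a 405 Method Not Allowed reply to GET still shows the receiver is listening)
curl -v ${OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces
# 3. Test gRPC endpoint (OTLP/gRPC defaults to port 4317). The Collector does not enable gRPC
#    server reflection by default, so "grpcurl ... list" usually fails even on a healthy receiver;
#    check that the port accepts TCP connections instead
nc -zv {COLLECTOR_HOST} 4317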
# 4. Check collector is running
# If using OpenTelemetry Collector:
docker logs otel-collector
# 5. Verify collector configuration
cat /etc/otel-collector/config.yaml
Verification: stella doctor --check check.telemetry.otlp.endpoint
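The "wrong protocol (gRPC vs HTTP)" failure mode usually shows up as an endpoint on the default gRPC port (4317) paired with an HTTP protocol setting, or the reverse. This hypothetical helper infers the likely protocol from the endpoint's port. The result can be compared with OTEL_EXPORTER_OTLP_PROTOCOL, whose standard values are grpc, http/protobuf and http/json.

```shell
# Hypothetical protocol hint from an OTLP endpoint URL, based on the default ports.
otlp_protocol_hint() {
  hostport=${1#*://}          # drop the scheme
  hostport=${hostport%%/*}    # drop any path
  port=${hostport##*:}
  if [ "$port" = "$hostport" ]; then port=""; fi   # no explicit port
  case "$port" in
    4317) echo "grpc" ;;
    4318) echo "http/protobuf" ;;
    *)    echo "unknown" ;;
  esac
}
# Usage: otlp_protocol_hint "$OTEL_EXPORTER_OTLP_ENDPOINT"
```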
check.logs.directory.writable
| Property | Value |
|---|---|
| CheckId | check.logs.directory.writable |
| Plugin | stellaops.doctor.observability |
| Category | Observability |
| Severity | Fail |
| Tags | quick, observability, logs |
| What it verifies | Log directory is writable |
| Evidence collected | Log path, permissions, owner |
| Failure modes | Directory not writable, disk full |
Remediation:
# 1. Check log directory permissions
ls -la /var/log/stellaops
# 2. Fix ownership
sudo chown -R stellaops:stellaops /var/log/stellaops
# 3. Fix permissions
sudo chmod 755 /var/log/stellaops
# 4. Check disk space
df -h /var/log/stellaops
Verification: stella doctor --check check.logs.directory.writable
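A permission test such as test -w does not exercise an actual write. The sketch below creates, writes and removes a probe file instead, which can also surface a full disk. The function and probe file names are hypothetical. Run it as the service account, for example inside sudo -u stellaops sh -c '...'.

```shell
# Hypothetical write probe for check.logs.directory.writable.
log_dir_writable() {
  probe=$(mktemp "$1/.doctor-probe.XXXXXX" 2>/dev/null) || return 1
  if printf 'stellaops doctor write probe\n' 2>/dev/null > "$probe"; then
    rm -f "$probe"
    return 0
  fi
  rm -f "$probe"
  return 1
}
# Usage: log_dir_writable /var/log/stellaops && echo writable
```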
check.logs.rotation.configured
| Property | Value |
|---|---|
| CheckId | check.logs.rotation.configured |
| Plugin | stellaops.doctor.observability |
| Category | Observability |
| Severity | Warn |
| Tags | observability, logs |
| What it verifies | Log rotation is configured |
| Evidence collected | Rotation config path, settings |
| Failure modes | No rotation configured, invalid config |
Remediation:
# 1. Check if logrotate config exists
ls -la /etc/logrotate.d/stellaops
# 2. Create logrotate configuration (use sudo tee: with "sudo cat >", the redirect runs in the unprivileged shell)
sudo tee /etc/logrotate.d/stellaops > /dev/null << 'EOF'
/var/log/stellaops/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 640 stellaops stellaops
    postrotate
        systemctl reload stellaops-gateway > /dev/null 2>&1 || true
    endscript
}
EOF
# 3. Test logrotate configuration
sudo logrotate -d /etc/logrotate.d/stellaops
Verification: stella doctor --check check.logs.rotation.configured
check.metrics.prometheus.scrape
| Property | Value |
|---|---|
| CheckId | check.metrics.prometheus.scrape |
| Plugin | stellaops.doctor.observability |
| Category | Observability |
| Severity | Warn |
| Tags | observability, metrics |
| What it verifies | Prometheus metrics endpoint is accessible |
| Evidence collected | Metrics endpoint, sample metrics count |
| Failure modes | Endpoint not exposed, auth required |
Remediation:
# 1. Check metrics endpoint
curl -s http://localhost:{PORT}/metrics | head -20
# 2. Verify metrics are being scraped
curl -s http://{PROMETHEUS_HOST}:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job == "stellaops")'
# 3. Add Prometheus scrape config
# In prometheus.yml:
scrape_configs:
  - job_name: 'stellaops'
    static_configs:
      - targets: ['stellaops-gateway:8080', 'stellaops-concelier:8081']
# 4. Reload Prometheus (the /-/reload endpoint requires Prometheus to run with --web.enable-lifecycle)
curl -X POST http://{PROMETHEUS_HOST}:9090/-/reload
Verification: stella doctor --check check.metrics.prometheus.scrape
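The "sample metrics count" evidence can be computed directly from the text exposition format. In that format, comment lines start with "#" and each metric family is declared with a "# TYPE" line. The function names below are hypothetical.

```shell
# Count sample lines (non-empty, non-comment) in Prometheus text exposition on stdin.
metric_sample_count() {
  awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }'
}

# Count metric families declared with "# TYPE".
metric_family_count() {
  awk '$1 == "#" && $2 == "TYPE" { n++ } END { print n + 0 }'
}
# Usage: curl -s http://localhost:{PORT}/metrics | metric_sample_count
```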
9.8 Release Orchestrator Plugin (stellaops.doctor.releaseorch)
check.releaseorch.environments.configured
| Property | Value |
|---|---|
| CheckId | check.releaseorch.environments.configured |
| Plugin | stellaops.doctor.releaseorch |
| Category | Integration |
| Severity | Fail |
| Tags | release, environments |
| What it verifies | At least one environment is configured |
| Evidence collected | Environment count, environment names |
| Failure modes | No environments configured |
Remediation:
# 1. List current environments
stella environments list
# 2. Create development environment
stella environments create \
--name development \
--type development \
--promotion-target staging
# 3. Create staging environment
stella environments create \
--name staging \
--type staging \
--promotion-target production \
--requires-approval
# 4. Create production environment
stella environments create \
--name production \
--type production \
--requires-approval
Verification: stella doctor --check check.releaseorch.environments.configured
check.releaseorch.deployments.targets
| Property | Value |
|---|---|
| CheckId | check.releaseorch.deployments.targets |
| Plugin | stellaops.doctor.releaseorch |
| Category | Integration |
| Severity | Fail |
| Tags | release, deployments |
| What it verifies | Deployment targets are reachable |
| Evidence collected | Target type, connectivity status, last heartbeat |
| Failure modes | Agent offline, target unreachable |
Remediation:
# 1. List deployment targets
stella deployments targets list
# 2. Check agent status
stella deployments targets health --target {TARGET_ID}
# 3. Restart agent if needed
# On target host:
sudo systemctl restart stellaops-agent
# 4. Re-register target if agent was reinstalled
stella deployments targets register \
--name {TARGET_NAME} \
--type docker-compose \
--endpoint ssh://user@host
Verification: stella doctor --check check.releaseorch.deployments.targets
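The "last heartbeat" evidence implies a staleness rule for deciding when an agent is offline. The sketch assumes heartbeats are available as Unix epoch seconds. The 120-second threshold and the function name are hypothetical, not documented defaults.

```shell
# Hypothetical staleness rule for check.releaseorch.deployments.targets.
HEARTBEAT_STALE_SECONDS=120   # assumed threshold

# Args: last_heartbeat_epoch [now_epoch]
heartbeat_status() {
  now=${2:-$(date +%s)}
  age=$(( now - $1 ))
  if [ "$age" -gt "$HEARTBEAT_STALE_SECONDS" ]; then
    echo "offline (last heartbeat ${age}s ago)"
  else
    echo "online (last heartbeat ${age}s ago)"
  fi
}
```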
10. Plugin Implementation Details
10.1 Core Platform Plugin
Location: src/__Libraries/StellaOps.Doctor/Plugins/Core/
Provides foundational checks for configuration, runtime, and platform health.
Checks Provided:
- check.config.required
- check.config.syntax
- check.config.deprecated
- check.runtime.dotnet
- check.runtime.memory
- check.runtime.disk.space
- check.runtime.disk.permissions
- check.time.sync
- check.crypto.profiles
Dependencies: None (core plugin)
10.2 Database & Migrations Plugin
Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Database/
Provides database connectivity and migration state checks.
References:
- src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationRunner.cs
- src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationStatusService.cs
Checks Provided:
- check.database.connectivity
- check.database.version
- check.database.migrations.pending
- check.database.migrations.checksum
- check.database.migrations.lock
- check.database.schema.{schema} (dynamic per schema)
- check.database.connections.pool
Configuration:
Doctor:
  Plugins:
    Database:
      Enabled: true
      ConnectionTimeout: 10s
      Schemas:
        - auth
        - vuln
        - scanner
        - orchestrator
10.3 Service Graph Plugin
Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ServiceGraph/
Validates inter-service connectivity via Gateway and Router.
References:
- src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs
- src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs
Checks Provided:
- check.services.gateway.running
- check.services.gateway.routing
- check.services.{service}.health (dynamic per service)
- check.services.{service}.connectivity (dynamic per service)
- check.services.authority.connectivity
Configuration:
Doctor:
  Plugins:
    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s
      Services:
        - name: concelier
          port: 8081
        - name: scanner
          port: 8082
        - name: attestor
          port: 8083
10.4 Security Plugin
Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Security/
Validates authentication, authorization, TLS, and secrets management.
References:
- src/Authority/StellaOps.Authority/StellaOps.Authority.Plugin.Ldap/
- src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/HashiCorpVaultConnector.cs
Checks Provided:
- check.auth.oidc.discovery
- check.auth.oidc.jwks
- check.auth.ldap.bind
- check.auth.ldap.search
- check.auth.ldap.groups
- check.tls.certificates.expiry
- check.tls.certificates.chain
- check.secrets.vault.connectivity
- check.secrets.vault.auth
- check.secrets.vault.paths
- check.security.evidence.integrity
10.5 SCM Integration Plugins
GitHub Plugin Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitHub/
GitLab Plugin Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitLab/
References:
- src/Integrations/__Plugins/StellaOps.Integrations.Plugin.GitHubApp/
- etc/scm-connectors/github.yaml
GitHub Checks:
- check.integration.scm.github.connectivity
- check.integration.scm.github.auth
- check.integration.scm.github.permissions
- check.integration.scm.github.ratelimit
GitLab Checks:
- check.integration.scm.gitlab.connectivity
- check.integration.scm.gitlab.auth
- check.integration.scm.gitlab.permissions
10.6 Registry Integration Plugins
Harbor Plugin Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.Harbor/
ECR Plugin Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.ECR/
References:
src/Integrations/__Plugins/StellaOps.Integrations.Plugin.Harbor/
Harbor Checks:
- check.integration.registry.harbor.connectivity
- check.integration.registry.harbor.auth
- check.integration.registry.harbor.pull
ECR Checks:
- check.integration.registry.ecr.connectivity
- check.integration.registry.ecr.pull
10.7 Observability Plugin
Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Observability/
References:
devops/telemetry/otel-collector.yaml
Checks Provided:
- check.telemetry.otlp.endpoint
- check.logs.directory.writable
- check.logs.rotation.configured
- check.metrics.prometheus.scrape
10.8 Release Orchestrator Plugin
Location: src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ReleaseOrch/
References:
src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/
Checks Provided:
- check.releaseorch.environments.configured
- check.releaseorch.deployments.targets
Appendix A: Complete Check ID Reference
| CheckId | Plugin | Category | Default Severity |
|---|---|---|---|
| check.config.required | core | Core | Fail |
| check.config.syntax | core | Core | Fail |
| check.config.deprecated | core | Core | Warn |
| check.runtime.dotnet | core | Core | Fail |
| check.runtime.memory | core | Core | Warn |
| check.runtime.disk.space | core | Core | Warn |
| check.runtime.disk.permissions | core | Core | Fail |
| check.time.sync | core | Core | Warn |
| check.crypto.profiles | core | Core | Fail |
| check.database.connectivity | database | Database | Fail |
| check.database.version | database | Database | Warn |
| check.database.migrations.pending | database | Database | Fail |
| check.database.migrations.checksum | database | Database | Fail |
| check.database.migrations.lock | database | Database | Warn |
| check.database.schema.{schema} | database | Database | Fail |
| check.database.connections.pool | database | Database | Warn |
| check.services.gateway.running | servicegraph | ServiceGraph | Fail |
| check.services.gateway.routing | servicegraph | ServiceGraph | Fail |
| check.services.{service}.health | servicegraph | ServiceGraph | Fail |
| check.services.{service}.connectivity | servicegraph | ServiceGraph | Fail |
| check.services.authority.connectivity | servicegraph | ServiceGraph | Fail |
| check.auth.oidc.discovery | security | Security | Fail |
| check.auth.oidc.jwks | security | Security | Fail |
| check.auth.ldap.bind | security | Security | Fail |
| check.auth.ldap.search | security | Security | Fail |
| check.auth.ldap.groups | security | Security | Warn |
| check.tls.certificates.expiry | security | Security | Warn/Fail |
| check.tls.certificates.chain | security | Security | Fail |
| check.secrets.vault.connectivity | security | Security | Fail |
| check.secrets.vault.auth | security | Security | Fail |
| check.secrets.vault.paths | security | Security | Fail |
| check.security.evidence.integrity | security | Security | Fail |
| check.integration.scm.github.connectivity | scm.github | Integration | Fail |
| check.integration.scm.github.auth | scm.github | Integration | Fail |
| check.integration.scm.github.permissions | scm.github | Integration | Fail |
| check.integration.scm.github.ratelimit | scm.github | Integration | Warn |
| check.integration.scm.gitlab.connectivity | scm.gitlab | Integration | Fail |
| check.integration.scm.gitlab.auth | scm.gitlab | Integration | Fail |
| check.integration.registry.harbor.connectivity | registry.harbor | Integration | Fail |
| check.integration.registry.harbor.auth | registry.harbor | Integration | Fail |
| check.integration.registry.harbor.pull | registry.harbor | Integration | Fail |
| check.integration.registry.ecr.connectivity | registry.ecr | Integration | Fail |
| check.integration.registry.ecr.pull | registry.ecr | Integration | Fail |
| check.telemetry.otlp.endpoint | observability | Observability | Warn |
| check.logs.directory.writable | observability | Observability | Fail |
| check.logs.rotation.configured | observability | Observability | Warn |
| check.metrics.prometheus.scrape | observability | Observability | Warn |
| check.releaseorch.environments.configured | releaseorch | Integration | Fail |
| check.releaseorch.deployments.targets | releaseorch | Integration | Fail |
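Check IDs are namespaced by prefix, so a tool can route a check to its plugin without a lookup table per ID. The hypothetical helper below follows the prefixes in the table above and in the check catalog. It is a sketch only. How Doctor actually resolves plugins is not specified here.

```shell
# Hypothetical check-ID to plugin mapping based on ID prefixes.
plugin_for_check() {
  case "$1" in
    check.config.*|check.runtime.*|check.time.*|check.crypto.*) echo "core" ;;
    check.database.*)                     echo "database" ;;
    check.services.*)                     echo "servicegraph" ;;
    check.auth.*|check.tls.*|check.secrets.*|check.security.*) echo "security" ;;
    check.integration.scm.github.*)       echo "scm.github" ;;
    check.integration.scm.gitlab.*)       echo "scm.gitlab" ;;
    check.integration.registry.harbor.*)  echo "registry.harbor" ;;
    check.integration.registry.ecr.*)     echo "registry.ecr" ;;
    check.telemetry.*|check.logs.*|check.metrics.*) echo "observability" ;;
    check.releaseorch.*)                  echo "releaseorch" ;;
    *)                                    echo "unknown" ;;
  esac
}
```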
Appendix B: Quick Reference - Common Issues
Database Issues
# Connection refused
sudo systemctl start postgresql
stella doctor --check check.database.connectivity
# Pending migrations
stella system migrations-run --category release
stella doctor --check check.database.migrations.pending
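# Migration lock stuck
# pg_advisory_unlock_all() only releases advisory locks held by the session that calls it, so a new
# psql session cannot clear a lock held by another backend. Identify the holder first:
psql -d stellaops -c "SELECT l.pid, a.application_name, a.state, a.query_start FROM pg_locks l JOIN pg_stat_activity a USING (pid) WHERE l.locktype = 'advisory';"
# Then, only after confirming the holder is stale, terminate that backend:
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"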
Authentication Issues
# OIDC discovery fails
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration
sudo systemctl restart stellaops-authority
# LDAP bind fails
ldapsearch -x -H ldaps://{HOST}:636 -D "{BIND_DN}" -w "{PASSWORD}" -b "" -s base
Integration Issues
# GitHub rate limit
curl -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit
# Harbor connectivity
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq
Document generated: 2026-01-12 Stella Ops Doctor Capability Specification v1.0.0-draft