# Stella Ops Doctor Capability Specification

> **Status:** Planning / Capability Design
> **Version:** 1.0.0-draft
> **Last Updated:** 2026-01-12

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [Current State Analysis](#2-current-state-analysis)
3. [Doctor Architecture](#3-doctor-architecture)
4. [Plugin System Specification](#4-plugin-system-specification)
5. [CLI Surface](#5-cli-surface)
6. [UI Surface](#6-ui-surface)
7. [API Surface](#7-api-surface)
8. [Remediation Command Patterns](#8-remediation-command-patterns)
9. [Doctor Check Catalog](#9-doctor-check-catalog)
10. [Plugin Implementation Details](#10-plugin-implementation-details)

---

## 1. Executive Summary

### 1.1 Purpose

The Doctor capability provides comprehensive self-service diagnostics for Stella Ops deployments. It enables operators, DevOps engineers, and developers to:

- **Diagnose** what is working and what is not
- **Understand** why failures occur with collected evidence
- **Remediate** issues with copy/paste commands
- **Verify** fixes with re-runnable checks

### 1.2 Target Users

| User Type | Primary Use Case |
|-----------|------------------|
| **Operators** | Pre-deployment validation, incident triage, routine health checks |
| **DevOps Engineers** | Integration setup, migration management, environment troubleshooting |
| **Developers** | Local development environment validation, API connectivity testing |
| **Support Engineers** | Remote diagnostics, evidence collection for escalation |

### 1.3 Key Principles

1. **Plugin-First Architecture** - All checks implemented via extensible plugins
2. **Actionable Remediation** - Every failure includes copy/paste fix commands
3. **Zero Docs Familiarity** - Users can diagnose and fix without reading documentation
4. **Evidence-Based Diagnostics** - All checks collect and report evidence
5. **Multi-Surface Consistency** - Same check engine powers CLI, UI, and API
6. **Non-Destructive Fixes** - Doctor never executes destructive actions; fix commands must be safe and idempotent

### 1.4 Surfaces

| Surface | Entry Point | Primary Use |
|---------|-------------|-------------|
| **CLI** | `stella doctor` | Automation, CI/CD gates, SSH troubleshooting |
| **UI** | `/ops/doctor` | Interactive diagnosis, team collaboration |
| **API** | `POST /api/v1/doctor/run` | Programmatic integration, monitoring systems |

---

## 2. Current State Analysis

### 2.1 CLI - Current State

**Location:** `src/Cli/StellaOps.Cli/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Entry Point | `src/Cli/StellaOps.Cli/Program.cs` | Main CLI bootstrap using System.CommandLine |
| Command Factory | `src/Cli/StellaOps.Cli/Commands/CommandFactory.cs` | Registers 88+ command groups |
| Config Bootstrap | `src/Cli/StellaOps.Cli/Configuration/CliBootstrapper.cs` | Environment + YAML/JSON config loading |
| Exit Codes | `src/Cli/StellaOps.Cli/CliExitCodes.cs` | Standardized exit codes (0-99) |
| Crypto Validator | `src/Cli/StellaOps.Cli/Services/CryptoProfileValidator.cs` | Startup validation for crypto profiles |
| Migration Commands | `src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs` | `migrations-run`, `migrations-status`, `migrations-verify` |

#### Existing Validation Patterns

```csharp
// CryptoProfileValidator.cs - Startup validation pattern
public sealed record ValidationResult
{
    public bool IsValid { get; init; }
    public bool HasWarnings { get; init; }
    public bool HasErrors { get; init; }
    public List<string> Errors { get; init; }
    public List<string> Warnings { get; init; }
    public string ActiveProfile { get; init; }
    public List<string> AvailableProviders { get; init; }
}
```

#### Gaps

- No unified `stella doctor` command
- Output formatting is ad-hoc per command (no centralized formatter)
- No remediation command generation
- Validation only for crypto profiles, not comprehensive system state

#### Proposed Capability

```bash
# Quick system health check
stella doctor

# Full diagnostic with all checks
stella doctor --full

# Check specific category
stella doctor --category database
stella doctor --category integrations

# Check specific plugin
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# Output formats
stella doctor --format json
stella doctor --format markdown
stella doctor --format text

# Export report
stella doctor --export report.json
stella doctor --export report.md

# Filter by severity
stella doctor --severity fail,warn
```

---

### 2.2 Health Infrastructure - Current State

**Pattern:** Extensive health endpoints across 20+ services

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Health Status Enum | `src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthStatus.cs` | Unknown, Healthy, Degraded, Unhealthy |
| Health Check Result | `src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs` | Rich result with factory methods |
| Gateway Health | `src/Gateway/StellaOps.Gateway.WebService/Middleware/HealthCheckMiddleware.cs` | `/health/live`, `/health/ready`, `/health/startup` |
| Scanner Health | `src/Scanner/StellaOps.Scanner.WebService/Endpoints/HealthEndpoints.cs` | `/healthz`, `/readyz` |
| Orchestrator Health | `src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Endpoints/HealthEndpoints.cs` | `/health/details` |
| Platform Health | `src/Platform/__Libraries/StellaOps.Platform.Health/PlatformHealthService.cs` | Cross-service aggregation |
| Health Contract | `devops/docker/health-endpoints.md` | Formal endpoint specification |

#### Health Check Result Model

```csharp
// From src/Plugin/StellaOps.Plugin.Abstractions/Health/HealthCheckResult.cs
public sealed record HealthCheckResult(
    HealthStatus Status,
    string? Message,
    IReadOnlyDictionary<string, string>? Details,
    DateTimeOffset CheckedAt,
    TimeSpan Duration)
{
    public static HealthCheckResult Healthy(string? message = null) => ...
    public static HealthCheckResult Degraded(string message) => ...
    public static HealthCheckResult Unhealthy(string message, Exception? ex = null) => ...
}
```

#### Gaps

- Health endpoints check liveness/readiness, not comprehensive diagnostics
- No remediation guidance in health responses
- No aggregated cross-service diagnostic view
- Health checks don't verify configuration validity

---

### 2.3 Doctor Service - Current State (ReleaseOrchestrator)

**Location:** `src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Doctor Service | `Doctor/DoctorService.cs` | Runs `IDoctorCheck` implementations |
| Doctor Report | `Doctor/DoctorReport.cs` | Aggregated results with counts |
| Check Result | `Doctor/CheckResult.cs` | Individual check outcome |
| IDoctorCheck | `Doctor/IDoctorCheck.cs` | Plugin interface for checks |

#### IDoctorCheck Interface

```csharp
// Existing interface (simplified)
public interface IDoctorCheck
{
    string Name { get; }
    string Category { get; }
    Task<CheckResult> RunAsync(CancellationToken ct);
}

public sealed record CheckResult(
    string Name,
    HealthStatus Status,
    string? Message,
    TimeSpan Duration);

public sealed record DoctorReport(
    int PassCount,
    int WarningCount,
    int FailCount,
    int SkippedCount,
    HealthStatus OverallStatus,
    TimeSpan TotalDuration,
    IReadOnlyList<CheckResult> Results);
```

#### Gaps

- Only available in ReleaseOrchestrator, not CLI or other modules
- No remediation commands in output
- No evidence collection
- Limited to integration checks only
- No plugin discovery mechanism

---

### 2.4 Integration Plugins - Current State

**Location:** `src/Integrations/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Plugin Contract | `__Libraries/StellaOps.Integrations.Contracts/IIntegrationConnectorPlugin.cs` | Core plugin interface |
| Integration Types | `__Libraries/StellaOps.Integrations.Contracts/IntegrationType.cs` | Registry, SCM, CI/CD, etc. |
| GitHub Plugin | `__Plugins/StellaOps.Integrations.Plugin.GitHubApp/GitHubAppConnectorPlugin.cs` | GitHub App integration |
| Harbor Plugin | `__Plugins/StellaOps.Integrations.Plugin.Harbor/HarborConnectorPlugin.cs` | Harbor registry |
| Plugin Loader | `StellaOps.Integrations.WebService/IntegrationPluginLoader.cs` | Assembly-based discovery |
| Vault Connectors | `src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/` | HashiCorp Vault, Azure Key Vault |

#### IIntegrationConnectorPlugin Interface

```csharp
public interface IIntegrationConnectorPlugin : IAvailabilityPlugin
{
    IntegrationType Type { get; }
    IntegrationProvider Provider { get; }
    string Name { get; }

    Task<TestConnectionResult> TestConnectionAsync(
        IntegrationConfig config,
        CancellationToken ct);

    Task<HealthCheckResult> CheckHealthAsync(
        IntegrationConfig config,
        CancellationToken ct);
}
```

#### Supported Integration Types

```csharp
public enum IntegrationType
{
    Registry = 1,     // Harbor, ECR, GCR, ACR, Docker Hub, Quay, Artifactory
    Scm = 2,          // GitHub, GitLab, Bitbucket, Gitea, Azure DevOps
    CiCd = 3,         // GitHub Actions, GitLab CI, Jenkins, CircleCI
    RepoSource = 4,   // npm, PyPI, Maven, NuGet, Crates.io
    RuntimeHost = 5,  // eBPF, ETW, dyld agents
    FeedMirror = 6    // NVD, OSV, StellaOps mirrors
}
```

#### Gaps

- `TestConnectionAsync` exists but not surfaced via CLI doctor
- No standardized remediation output
- Health checks don't report required permissions/scopes
- No validation of webhook/event delivery configuration

---

### 2.5 Authority Plugins - Current State

**Location:** `src/Authority/StellaOps.Authority/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Plugin Abstractions | `StellaOps.Authority.Plugins.Abstractions/` | Plugin registration interface |
| LDAP Plugin | `StellaOps.Authority.Plugin.Ldap/` | LDAP/AD integration |
| OIDC Plugin | `StellaOps.Authority.Plugin.Oidc/` | OpenID Connect |
| SAML Plugin | `StellaOps.Authority.Plugin.Saml/` | SAML 2.0 |
| Plugin Registry | `StellaOps.Authority/AuthorityPluginRegistry.cs` | Manages named plugins |
| LDAP Config | `etc/authority.plugins/ldap.yaml` | Sample configuration |

#### LDAP Plugin Capabilities

```yaml
# From etc/authority.plugins/ldap.yaml
connection:
  host: "ldaps://ldap.example.internal"
  port: 636
  searchBase: "ou=people,dc=example,dc=internal"
  bindDn: "cn=bind-user,ou=service,dc=example,dc=internal"
  bindPasswordSecret: "file:/etc/secrets/ldap-bind.txt"
security:
  requireTls: true
claims:
  groupAttribute: "memberOf"
cache:
  enabled: true
  ttlSeconds: 600
```

#### Gaps

- No CLI command to validate LDAP configuration
- Health checks exist but don't provide remediation
- No validation of group mapping correctness
- TLS certificate validation not exposed as diagnostic

---

### 2.6 Database & Migrations - Current State

**Location:** `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Migration Runner | `Migrations/MigrationRunner.cs` | Executes SQL migrations with advisory locks |
| Migration Category | `Migrations/MigrationCategory.cs` | Startup, Release, Seed, Data |
| Status Service | `Migrations/MigrationStatusService.cs` | Query migration state |
| CLI Commands | `src/Cli/StellaOps.Cli/Services/MigrationCommandService.cs` | `migrations-run/status/verify` |
| Strategy Docs | `docs/db/MIGRATION_STRATEGY.md` | Migration process documentation |

#### Migration Categories

| Prefix | Category | Automatic | Breaking |
|--------|----------|-----------|----------|
| `001-099` | Startup | Yes | No |
| `100-199` | Release | No (CLI) | Yes |
| `S001-S999` | Seed | Yes | No |
| `DM001-DM999` | Data | Background | Varies |

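The prefix ranges in the table above lend themselves to a small classifier that a Doctor check could use to decide whether a pending migration is expected to apply on its own. The following is a minimal sketch, not existing Stella Ops code: `PrefixCategory`, `MigrationPrefix`, and the `NNN_name.sql` file-name layout are assumptions made for illustration, and the real `MigrationCategory` type may look different.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical names for illustration only.
public enum PrefixCategory { Unknown, Startup, Release, Seed, Data }

public static class MigrationPrefix
{
    // Optional letter prefix (S or DM) followed by exactly three digits.
    private static readonly Regex Pattern =
        new(@"^(DM|S)?(\d{3})(?!\d)", RegexOptions.CultureInvariant);

    public static PrefixCategory Classify(string migrationName)
    {
        var match = Pattern.Match(migrationName);
        if (!match.Success) return PrefixCategory.Unknown;

        int number = int.Parse(match.Groups[2].Value);
        if (number == 0) return PrefixCategory.Unknown; // every range in the table starts at 001

        return match.Groups[1].Value switch
        {
            "S" => PrefixCategory.Seed,                      // S001-S999
            "DM" => PrefixCategory.Data,                     // DM001-DM999
            _ when number <= 99 => PrefixCategory.Startup,   // 001-099
            _ when number <= 199 => PrefixCategory.Release,  // 100-199
            _ => PrefixCategory.Unknown
        };
    }

    // Only Startup and Seed are marked "Yes" in the Automatic column.
    public static bool RunsAutomatically(PrefixCategory category) =>
        category is PrefixCategory.Startup or PrefixCategory.Seed;
}
```

Under these assumptions, a pending migration classified as `Release` would be reported with a CLI remediation step, while `Startup` and `Seed` entries would be expected to apply at service start.
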
#### Schema Tracking

```sql
CREATE TABLE {schema}.schema_migrations (
    migration_name TEXT PRIMARY KEY,
    category TEXT NOT NULL DEFAULT 'startup',
    checksum TEXT NOT NULL,
    applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    applied_by TEXT,
    duration_ms INT
);
```

#### Gaps

- Migration status not integrated with doctor
- No checksum mismatch diagnostics with remediation
- Lock contention not diagnosed
- No cross-schema migration state view

---

### 2.7 UI - Current State

**Location:** `src/Web/StellaOps.Web/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Routes | `src/app/app.routes.ts` | Angular Router configuration |
| Platform Health | `src/app/features/platform-health/` | Health dashboard at `/ops/health` |
| Health Client | `src/app/core/api/platform-health.client.ts` | API client for health endpoints |
| Console Status | `src/app/features/console/console-status.component.ts` | Queue/run status |

#### Platform Health Dashboard Features

- Real-time KPI strip (services, latency, error rate, incidents)
- Service health grid with grouping (healthy/degraded/unhealthy)
- Dependency graph visualization
- Incident timeline (last 24h)
- Auto-refresh every 10 seconds

#### Gaps

- No diagnostic check execution from UI
- No remediation command display
- No evidence collection/export
- Health dashboard shows status, not actionable diagnostics

---

### 2.8 Service Connectivity - Current State

**Location:** `src/Gateway/`, `src/Router/`

#### What Exists Today

| Component | File Path | Description |
|-----------|-----------|-------------|
| Gateway Routing | `src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs` | HTTP to microservice routing |
| Connection Manager | `src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs` | HELLO handshake, heartbeats |
| Routing State | `src/Router/__Libraries/StellaOps.Router.Common/Abstractions/IGlobalRoutingState.cs` | Live service connections |
| Claims Propagation | `src/Gateway/StellaOps.Gateway.WebService/Middleware/ClaimsPropagationMiddleware.cs` | OAuth claims forwarding |

#### Service Registration Flow

1. Service connects to Gateway via Router transport (TCP/TLS/Valkey)
2. HELLO handshake with endpoint/schema declarations
3. Periodic heartbeats with health/latency metrics
4. Gateway maintains `ConnectionState` for routing decisions

#### Gaps

- No CLI command to verify service graph health
- Routing failures not diagnosed with remediation
- No validation of claims propagation configuration
- Transport connectivity not exposed as diagnostic

---

## 3. Doctor Architecture

### 3.1 High-Level Architecture

```
+------------------+       +------------------+       +------------------+
|       CLI        |       |        UI        |       |     External     |
|  stella doctor   |       |   /ops/doctor    |       |    Monitoring    |
+--------+---------+       +--------+---------+       +--------+---------+
         |                          |                          |
         v                          v                          v
+------------------------------------------------------------------------+
|                            Doctor API Layer                            |
|  POST /api/v1/doctor/run           GET /api/v1/doctor/checks           |
|  GET /api/v1/doctor/report         WebSocket /api/v1/doctor/stream     |
+------------------------------------------------------------------------+
                                    |
                                    v
+------------------------------------------------------------------------+
|                          Doctor Engine (Core)                          |
|  +------------------+  +------------------+  +------------------+      |
|  | Check Registry   |  | Check Executor   |  | Report Generator |      |
|  | - Discovery      |  | - Parallel exec  |  | - JSON/MD/Text   |      |
|  | - Filtering      |  | - Timeout mgmt   |  | - Remediation    |      |
|  +------------------+  +------------------+  +------------------+      |
+------------------------------------------------------------------------+
                                    |
                                    v
+------------------------------------------------------------------------+
|                             Plugin System                              |
+--------+---------+---------+---------+---------+---------+-------------+
         |         |         |         |         |         |
         v         v         v         v         v         v
+--------+ +------+ +-------+ +------+ +------+ +------+ +----------+
| Core   | | DB & | |Service| | SCM  | |Regis-| | Vault| | Authority|
| Plugin | |Migra-| | Graph | |Plugin| | try  | |Plugin| | Plugin   |
|        | | tions| |Plugin | |      | |Plugin| |      | |          |
+--------+ +------+ +-------+ +------+ +------+ +------+ +----------+
```

### 3.2 Core Components

#### Doctor Engine

**Proposed Location:** `src/__Libraries/StellaOps.Doctor/`

```
StellaOps.Doctor/
├── Engine/
│   ├── DoctorEngine.cs              # Main orchestrator
│   ├── CheckExecutor.cs             # Parallel check execution
│   └── CheckRegistry.cs             # Plugin discovery & filtering
├── Models/
│   ├── DoctorCheckResult.cs         # Extended check result with evidence
│   ├── DoctorReport.cs              # Full report model
│   ├── Remediation.cs               # Fix command model
│   └── Evidence.cs                  # Collected evidence model
├── Plugins/
│   ├── IDoctorPlugin.cs             # Plugin interface
│   ├── IDoctorCheck.cs              # Check interface (extended)
│   └── DoctorPluginContext.cs       # Plugin execution context
├── Output/
│   ├── JsonReportFormatter.cs       # JSON output
│   ├── MarkdownReportFormatter.cs   # Markdown output
│   └── TextReportFormatter.cs       # Console text output
└── DoctorServiceExtensions.cs       # DI registration
```

#### Check Execution Model

```csharp
public sealed class CheckExecutor
{
    private readonly IEnumerable<IDoctorPlugin> _plugins;
    private readonly TimeProvider _timeProvider;
    private readonly ILogger<CheckExecutor> _logger;

    public async Task<DoctorReport> RunAsync(
        DoctorRunOptions options,
        CancellationToken ct)
    {
        var checks = GetFilteredChecks(options);
        var results = new ConcurrentBag<DoctorCheckResult>();

        // Parallel execution with configurable concurrency
        await Parallel.ForEachAsync(
            checks,
            new ParallelOptions
            {
                MaxDegreeOfParallelism = options.Parallelism,
                CancellationToken = ct
            },
            async (check, token) =>
            {
                var result = await ExecuteCheckAsync(check, options, token);
                results.Add(result);
            });

        return GenerateReport(results, options);
    }
}
```

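The executor above calls `ExecuteCheckAsync` without showing how a per-check timeout would be enforced. One common approach, sketched here under stated assumptions, is a linked `CancellationTokenSource` with `CancelAfter`, so that a timeout and a caller-initiated cancellation stay distinguishable. `CheckTimeout`, `RunAsync`, and the `onTimeout` parameter are hypothetical names, not part of the proposed engine.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; the real ExecuteCheckAsync is not specified in this document.
public static class CheckTimeout
{
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> check,
        TimeSpan timeout,
        T onTimeout,
        CancellationToken ct)
    {
        // The linked source fires when either the caller cancels or the timeout elapses.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);

        try
        {
            return await check(cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the timeout fired: report it as a result instead of aborting the run.
            return onTimeout;
        }
    }
}
```

With this shape, a slow check yields a timeout result that can still appear in the report, while cancellation of the whole run propagates as an exception to `Parallel.ForEachAsync`.
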
### 3.3 Result Model

```csharp
public sealed record DoctorCheckResult
{
    // Identity
    public required string CheckId { get; init; }
    public required string PluginId { get; init; }
    public required string Category { get; init; }

    // Outcome
    public required DoctorSeverity Severity { get; init; }  // Pass, Info, Warn, Fail, Skip
    public required string Diagnosis { get; init; }

    // Evidence
    public required Evidence Evidence { get; init; }

    // Remediation
    public IReadOnlyList<string>? LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }

    // Metadata
    public required TimeSpan Duration { get; init; }
    public required DateTimeOffset ExecutedAt { get; init; }
}

public enum DoctorSeverity
{
    Pass = 0,
    Info = 1,
    Warn = 2,
    Fail = 3,
    Skip = 4
}

public sealed record Evidence
{
    public required string Description { get; init; }
    public required IReadOnlyDictionary<string, string> Data { get; init; }
    public IReadOnlyList<string>? SensitiveKeys { get; init; }  // Keys to redact in output
}

public sealed record Remediation
{
    public required IReadOnlyList<RemediationStep> Steps { get; init; }
    public string? SafetyNote { get; init; }
    public bool RequiresBackup { get; init; }
}

public sealed record RemediationStep
{
    public required int Order { get; init; }
    public required string Description { get; init; }
    public required string Command { get; init; }
    public CommandType CommandType { get; init; }  // Shell, SQL, API, FileEdit, Manual
    public IReadOnlyDictionary<string, string>? Placeholders { get; init; }
}

public enum CommandType
{
    Shell,     // Bash/PowerShell command
    SQL,       // SQL statement
    API,       // API call (curl/stella CLI)
    FileEdit,  // File modification
    Manual     // Manual step (no command)
}
```

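One detail of `DoctorSeverity` matters when results are aggregated: `Skip = 4` is numerically larger than `Fail = 3`, so taking the maximum enum value would report a run with failures as skipped. The sketch below ranks outcomes explicitly instead. `SeveritySummary` is a hypothetical helper, and treating skipped checks as neutral is an assumption rather than something this specification states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the enum from the result model so the sketch is self-contained.
public enum DoctorSeverity { Pass = 0, Info = 1, Warn = 2, Fail = 3, Skip = 4 }

public static class SeveritySummary
{
    // Explicit ranking: Fail > Warn > Info > Pass. Skip is excluded before ranking.
    private static int Rank(DoctorSeverity severity) => severity switch
    {
        DoctorSeverity.Fail => 3,
        DoctorSeverity.Warn => 2,
        DoctorSeverity.Info => 1,
        _ => 0
    };

    // Worst outcome among executed checks; Skip only if nothing actually ran (assumption).
    public static DoctorSeverity Overall(IEnumerable<DoctorSeverity> results) =>
        results.Where(s => s != DoctorSeverity.Skip)
               .DefaultIfEmpty(DoctorSeverity.Skip)
               .OrderByDescending(s => Rank(s))
               .First();

    // Per-severity counts, e.g. for the pass/warn/fail totals in a report header.
    public static IReadOnlyDictionary<DoctorSeverity, int> Count(IEnumerable<DoctorSeverity> results) =>
        results.GroupBy(s => s).ToDictionary(g => g.Key, g => g.Count());
}
```
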
---

## 4. Plugin System Specification

### 4.1 Plugin Interface

```csharp
/// <summary>
/// Base interface for Doctor plugins.
/// Plugins group related checks and share configuration context.
/// </summary>
public interface IDoctorPlugin
{
    /// <summary>Unique plugin identifier (e.g., "stellaops.doctor.database")</summary>
    string PluginId { get; }

    /// <summary>Human-readable name</summary>
    string DisplayName { get; }

    /// <summary>Plugin category for filtering</summary>
    DoctorCategory Category { get; }

    /// <summary>Plugin version for compatibility</summary>
    Version Version { get; }

    /// <summary>Minimum Doctor engine version required</summary>
    Version MinEngineVersion { get; }

    /// <summary>Check if plugin is available in current environment</summary>
    bool IsAvailable(IServiceProvider services);

    /// <summary>Get all checks provided by this plugin</summary>
    IReadOnlyList<IDoctorCheck> GetChecks(DoctorPluginContext context);

    /// <summary>Initialize plugin with configuration</summary>
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public enum DoctorCategory
{
    Core,           // Platform, config, runtime
    Database,       // Schema, migrations, connectivity
    ServiceGraph,   // Inter-service communication
    Integration,    // External system integrations
    Security,       // Auth, TLS, secrets
    Observability   // Logs, metrics, traces
}
```

### 4.2 Check Interface

```csharp
/// <summary>
/// Individual diagnostic check.
/// </summary>
public interface IDoctorCheck
{
    /// <summary>Unique check identifier (e.g., "check.database.migrations.pending")</summary>
    string CheckId { get; }

    /// <summary>Human-readable name</summary>
    string Name { get; }

    /// <summary>What this check verifies</summary>
    string Description { get; }

    /// <summary>Default severity if check fails</summary>
    DoctorSeverity DefaultSeverity { get; }

    /// <summary>Tags for filtering (e.g., ["quick", "security", "migration"])</summary>
    IReadOnlyList<string> Tags { get; }

    /// <summary>Estimated execution time</summary>
    TimeSpan EstimatedDuration { get; }

    /// <summary>Check if this check can run in current context</summary>
    bool CanRun(DoctorPluginContext context);

    /// <summary>Execute the check</summary>
    Task<DoctorCheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}
```

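The `CheckId` and `Tags` members are what make filtering possible before any check runs. As a rough illustration of how a registry might narrow the set, the sketch below filters on a reduced view of a check. `CheckInfo`, `CheckFilter`, and the parameter names are hypothetical; matching categories and tags case-insensitively is an assumption, and the real `CheckRegistry` may combine filters differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down view of a check: only the fields needed for filtering.
public sealed record CheckInfo(string CheckId, string Category, IReadOnlyList<string> Tags);

public static class CheckFilter
{
    public static IReadOnlyList<CheckInfo> Select(
        IEnumerable<CheckInfo> checks,
        bool quickOnly = false,
        IReadOnlyCollection<string>? categories = null,
        string? checkId = null)
    {
        var query = checks;

        // A single check ID narrows the run to exactly that check.
        if (checkId is not null)
            query = query.Where(c => string.Equals(c.CheckId, checkId, StringComparison.Ordinal));

        // Category filter: keep checks whose category is in the requested set.
        if (categories is { Count: > 0 })
            query = query.Where(c => categories.Contains(c.Category, StringComparer.OrdinalIgnoreCase));

        // Quick mode: keep only checks tagged "quick".
        if (quickOnly)
            query = query.Where(c => c.Tags.Contains("quick", StringComparer.OrdinalIgnoreCase));

        // Stable ordering keeps reports comparable between runs.
        return query.OrderBy(c => c.CheckId, StringComparer.Ordinal).ToList();
    }
}
```

Because check IDs are meant to stay stable across versions, sorting by `CheckId` gives deterministic report ordering even when checks execute in parallel.
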
### 4.3 Plugin Context

```csharp
public sealed class DoctorPluginContext
{
    public required IServiceProvider Services { get; init; }
    public required IConfiguration Configuration { get; init; }
    public required TimeProvider TimeProvider { get; init; }
    public required ILogger Logger { get; init; }

    // Runtime info
    public required string EnvironmentName { get; init; }  // Development, Staging, Production
    public required string? TenantId { get; init; }

    // Plugin configuration
    public required JsonElement PluginConfig { get; init; }

    // Evidence helpers
    public EvidenceBuilder CreateEvidence() => new();
    public RemediationBuilder CreateRemediation() => new();

    // Secret redaction
    public string Redact(string value) => "***REDACTED***";

    // Illustrative only: masks the value of Password=/Pwd= keys; the final redaction rules are not specified here
    public string RedactConnectionString(string cs) =>
        System.Text.RegularExpressions.Regex.Replace(
            cs, @"(?i)\b(password|pwd)\s*=[^;]*", "$1=***REDACTED***");
}
```

### 4.4 Plugin Discovery

#### Static Discovery (Build-time)

Plugins register via DI at startup:

```csharp
// In Program.cs or startup
services.AddDoctorPlugin<CoreDoctorPlugin>();
services.AddDoctorPlugin<DatabaseDoctorPlugin>();
services.AddDoctorPlugin<ServiceGraphDoctorPlugin>();
services.AddDoctorPlugin<ScmGitHubDoctorPlugin>();
// ...
```

#### Dynamic Discovery (Runtime)

Plugins can be loaded from assemblies:

```csharp
// In DoctorPluginLoader.cs
public class DoctorPluginLoader
{
    public IEnumerable<IDoctorPlugin> LoadFromDirectory(string path)
    {
        foreach (var dll in Directory.GetFiles(path, "StellaOps.Doctor.Plugin.*.dll"))
        {
            var assembly = Assembly.LoadFrom(dll);
            foreach (var type in assembly.GetTypes()
                .Where(t => typeof(IDoctorPlugin).IsAssignableFrom(t) && !t.IsAbstract))
            {
                yield return (IDoctorPlugin)Activator.CreateInstance(type)!;
            }
        }
    }
}
```

### 4.5 Declarative Doctor Packs (YAML)

Doctor packs provide declarative checks that wrap CLI commands and parsing rules.
They complement compiled plugins and are loaded from `plugins/doctor/*.yaml` (plus optional override directories).

Short example:
```yaml
apiVersion: stella.ops/doctor.v1
kind: DoctorPlugin
metadata:
  name: doctor-release-orchestrator-gitlab
spec:
  discovery:
    when:
      - env: GITLAB_URL
```

Full sample: `docs/benchmarks/doctor/doctor-plugin-release-orchestrator-gitlab.yaml`

Key fields:
- `spec.discovery.when`: env/file existence gates.
- `checks[].run.exec`: command to execute (must be deterministic).
- `checks[].parse.expect` or `checks[].parse.expectJson`: pass/fail rules.
- `checks[].how_to_fix.commands[]`: exact fix commands printed verbatim.

### 4.6 Plugin Directory Structure

```
src/
├── __Libraries/
│   └── StellaOps.Doctor/                           # Core doctor engine
│       └── Plugins/
│           └── Core/                               # Built-in core plugin
├── Doctor/
│   └── __Plugins/
│       ├── StellaOps.Doctor.Plugin.Database/
│       ├── StellaOps.Doctor.Plugin.ServiceGraph/
│       ├── StellaOps.Doctor.Plugin.Scm.GitHub/
│       ├── StellaOps.Doctor.Plugin.Scm.GitLab/
│       ├── StellaOps.Doctor.Plugin.Registry.Harbor/
│       ├── StellaOps.Doctor.Plugin.Registry.ECR/
│       ├── StellaOps.Doctor.Plugin.Vault/
│       ├── StellaOps.Doctor.Plugin.Authority/
│       └── StellaOps.Doctor.Plugin.Observability/
```

### 4.7 Plugin Configuration

Plugins read configuration from the standard config hierarchy:

```yaml
# In stellaops.yaml or environment-specific config
Doctor:
  Enabled: true
  DefaultTimeout: 30s
  Parallelism: 4

  Plugins:
    Database:
      Enabled: true
      ConnectionTimeout: 10s

    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s

    Scm:
      GitHub:
        Enabled: true
        RateLimitThreshold: 100

    Registry:
      Harbor:
        Enabled: true
        SkipTlsVerify: false

    Vault:
      Enabled: true
      SecretsToValidate:
        - "secret/data/stellaops/api-keys"
        - "secret/data/stellaops/certificates"
```

### 4.8 Security Model

#### Secret Redaction

All evidence output is sanitized:

```csharp
public sealed class EvidenceBuilder
{
    private readonly Dictionary<string, string> _data = new();
    private readonly List<string> _sensitiveKeys = new();

    public EvidenceBuilder Add(string key, string value)
    {
        _data[key] = value;
        return this;
    }

    public EvidenceBuilder AddSensitive(string key, string value)
    {
        _data[key] = value;
        _sensitiveKeys.Add(key);
        return this;
    }

    public EvidenceBuilder AddConnectionString(string key, string connectionString)
    {
        // Parse and redact password
        var redacted = RedactConnectionStringPassword(connectionString);
        _data[key] = redacted;
        return this;
    }
}
```

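Note that `AddSensitive` stores the raw value and only records the key, so the masking itself has to happen when evidence is rendered. A minimal sketch of that output-time step follows. `EvidenceRedactor` is a hypothetical helper, and the `***REDACTED***` mask is borrowed from `DoctorPluginContext.Redact`; whether the formatters or the engine would own this step is not decided in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical output-time redaction step for Evidence.Data / Evidence.SensitiveKeys.
public static class EvidenceRedactor
{
    public const string Mask = "***REDACTED***";

    public static IReadOnlyDictionary<string, string> Redact(
        IReadOnlyDictionary<string, string> data,
        IEnumerable<string>? sensitiveKeys)
    {
        // A set gives O(1) lookups and tolerates duplicate key registrations.
        var sensitive = new HashSet<string>(
            sensitiveKeys ?? Array.Empty<string>(), StringComparer.Ordinal);

        // Copy into a new dictionary so the original evidence is never mutated.
        return data.ToDictionary(
            pair => pair.Key,
            pair => sensitive.Contains(pair.Key) ? Mask : pair.Value);
    }
}
```

Returning a copy rather than mutating in place means the same evidence object can feed several formatters without one of them leaking or double-masking values.
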
#### RBAC Permissions

Doctor checks require specific scopes:

| Scope | Description |
|-------|-------------|
| `doctor:run` | Execute doctor checks |
| `doctor:run:full` | Execute all checks including sensitive |
| `doctor:export` | Export diagnostic reports |
| `admin:system` | Access system-level checks |

### 4.9 Versioning Strategy

- **Engine version:** Semantic versioning (e.g., `1.0.0`)
- **Plugin version:** Independent semantic versioning
- **Compatibility:** Plugins declare `MinEngineVersion`
- **Check IDs:** Stable across versions (never renamed)

```csharp
// Version compatibility check
if (plugin.MinEngineVersion > DoctorEngine.Version)
{
    _logger.LogWarning(
        "Plugin {PluginId} requires engine {Required}, current is {Current}. Skipping.",
        plugin.PluginId, plugin.MinEngineVersion, DoctorEngine.Version);
    continue;
}
```

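The compatibility rule can be exercised in isolation with `System.Version`, which is the type `IDoctorPlugin` exposes. The sketch below splits plugins into loadable and skipped sets; `PluginInfo` and `PluginCompatibility` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced view of a plugin: only what the compatibility check needs.
public sealed record PluginInfo(string PluginId, Version MinEngineVersion);

public static class PluginCompatibility
{
    public static (IReadOnlyList<PluginInfo> Loadable, IReadOnlyList<PluginInfo> Skipped) Partition(
        IEnumerable<PluginInfo> plugins, Version engineVersion)
    {
        // true  -> the engine satisfies the plugin's minimum version
        // false -> the plugin needs a newer engine and is skipped
        var byCompatibility = plugins.ToLookup(p => p.MinEngineVersion <= engineVersion);
        return (byCompatibility[true].ToList(), byCompatibility[false].ToList());
    }
}
```

One caveat when comparing `System.Version` values: an unspecified component is stored as -1, so `new Version(1, 0)` sorts below `new Version(1, 0, 0)`. Declaring engine and plugin versions with the same number of components avoids a plugin being skipped by surprise.
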
---

## 5. CLI Surface

### 5.1 Command Structure

**Proposed Location:** `src/Cli/StellaOps.Cli/Commands/DoctorCommandGroup.cs`

```bash
stella doctor run [options]
stella doctor list [options]
stella doctor fix --from report.json [--apply]
```

Note: `stella doctor` remains shorthand for `stella doctor run` for compatibility.

`stella doctor fix` executes only non-destructive commands. Any destructive step
must be presented as manual guidance and is not eligible for `--apply`.

### 5.2 Options and Flags

| Option | Short | Type | Default | Description |
|--------|-------|------|---------|-------------|
| `--format` | `-f` | enum | `text` | Output format: `text`, `table`, `json`, `markdown` |
| `--quick` | `-q` | flag | false | Run only quick checks (tagged `quick`) |
| `--full` | | flag | false | Run all checks including slow/intensive |
| `--pack` | | string[] | all | Filter by pack name (manifest grouping) |
| `--category` | `-c` | string[] | all | Filter by category: `core`, `database`, `service-graph`, `integration`, `security`, `observability` |
| `--plugin` | `-p` | string[] | all | Filter by plugin ID (e.g., `scm.github`) |
| `--check` | | string | | Run single check by ID |
| `--severity` | `-s` | enum[] | all | Filter output by severity: `pass`, `info`, `warn`, `fail` |
| `--export` | `-e` | path | | Export report to file |
| `--timeout` | `-t` | duration | 30s | Per-check timeout |
| `--parallel` | | int | 4 | Max parallel check execution |
| `--no-remediation` | | flag | false | Skip remediation command generation |
| `--verbose` | `-v` | flag | false | Include detailed evidence in output |
| `--tenant` | | string | | Tenant context for multi-tenant checks |

#### Fix Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--from` | path | required | Path to JSON report with `how_to_fix` commands |
| `--apply` | flag | false | Execute fixes (default is dry-run preview) |

Only commands marked safe and non-destructive are eligible for `--apply`.
Destructive changes must be printed as manual steps and executed by the operator outside Doctor.

### 5.3 Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All checks passed |
| 1 | One or more warnings |
| 2 | One or more failures |
| 3 | Doctor engine error |
| 4 | Invalid arguments |
| 5 | Timeout exceeded |

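
The exit codes above lend themselves to a small CI gate. The following is a minimal sketch only: `doctor_gate` is a hypothetical helper (not part of the `stella` CLI), and treating warnings as non-blocking is an assumed policy that a pipeline may well want to change.

```bash
# Sketch: map a Doctor exit code (section 5.3) to a CI decision.
# doctor_gate is a hypothetical helper; the pass/block policy is an assumption.
doctor_gate() {
  code="$1"
  case "$code" in
    0) echo "pass: all checks passed" ;;
    1) echo "pass-with-warnings: review warnings" ;;
    2) echo "block: one or more checks failed"; return 1 ;;
    3) echo "error: doctor engine error"; return 1 ;;
    4) echo "error: invalid arguments"; return 1 ;;
    5) echo "error: timeout exceeded"; return 1 ;;
    *) echo "error: unknown exit code $code"; return 1 ;;
  esac
}
```

In a pipeline step this could be used as `stella doctor run --quick; doctor_gate "$?" || exit 1`, so that only failures and engine-level errors stop the job.
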
### 5.4 Usage Examples

```bash
# Quick health check (shorthand for `stella doctor run`)
stella doctor

# Run all checks explicitly
stella doctor run

# Full diagnostic
stella doctor --full

# Check only database category
stella doctor --category database

# Check specific integration
stella doctor --plugin scm.github

# Run single check
stella doctor --check check.database.migrations.pending

# JSON output for CI/CD
stella doctor --format json --severity fail,warn

# Run orchestrator pack with table output
stella doctor run --pack orchestrator --format table

# Apply fixes from a JSON report (omit --apply for a dry-run preview)
stella doctor fix --from out.json --apply

# Export markdown report
stella doctor --full --format markdown --export doctor-report.md

# Verbose with all evidence
stella doctor --verbose --full

# Quick check with 60s timeout
stella doctor --quick --timeout 60s
```

### 5.5 Text Output Format

```
Stella Ops Doctor
=================

Running 47 checks across 8 plugins...

[PASS] check.config.required
  All required configuration values are present

[PASS] check.database.connectivity
  PostgreSQL connection successful (latency: 12ms)

[WARN] check.tls.certificates.expiry
  Diagnosis: TLS certificate expires in 14 days

  Evidence:
    Certificate: /etc/ssl/certs/stellaops.crt
    Subject: CN=stellaops.example.com
    Expires: 2026-01-26T00:00:00Z
    Days remaining: 14

  Likely Causes:
    1. Certificate renewal not scheduled
    2. ACME/Let's Encrypt automation not configured

  Fix Steps:
    # 1. Check current certificate
    openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -dates

    # 2. Renew certificate (if using certbot)
    sudo certbot renew --cert-name stellaops.example.com

    # 3. Restart services to pick up new certificate
    sudo systemctl restart stellaops-gateway

  Verification:
    stella doctor --check check.tls.certificates.expiry

[FAIL] check.database.migrations.pending
  Diagnosis: 3 pending release migrations detected in schema 'auth'

  Evidence:
    Schema: auth
    Current version: 099_add_dpop_thumbprints
    Pending migrations:
      - 100_add_tenant_quotas
      - 101_add_audit_retention
      - 102_add_session_revocation
    Connection: postgres://localhost:5432/stellaops (user: stella_app)

  Likely Causes:
    1. Release migrations not applied before deployment
    2. Migration files added after last deployment

  Fix Steps:
    # 1. Backup database first (RECOMMENDED)
    pg_dump -h localhost -U stella_admin -d stellaops -F c \
      -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

    # 2. Apply pending release migrations
    stella system migrations-run --module Authority --category release

    # 3. Verify migrations applied
    stella system migrations-status --module Authority

  Verification:
    stella doctor --check check.database.migrations.pending

────────────────────────────────────────────────────────────────
Summary: 44 passed, 2 warnings, 1 failed (47 total)
Duration: 8.3s
────────────────────────────────────────────────────────────────
```

---

## 6. UI Surface

### 6.1 Route and Location

**Route:** `/ops/doctor`
**Location:** `src/Web/StellaOps.Web/src/app/features/doctor/`

### 6.2 Component Structure

```
src/app/features/doctor/
├── doctor.routes.ts
├── doctor-dashboard.component.ts          # Main page
├── doctor-dashboard.component.html
├── doctor-dashboard.component.scss
├── components/
│   ├── check-list/
│   │   ├── check-list.component.ts        # Filterable check list
│   │   └── check-list.component.html
│   ├── check-result/
│   │   ├── check-result.component.ts      # Single check display
│   │   └── check-result.component.html
│   ├── remediation-panel/
│   │   ├── remediation-panel.component.ts # Fix commands display
│   │   └── remediation-panel.component.html
│   ├── evidence-viewer/
│   │   ├── evidence-viewer.component.ts   # Collected evidence
│   │   └── evidence-viewer.component.html
│   └── export-dialog/
│       ├── export-dialog.component.ts     # Export options
│       └── export-dialog.component.html
└── services/
    ├── doctor.client.ts                   # API client
    ├── doctor.service.ts                  # Business logic
    └── doctor.store.ts                    # Signal-based state
```

### 6.3 Dashboard Layout

```
+------------------------------------------------------------------+
| Doctor Diagnostics                        [Run Quick] [Run Full] |
+------------------------------------------------------------------+
| Filters: [Category v] [Plugin v] [Severity v]    [Export Report] |
+------------------------------------------------------------------+
|                                                                  |
| Summary Strip                                                    |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
| |    44    | |    2     | |    1     | |    0     | |   8.3s   | |
| | Passed   | | Warnings | | Failed   | | Skipped  | | Duration | |
| +----------+ +----------+ +----------+ +----------+ +----------+ |
|                                                                  |
+------------------------------------------------------------------+
| Check Results                                                    |
| +--------------------------------------------------------------+ |
| | [FAIL] check.database.migrations.pending            [Expand] | |
| |   3 pending release migrations in schema 'auth'              | |
| +--------------------------------------------------------------+ |
| | [WARN] check.tls.certificates.expiry                [Expand] | |
| |   TLS certificate expires in 14 days                         | |
| +--------------------------------------------------------------+ |
| | [PASS] check.database.connectivity                  [Expand] | |
| |   PostgreSQL connection successful (12ms)                    | |
| +--------------------------------------------------------------+ |
| | ... more checks ...                                          | |
+------------------------------------------------------------------+
```

### 6.4 Expanded Check View

```
+------------------------------------------------------------------+
| [FAIL] check.database.migrations.pending                         |
+------------------------------------------------------------------+
| Diagnosis                                                        |
| 3 pending release migrations detected in schema 'auth'           |
+------------------------------------------------------------------+
| Evidence                                                         |
| +--------------------------------------------------------------+ |
| | Schema          | auth                                       | |
| | Current version | 099_add_dpop_thumbprints                   | |
| | Pending         | 100_add_tenant_quotas                      | |
| |                 | 101_add_audit_retention                    | |
| |                 | 102_add_session_revocation                 | |
| | Connection      | postgres://localhost:5432/stellaops        | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Likely Causes                                                    |
| 1. Release migrations not applied before deployment              |
| 2. Migration files added after last deployment                   |
+------------------------------------------------------------------+
| Fix Steps                                             [Copy All] |
| +--------------------------------------------------------------+ |
| | Step 1: Backup database first (RECOMMENDED)           [Copy] | |
| | pg_dump -h localhost -U stella_admin -d stellaops -F c \     | |
| |   -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump            | |
| +--------------------------------------------------------------+ |
| | Step 2: Apply pending release migrations              [Copy] | |
| | stella system migrations-run --module Authority \            | |
| |   --category release                                         | |
| +--------------------------------------------------------------+ |
| | Step 3: Verify migrations applied                     [Copy] | |
| | stella system migrations-status --module Authority           | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
| Verification                                              [Copy] |
| stella doctor --check check.database.migrations.pending          |
+------------------------------------------------------------------+
| [Re-run Check] [Mark Resolved]                                   |
+------------------------------------------------------------------+
```

### 6.5 Pack Navigation and Fix Actions

- Navigation hierarchy: packs -> plugins -> checks.
- Each check shows status, evidence, Copy Fix Commands, and Run Fix (disabled unless `doctor.fix.enabled=true`).
- Export actions: Download JSON and Download DSSE summary.

### 6.6 Real-Time Updates

- **Polling:** Auto-refresh option (every 30s/60s/5m)
- **SSE:** Live check progress during execution
- **WebSocket:** Optional for high-frequency updates

---

## 7. API Surface

### 7.1 Endpoints

**Base Path:** `/api/v1/doctor`

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/checks` | List available checks with metadata |
| `GET` | `/plugins` | List available plugins |
| `POST` | `/run` | Execute doctor checks |
| `GET` | `/run/{runId}` | Get run status/results |
| `GET` | `/run/{runId}/stream` | SSE stream for live progress |
| `GET` | `/reports` | List historical reports |
| `GET` | `/reports/{reportId}` | Get specific report |
| `DELETE` | `/reports/{reportId}` | Delete report |

### 7.2 Request/Response Models

#### List Checks

```http
GET /api/v1/doctor/checks?category=database&tags=quick
```

```json
{
  "checks": [
    {
      "checkId": "check.database.connectivity",
      "name": "Database Connectivity",
      "description": "Verify PostgreSQL connection",
      "pluginId": "stellaops.doctor.database",
      "category": "database",
      "defaultSeverity": "fail",
      "tags": ["quick", "database"],
      "estimatedDurationMs": 500
    }
  ],
  "total": 47
}
```

#### Run Checks

```http
POST /api/v1/doctor/run
Content-Type: application/json

{
  "mode": "quick",
  "categories": ["database", "integration"],
  "plugins": [],
  "checkIds": [],
  "timeoutMs": 30000,
  "parallelism": 4,
  "includeRemediation": true
}
```

```json
{
  "runId": "dr_20260112_143052_abc123",
  "status": "running",
  "startedAt": "2026-01-12T14:30:52Z",
  "checksTotal": 12,
  "checksCompleted": 0
}
```

#### Get Run Results

```http
GET /api/v1/doctor/run/dr_20260112_143052_abc123
```

```json
{
  "runId": "dr_20260112_143052_abc123",
  "status": "completed",
  "startedAt": "2026-01-12T14:30:52Z",
  "completedAt": "2026-01-12T14:31:00Z",
  "durationMs": 8300,
  "summary": {
    "passed": 44,
    "warnings": 2,
    "failed": 1,
    "skipped": 0,
    "total": 47
  },
  "overallSeverity": "fail",
  "results": [
    {
      "checkId": "check.database.migrations.pending",
      "pluginId": "stellaops.doctor.database",
      "category": "database",
      "severity": "fail",
      "diagnosis": "3 pending release migrations detected in schema 'auth'",
      "evidence": {
        "description": "Migration state for auth schema",
        "data": {
          "schema": "auth",
          "currentVersion": "099_add_dpop_thumbprints",
          "pendingMigrations": "100_add_tenant_quotas, 101_add_audit_retention, 102_add_session_revocation",
          "connection": "postgres://localhost:5432/stellaops"
        }
      },
      "likelyCauses": [
        "Release migrations not applied before deployment",
        "Migration files added after last deployment"
      ],
      "remediation": {
        "requiresBackup": true,
        "safetyNote": "Always backup before running migrations",
        "steps": [
          {
            "order": 1,
            "description": "Backup database first (RECOMMENDED)",
            "command": "pg_dump -h localhost -U stella_admin -d stellaops -F c -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump",
            "commandType": "shell",
            "placeholders": {}
          },
          {
            "order": 2,
            "description": "Apply pending release migrations",
            "command": "stella system migrations-run --module Authority --category release",
            "commandType": "shell",
            "placeholders": {}
          },
          {
            "order": 3,
            "description": "Verify migrations applied",
            "command": "stella system migrations-status --module Authority",
            "commandType": "shell",
            "placeholders": {}
          }
        ]
      },
      "verificationCommand": "stella doctor --check check.database.migrations.pending",
      "durationMs": 234,
      "executedAt": "2026-01-12T14:30:54Z"
    }
  ]
}
```

Results also expose a `how_to_fix` object for automation. It is a simplified alias of
the richer `remediation` model and includes `commands[]` printed verbatim.

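
A client that does not use the SSE stream can poll `GET /run/{runId}` until the run leaves the `running` state. The sketch below illustrates that loop under stated assumptions: `wait_for_run`, `fetch_run`, the `FETCH_RUN` hook, and the `DOCTOR_API` base URL are hypothetical names, and any status other than `running` is treated as terminal because only `running` and `completed` appear in the examples above.

```bash
# Sketch: poll a Doctor run until it is no longer "running".
# fetch_run, FETCH_RUN and DOCTOR_API are illustrative assumptions.
fetch_run() {
  curl -fsS "${DOCTOR_API:-http://localhost:8080}/api/v1/doctor/run/$1"
}

wait_for_run() {
  run_id="$1"
  max_attempts="${2:-30}"   # give up after this many polls
  delay="${3:-2}"           # seconds between polls
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    body="$("${FETCH_RUN:-fetch_run}" "$run_id")" || return 2
    case "$body" in
      *'"status": "running"'*|*'"status":"running"'*) ;;   # still in progress
      *) printf '%s\n' "$body"; return 0 ;;                  # terminal state
    esac
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1   # still running after max_attempts polls
}
```

The string match on `"status"` keeps the sketch dependency-free; a JSON-aware tool such as `jq` would be a more robust choice where it is available.
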
### 7.3 SSE Stream

```http
GET /api/v1/doctor/run/dr_20260112_143052_abc123/stream
Accept: text/event-stream
```

```
event: check-started
data: {"checkId":"check.database.connectivity","startedAt":"2026-01-12T14:30:52Z"}

event: check-completed
data: {"checkId":"check.database.connectivity","severity":"pass","durationMs":45}

event: check-started
data: {"checkId":"check.database.migrations.pending","startedAt":"2026-01-12T14:30:52Z"}

event: check-completed
data: {"checkId":"check.database.migrations.pending","severity":"fail","durationMs":234}

event: run-completed
data: {"runId":"dr_20260112_143052_abc123","summary":{"passed":44,"warnings":2,"failed":1}}
```

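
For scripted consumers, the event stream above can be flattened into one line per event. This is a minimal sketch, not a full SSE client: `sse_to_lines` is a hypothetical helper, it keeps only the last `data:` field of each event (multi-line data is not joined), and it ignores `id:`, `retry:`, and comment lines.

```bash
# Sketch: turn an SSE stream (section 7.3) into "<event>\t<data>" lines.
# Reads "event:" / "data:" fields from stdin; a blank line ends an event.
sse_to_lines() {
  event=""
  data=""
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "event:"*) event="${line#event:}"; event="${event# }" ;;
      "data:"*)  data="${line#data:}";   data="${data# }" ;;
      "")
        if [ -n "$event" ] || [ -n "$data" ]; then
          printf '%s\t%s\n' "$event" "$data"
        fi
        event=""
        data=""
        ;;
    esac
  done
  # Flush a final event that was not followed by a blank line.
  if [ -n "$event" ] || [ -n "$data" ]; then
    printf '%s\t%s\n' "$event" "$data"
  fi
}
```

It could be fed from `curl -N -H 'Accept: text/event-stream' <stream URL> | sse_to_lines`, where `-N` disables curl's output buffering so events appear as they arrive.
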
### 7.4 Evidence Logs and Attestations

Doctor runs emit a JSONL evidence log and optional DSSE summary for audit trails.
By default, JSONL is local only and deterministic; outbound telemetry is opt-in.

- JSONL path: `artifacts/doctor/doctor-run-<runId>.ndjson` (configurable).
- DSSE summary: `artifacts/doctor/doctor-run-<runId>.dsse.json` (optional).
- Evidence records include `doctor_command` to capture the operator-invoked command.
  DSSE summaries assume operator execution and must include the same command note.

Example JSONL line:
```json
{"runId":"dr_20260112_143052_abc123","doctor_command":"stella doctor run --format json","checkId":"check.database.connectivity","severity":"pass","executedAt":"2026-01-12T14:30:52Z","how_to_fix":{"commands":[]}}
```

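
Because each evidence record is a single line, a log can be summarized with line-oriented tools. The sketch below counts records per severity; `summarize_evidence` is a hypothetical helper, and it assumes the compact `"severity":"<value>"` encoding shown in the example line.

```bash
# Sketch: count results per severity in a Doctor JSONL evidence log.
# Assumes one JSON object per line with a compact "severity":"<value>" field.
summarize_evidence() {
  log_file="$1"
  for sev in pass info warn fail; do
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    count="$(grep -c "\"severity\":\"$sev\"" "$log_file" || true)"
    printf '%s=%s\n' "$sev" "$count"
  done
}
```

For example, `summarize_evidence artifacts/doctor/doctor-run-<runId>.ndjson` would print four `severity=count` lines for that run.
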
---

## 8. Remediation Command Patterns

Remediation should favor the best operator experience: short, copy/paste friendly
commands with minimal steps and clear verification guidance.

### 8.1 Standard Output Format

Every failed check produces remediation in this structure:

```
[{SEVERITY}] {check.id}
  Diagnosis: {one-line summary}

  Evidence:
    {key}: {value}
    {key}: {value}
    ...

  Likely Causes:
    1. {most likely cause}
    2. {second most likely cause}
    ...

  Fix Steps:
    # {step number}. {description}
    {command}

    # {step number}. {description}
    {command}
    ...

  Verification:
    {command to re-run this specific check}
```

#### 8.1.1 JSON Remediation Structure

The JSON output MUST include a `how_to_fix` object for agent consumption. It can be
derived from `remediation.steps` and preserves command order.

```json
"how_to_fix": {
  "summary": "Apply baseline branch policy",
  "commands": [
    "stella orchestrator scm apply-branch-policy --preset strict"
  ]
}
```

### 8.2 Placeholder Conventions

When commands require user-specific values:

| Placeholder | Meaning | Example |
|-------------|---------|---------|
| `{HOSTNAME}` | Target hostname | `ldap.example.com` |
| `{PORT}` | Port number | `636` |
| `{USERNAME}` | Username | `admin` |
| `{PASSWORD}` | Password (never shown) | `***` |
| `{DATABASE}` | Database name | `stellaops` |
| `{SCHEMA}` | Schema name | `auth` |
| `{FILE_PATH}` | File path | `/etc/ssl/certs/ca.crt` |
| `{TOKEN}` | API token (never shown) | `***` |
| `{URL}` | Full URL | `https://api.github.com` |

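
An operator-side script can fill these placeholders before running a copied command. The following is a minimal sketch: `fill_placeholder` is a hypothetical helper (nothing in this specification defines it), written with plain parameter expansion so values containing `/` or `&` need no escaping.

```bash
# Sketch: replace every occurrence of a {NAME} placeholder in a command template.
# fill_placeholder is a hypothetical helper, not part of the stella CLI.
fill_placeholder() {
  template="$1"
  name="$2"
  value="$3"
  token="{$name}"
  result=""
  rest="$template"
  while :; do
    case "$rest" in
      *"$token"*)
        # Keep the text before the first token, then append the value.
        result="$result${rest%%"$token"*}$value"
        rest="${rest#*"$token"}"
        ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$result$rest"
}
```

For instance, `fill_placeholder 'pg_dump -h {HOSTNAME} -d {DATABASE}' HOSTNAME db.example.com` fills the host and leaves `{DATABASE}` for a second call. Values for `{PASSWORD}` and `{TOKEN}` are better supplied through the environment or a prompt than substituted into a command line that may end up in shell history or logs.
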
### 8.3 Safety Notes

Doctor fix executes only non-destructive commands. If a fix requires a change
that modifies data, Doctor must present it as manual guidance with explicit
safety notes and never execute it.

```
Manual Steps (not executed by Doctor):
  # SAFETY: This operation modifies the database. Create a backup first.

  # 1. Backup database (REQUIRED before proceeding)
  pg_dump -h {HOSTNAME} -U {USERNAME} -d {DATABASE} -F c \
    -f backup_$(date +%Y%m%d_%H%M%S).dump

  # 2. Apply the fix
  stella system migrations-run --module Authority --category release
```

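
One way to picture the eligibility rule is a default-deny filter in front of `--apply`. The sketch below is purely illustrative: `is_apply_eligible` and its allowlist are assumptions rather than Doctor's actual policy, and an explicit per-step safety flag in the report would likely be a better source of truth than pattern matching.

```bash
# Sketch: default-deny eligibility check for commands run under --apply.
# The allowlist below is an illustrative assumption, not Doctor's real policy.
is_apply_eligible() {
  # Refuse anything that chains, pipes, or substitutes commands.
  case "$1" in
    *";"*|*"&&"*|*"|"*|*'$('*|*'`'*) return 1 ;;
  esac
  # Allow only known read-only commands; everything else stays manual.
  case "$1" in
    "stella doctor "*|"stella system migrations-status"*|"stella config validate"*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

Under this sketch, `stella system migrations-run ...` is rejected and would be printed as a manual step, which matches the example above.
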
### 8.4 Multi-Platform Commands

Where applicable, provide commands for different platforms:

```
Fix Steps:
  # 1. Restart the service

  # Linux (systemd):
  sudo systemctl restart stellaops-gateway

  # Linux (Docker):
  docker restart stellaops-gateway

  # Docker Compose:
  docker compose restart gateway

  # Kubernetes:
  kubectl rollout restart deployment/stellaops-gateway -n stellaops
```

---

## 9. Doctor Check Catalog

This section documents all diagnostic checks organized by plugin/category.

### 9.1 Core Platform Plugin (`stellaops.doctor.core`)

#### check.config.required

| Property | Value |
|----------|-------|
| **CheckId** | `check.config.required` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `config`, `startup` |
| **What it verifies** | All required configuration values are present |
| **Evidence collected** | Missing keys, config sources checked, environment |
| **Failure modes** | Missing `STELLAOPS_BACKEND_URL`, missing database connection string, missing Authority URL |

**Remediation:**
```bash
# 1. Check which configuration values are missing
stella config list --show-missing

# 2. Set missing environment variables
export STELLAOPS_BACKEND_URL="https://api.stellaops.example.com"
export STELLAOPS_POSTGRES_CONNECTION="Host=localhost;Database=stellaops;Username=stella_app;Password={PASSWORD}"
export STELLAOPS_AUTHORITY_URL="https://auth.stellaops.example.com"

# 3. Or update configuration file
# Edit: /etc/stellaops/stellaops.yaml
```

**Verification:** `stella doctor --check check.config.required`

---

#### check.config.syntax

| Property | Value |
|----------|-------|
| **CheckId** | `check.config.syntax` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `config` |
| **What it verifies** | Configuration files have valid YAML/JSON syntax |
| **Evidence collected** | File path, line number, parse error message |
| **Failure modes** | Invalid YAML indentation, JSON syntax error, encoding issues |

**Remediation:**
```bash
# 1. Validate YAML syntax
yamllint /etc/stellaops/stellaops.yaml

# 2. Check for encoding issues (should be UTF-8)
file /etc/stellaops/stellaops.yaml

# 3. Fix common YAML issues
# - Use spaces, not tabs
# - Check string quoting
# - Verify indentation (2 spaces per level)
```

**Verification:** `stella doctor --check check.config.syntax`

---

#### check.config.deprecated

| Property | Value |
|----------|-------|
| **CheckId** | `check.config.deprecated` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `config` |
| **What it verifies** | No deprecated configuration keys are in use |
| **Evidence collected** | Deprecated keys found, replacement keys |
| **Failure modes** | Using old key names, removed options |

**Remediation:**
```bash
# 1. Review deprecated keys and their replacements
stella config migrate --dry-run

# 2. Update configuration file with new key names
stella config migrate --apply

# 3. Verify configuration after migration
stella config validate
```

**Verification:** `stella doctor --check check.config.deprecated`

---

#### check.runtime.dotnet

| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.dotnet` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `runtime` |
| **What it verifies** | .NET runtime version meets minimum requirements |
| **Evidence collected** | Installed version, required version, runtime path |
| **Failure modes** | Outdated .NET version, missing runtime |

**Remediation:**
```bash
# 1. Check current .NET version
dotnet --version

# 2. Install required .NET version (Ubuntu/Debian)
wget https://dot.net/v1/dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 10.0

# 3. Verify installation
dotnet --list-runtimes
```

**Verification:** `stella doctor --check check.runtime.dotnet`

---

#### check.runtime.memory

| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.memory` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `runtime`, `resources` |
| **What it verifies** | Sufficient memory available for operation |
| **Evidence collected** | Total memory, available memory, GC memory info |
| **Failure modes** | Low available memory (<1GB), high GC pressure |

**Remediation:**
```bash
# 1. Check current memory usage
free -h

# 2. Identify memory-heavy processes
ps aux --sort=-%mem | head -20

# 3. Adjust container memory limits if applicable
# Docker:
docker update --memory 4g stellaops-gateway

# Kubernetes:
kubectl patch deployment stellaops-gateway -p '{"spec":{"template":{"spec":{"containers":[{"name":"gateway","resources":{"limits":{"memory":"4Gi"}}}]}}}}'
```

**Verification:** `stella doctor --check check.runtime.memory`

---

#### check.runtime.disk.space

| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.disk.space` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `runtime`, `resources` |
| **What it verifies** | Sufficient disk space on required paths |
| **Evidence collected** | Path, total space, available space, usage percentage |
| **Failure modes** | Data directory >90% full, log directory full |

**Remediation:**
```bash
# 1. Check disk usage
df -h /var/lib/stellaops

# 2. Find large files
du -sh /var/lib/stellaops/* | sort -hr | head -20

# 3. Clean up old logs
find /var/log/stellaops -name "*.log" -mtime +30 -delete

# 4. Clean up old exports
stella export cleanup --older-than 30d
```

**Verification:** `stella doctor --check check.runtime.disk.space`

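
The usage percentage this check collects can be reproduced by hand when triaging. This is a rough sketch, not the plugin's implementation: `disk_usage_pct` and `disk_space_ok` are hypothetical helpers, and the default threshold of 90 simply mirrors the ">90% full" failure mode listed above.

```bash
# Sketch: report the usage percentage of the filesystem holding a path and
# compare it with a threshold (default 90, an assumption taken from the table).
disk_usage_pct() {
  # df -P prints POSIX-format output; column 5 of the second line is "NN%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

disk_space_ok() {
  path="$1"
  threshold="${2:-90}"
  pct="$(disk_usage_pct "$path")" || return 2
  [ -n "$pct" ] || return 2          # could not read usage
  [ "$pct" -le "$threshold" ]        # success when usage is within the limit
}
```

For example, `disk_space_ok /var/lib/stellaops || echo "data directory above 90%"` gives a quick manual equivalent of the check.
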
---

#### check.runtime.disk.permissions

| Property | Value |
|----------|-------|
| **CheckId** | `check.runtime.disk.permissions` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `runtime`, `security` |
| **What it verifies** | Write permissions on required directories |
| **Evidence collected** | Path, expected permissions, actual permissions, owner |
| **Failure modes** | Cannot write to data directory, log directory not writable |

**Remediation:**
```bash
# 1. Check current permissions
ls -la /var/lib/stellaops

# 2. Fix ownership
sudo chown -R stellaops:stellaops /var/lib/stellaops

# 3. Fix permissions
sudo chmod 755 /var/lib/stellaops
sudo chmod 755 /var/log/stellaops

# 4. Verify write access (create and remove the test file as the service user)
sudo -u stellaops sh -c 'touch /var/lib/stellaops/.write-test && rm /var/lib/stellaops/.write-test'
```

**Verification:** `stella doctor --check check.runtime.disk.permissions`

---

#### check.time.sync

| Property | Value |
|----------|-------|
| **CheckId** | `check.time.sync` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Warn |
| **Tags** | `quick`, `runtime` |
| **What it verifies** | System clock is synchronized (NTP) |
| **Evidence collected** | NTP status, clock offset, sync source |
| **Failure modes** | Clock drift >5s, NTP not running, no sync source |

**Remediation:**
```bash
# 1. Check NTP status
timedatectl status

# 2. Enable NTP synchronization
sudo timedatectl set-ntp true

# 3. Force immediate sync
sudo systemctl restart systemd-timesyncd

# 4. Verify sync status
timedatectl timesync-status
```

**Verification:** `stella doctor --check check.time.sync`

---

#### check.crypto.profiles

| Property | Value |
|----------|-------|
| **CheckId** | `check.crypto.profiles` |
| **Plugin** | `stellaops.doctor.core` |
| **Category** | Core |
| **Severity** | Fail |
| **Tags** | `quick`, `security`, `crypto` |
| **What it verifies** | Crypto profile is valid and providers are available |
| **Evidence collected** | Active profile, available providers, missing providers |
| **Failure modes** | Invalid profile, required provider not available |

**Remediation:**
```bash
# 1. List available crypto profiles
stella crypto profiles list

# 2. Validate current profile
stella crypto profiles validate

# 3. Switch to a different profile if needed
stella crypto profiles set --profile default

# 4. Install missing providers (if GOST required)
# See docs/crypto/gost-setup.md
```

**Verification:** `stella doctor --check check.crypto.profiles`

---

### 9.2 Database Plugin (`stellaops.doctor.database`)

#### check.database.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.connectivity` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `quick`, `database` |
| **What it verifies** | PostgreSQL connection is successful |
| **Evidence collected** | Connection string (redacted), latency, server version |
| **Failure modes** | Connection refused, authentication failed, timeout |

**Remediation:**
```bash
# 1. Test connection manually
psql "host=localhost dbname=stellaops user=stella_app" -c "SELECT 1"

# 2. Check PostgreSQL is running
sudo systemctl status postgresql

# 3. Check connection settings
# Verify pg_hba.conf allows connections
sudo grep stellaops /etc/postgresql/16/main/pg_hba.conf

# 4. Check firewall
sudo ufw status | grep 5432
```

**Verification:** `stella doctor --check check.database.connectivity`

---

#### check.database.version

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.version` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database` |
| **What it verifies** | PostgreSQL version meets minimum requirements (>=16) |
| **Evidence collected** | Current version, required version |
| **Failure modes** | PostgreSQL <16, unsupported version |

**Remediation:**
```bash
# 1. Check current version
psql -c "SELECT version();"

# 2. Upgrade PostgreSQL (Ubuntu)
sudo apt install postgresql-16

# 3. Migrate data to new version (example assumes the old cluster is version 14)
sudo pg_upgradecluster 14 main

# 4. Remove old version (only after verifying the upgraded cluster)
sudo apt remove postgresql-14
```

**Verification:** `stella doctor --check check.database.version`

---

#### check.database.migrations.pending

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.pending` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database`, `migrations` |
| **What it verifies** | No pending release migrations exist |
| **Evidence collected** | Schema, current version, pending migrations list |
| **Failure modes** | Release migrations not applied before deployment |

**Remediation:**
```bash
# 1. Backup database first (RECOMMENDED)
pg_dump -h localhost -U stella_admin -d stellaops -F c \
  -f stellaops_backup_$(date +%Y%m%d_%H%M%S).dump

# 2. Check migration status for all modules
stella system migrations-status

# 3. Apply pending release migrations
stella system migrations-run --category release

# 4. Verify all migrations applied
stella system migrations-status --verify
```

**Verification:** `stella doctor --check check.database.migrations.pending`

---

#### check.database.migrations.checksum

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.checksum` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database`, `migrations`, `security` |
| **What it verifies** | Applied migration checksums match source files |
| **Evidence collected** | Mismatched migrations, expected vs actual checksum |
| **Failure modes** | Migration file modified after application, corruption |

**Remediation:**
```bash
# CRITICAL: Checksum mismatch indicates potential data integrity issue

# 1. Identify mismatched migrations
stella system migrations-verify --detailed

# 2. If migrations were legitimately modified (rare):
# WARNING: Only proceed if you understand the implications
stella system migrations-repair --migration {MIGRATION_NAME} --force

# 3. If data corruption suspected:
# Restore from backup and reapply migrations
pg_restore -h localhost -U stella_admin -d stellaops stellaops_backup.dump
stella system migrations-run --all
```

**Verification:** `stella doctor --check check.database.migrations.checksum`

---

#### check.database.migrations.lock

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.migrations.lock` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database`, `migrations` |
| **What it verifies** | No stale migration locks exist |
| **Evidence collected** | Lock holder, lock duration, schema |
| **Failure modes** | Abandoned lock from crashed process |

**Remediation:**
```bash
# 1. Check for active locks
psql -d stellaops -c "SELECT * FROM pg_locks WHERE locktype = 'advisory';"

# 2. Identify lock holder process
psql -d stellaops -c "SELECT pid, query, state FROM pg_stat_activity WHERE pid IN (SELECT pid FROM pg_locks WHERE locktype = 'advisory');"

# 3. If the holder is stale, terminate its backend session to release the lock
# WARNING: Only if you are certain no migration is running
# (pg_advisory_unlock_all() releases only locks held by the calling session,
#  so it cannot clear a lock held by another backend)
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"

# 4. Retry migration
stella system migrations-run --category release
```

**Verification:** `stella doctor --check check.database.migrations.lock`

---

#### check.database.schema.{schema}

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.schema.{schema}` (e.g., `check.database.schema.auth`) |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Fail |
| **Tags** | `database` |
| **What it verifies** | Schema exists and has expected tables |
| **Evidence collected** | Schema name, expected tables, missing tables |
| **Failure modes** | Schema not created, tables dropped |

**Remediation:**
```bash
# 1. Check if schema exists
psql -d stellaops -c "SELECT schema_name FROM information_schema.schemata WHERE schema_name = '{SCHEMA}';"

# 2. If schema missing, run startup migrations
stella system migrations-run --module {MODULE} --category startup

# 3. Verify schema tables
psql -d stellaops -c "SELECT table_name FROM information_schema.tables WHERE table_schema = '{SCHEMA}';"
```
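The "missing tables" evidence is a set difference between the expected and the actual table names. A minimal sketch of that comparison follows; `missing_tables` is a hypothetical helper, and the expected list would have to come from the module's own definition, which this section does not specify.

```bash
# Sketch: list expected tables that are absent from a schema.
# Both arguments are newline-separated table names.
missing_tables() {
  local expected="$1" actual="$2" table
  printf '%s\n' "$expected" | while IFS= read -r table; do
    [ -n "$table" ] || continue
    # -F fixed string, -x whole-line match, -q quiet
    printf '%s\n' "$actual" | grep -Fxq -- "$table" || printf '%s\n' "$table"
  done
}

# Example (hypothetical table names; -A and -t make psql print bare values):
#   actual="$(psql -At -d stellaops -c "SELECT table_name FROM information_schema.tables WHERE table_schema = '{SCHEMA}';")"
#   missing_tables "$(printf 'users\nroles\ntokens')" "$actual"
```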
**Verification:** `stella doctor --check check.database.schema.{schema}`

---
#### check.database.connections.pool

| Property | Value |
|----------|-------|
| **CheckId** | `check.database.connections.pool` |
| **Plugin** | `stellaops.doctor.database` |
| **Category** | Database |
| **Severity** | Warn |
| **Tags** | `database`, `performance` |
| **What it verifies** | Connection pool is healthy, not exhausted |
| **Evidence collected** | Active connections, idle connections, max connections |
| **Failure modes** | Pool exhausted, connection leak |

**Remediation:**
```bash
# 1. Check current connections
psql -d stellaops -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'stellaops';"

# 2. Check max connections
psql -d stellaops -c "SHOW max_connections;"

# 3. Identify long-running queries
psql -d stellaops -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 10;"

# 4. Increase max connections if needed
# Edit postgresql.conf: max_connections = 200
# max_connections only takes effect after a full restart; a reload is not enough
sudo systemctl restart postgresql
```
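The counts from steps 1 and 2 can be turned into a usage figure to judge how close the pool is to exhaustion. This is a sketch only: the 80% warning threshold is an assumption (the section does not define one), and `pool_usage_percent` and `pool_usage_status` are hypothetical helper names.

```bash
# Sketch: classify connection usage against max_connections.
# ASSUMPTION: warn at 80% usage or more; treat 100% or more as exhausted.

# Print usage as an integer percentage of max_connections.
pool_usage_percent() {
  local active="$1" max="$2"
  if [ "$max" -le 0 ]; then
    echo "max_connections must be positive" >&2
    return 1
  fi
  echo $(( active * 100 / max ))
}

# Print "ok", "warn", or "exhausted" for the given counts.
pool_usage_status() {
  local pct
  pct="$(pool_usage_percent "$1" "$2")" || return 1
  if [ "$pct" -ge 100 ]; then
    echo "exhausted"
  elif [ "$pct" -ge 80 ]; then
    echo "warn"
  else
    echo "ok"
  fi
}

# Example: pool_usage_status 160 200   # prints "warn"
```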
**Verification:** `stella doctor --check check.database.connections.pool`

---
### 9.3 Service Graph Plugin (`stellaops.doctor.servicegraph`)

#### check.services.gateway.running

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.gateway.running` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `quick`, `services` |
| **What it verifies** | Gateway service is running and accepting connections |
| **Evidence collected** | Service status, PID, uptime, port binding |
| **Failure modes** | Service not running, port already in use |

**Remediation:**
```bash
# 1. Check service status
sudo systemctl status stellaops-gateway

# 2. Check logs for errors
sudo journalctl -u stellaops-gateway -n 50

# 3. Check port binding
sudo ss -tlnp | grep 443

# 4. Start/restart service
sudo systemctl restart stellaops-gateway
```

**Verification:** `stella doctor --check check.services.gateway.running`

---
#### check.services.gateway.routing

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.gateway.routing` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services`, `routing` |
| **What it verifies** | Gateway can route requests to backend services |
| **Evidence collected** | Registered services, routing table, disconnected services |
| **Failure modes** | No services registered, all services disconnected |

**Remediation:**
```bash
# 1. Check registered services
curl -s http://localhost:8080/health/routing | jq

# 2. Verify backend services are running
stella services status

# 3. Check Router transport connectivity
stella services connectivity-test

# 4. Restart disconnected services
sudo systemctl restart stellaops-concelier
sudo systemctl restart stellaops-scanner
```

**Verification:** `stella doctor --check check.services.gateway.routing`

---
#### check.services.{service}.health

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.{service}.health` (e.g., `check.services.concelier.health`) |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services` |
| **What it verifies** | Service health endpoint returns healthy |
| **Evidence collected** | Health status, dependencies, latency |
| **Failure modes** | Service unhealthy, degraded dependencies |

**Remediation:**
```bash
# 1. Check service health directly
curl -s http://localhost:{PORT}/healthz | jq

# 2. Check detailed health
curl -s http://localhost:{PORT}/health/details | jq

# 3. Check service logs
sudo journalctl -u stellaops-{SERVICE} -n 100

# 4. Restart service if needed
sudo systemctl restart stellaops-{SERVICE}
```

**Verification:** `stella doctor --check check.services.{service}.health`

---
#### check.services.{service}.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.{service}.connectivity` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `services`, `routing` |
| **What it verifies** | Service is reachable from Gateway via Router |
| **Evidence collected** | Transport type, connection state, last heartbeat |
| **Failure modes** | Connection refused, heartbeat timeout |

**Remediation:**
```bash
# 1. Check Router connection status
stella services connection-status --service {SERVICE}

# 2. Test network connectivity
nc -zv {SERVICE_HOST} {SERVICE_PORT}

# 3. Check firewall rules
sudo ufw status | grep {SERVICE_PORT}

# 4. Verify Router configuration in service
# Check stellaops.yaml for correct Router endpoints
```

**Verification:** `stella doctor --check check.services.{service}.connectivity`

---
#### check.services.authority.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.services.authority.connectivity` |
| **Plugin** | `stellaops.doctor.servicegraph` |
| **Category** | ServiceGraph |
| **Severity** | Fail |
| **Tags** | `quick`, `services`, `auth` |
| **What it verifies** | Authority service is reachable |
| **Evidence collected** | Authority URL, response status, latency |
| **Failure modes** | Authority unreachable, OIDC discovery failed |

**Remediation:**
```bash
# 1. Check Authority URL configuration
echo $STELLAOPS_AUTHORITY_URL

# 2. Test OIDC discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq

# 3. Check Authority service status
sudo systemctl status stellaops-authority

# 4. Verify network connectivity
curl -v ${STELLAOPS_AUTHORITY_URL}/healthz
```

**Verification:** `stella doctor --check check.services.authority.connectivity`

---
### 9.4 Security Plugin (`stellaops.doctor.security`)

#### check.auth.oidc.discovery

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.oidc.discovery` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `quick`, `auth`, `security` |
| **What it verifies** | OIDC well-known endpoint is accessible |
| **Evidence collected** | Discovery URL, issuer, supported flows |
| **Failure modes** | Discovery endpoint unavailable, invalid response |

**Remediation:**
```bash
# 1. Test discovery endpoint
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration | jq

# 2. Verify issuer matches configuration
# The issuer in the response should match STELLAOPS_AUTHORITY_URL

# 3. Check Authority service logs
sudo journalctl -u stellaops-authority -n 50

# 4. Verify TLS certificate
openssl s_client -connect auth.stellaops.example.com:443 -servername auth.stellaops.example.com
```

**Verification:** `stella doctor --check check.auth.oidc.discovery`

---
#### check.auth.oidc.jwks

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.oidc.jwks` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `security` |
| **What it verifies** | JWKS endpoint returns valid signing keys |
| **Evidence collected** | JWKS URL, key count, key algorithms |
| **Failure modes** | JWKS unavailable, no keys, unsupported algorithms |

**Remediation:**
```bash
# 1. Fetch JWKS directly
curl -s ${STELLAOPS_AUTHORITY_URL}/.well-known/jwks.json | jq

# 2. Verify keys are present
# Response should contain at least one key in "keys" array

# 3. If JWKS is empty, regenerate signing keys
stella authority keys rotate

# 4. Restart Authority service
sudo systemctl restart stellaops-authority
```

**Verification:** `stella doctor --check check.auth.oidc.jwks`

---
#### check.auth.ldap.bind

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.bind` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `security`, `ldap` |
| **What it verifies** | LDAP bind credentials are valid |
| **Evidence collected** | LDAP host, bind DN (redacted), TLS status |
| **Failure modes** | Invalid credentials, connection refused, TLS failure |

**Remediation:**
```bash
# 1. Test LDAP connection with ldapsearch
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "cn=bind-user,ou=service,dc=example,dc=internal" \
  -w "{PASSWORD}" \
  -b "ou=people,dc=example,dc=internal" "(uid=*)" dn | head -10

# 2. Check TLS certificate
openssl s_client -connect {LDAP_HOST}:636 -showcerts

# 3. Verify bind DN and password in configuration
# Check etc/authority.plugins/ldap.yaml

# 4. Test with Authority's ldap-test command
stella authority ldap-test --bind-only
```

**Verification:** `stella doctor --check check.auth.ldap.bind`

---
#### check.auth.ldap.search

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.search` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `auth`, `ldap` |
| **What it verifies** | LDAP search base is accessible and returns users |
| **Evidence collected** | Search base, user count, search time |
| **Failure modes** | Search base not found, no users returned, timeout |

**Remediation:**
```bash
# 1. Test LDAP search
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "{SEARCH_BASE}" "(objectClass=person)" dn | wc -l

# 2. Verify search base in configuration
# Check etc/authority.plugins/ldap.yaml: connection.searchBase

# 3. Check if search base exists (list the server's naming contexts via the root DSE)
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "" -s base "(objectClass=*)" namingContexts

# 4. Verify bind user has read permissions
# Check LDAP ACLs
```

**Verification:** `stella doctor --check check.auth.ldap.search`

---
#### check.auth.ldap.groups

| Property | Value |
|----------|-------|
| **CheckId** | `check.auth.ldap.groups` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Warn |
| **Tags** | `auth`, `ldap` |
| **What it verifies** | LDAP group mapping is configured and working |
| **Evidence collected** | Group attribute, mapped groups, sample user groups |
| **Failure modes** | Group attribute not found, no groups mapped |

**Remediation:**
```bash
# 1. Check group attribute configuration
# etc/authority.plugins/ldap.yaml: claims.groupAttribute

# 2. Test group lookup for a sample user
ldapsearch -x -H ldaps://{LDAP_HOST}:636 \
  -D "{BIND_DN}" -w "{PASSWORD}" \
  -b "{SEARCH_BASE}" "(uid={TEST_USER})" memberOf

# 3. Verify group mapping in Authority
stella authority ldap-test --user {TEST_USER} --show-groups

# 4. Update group attribute if needed
# Common attributes: memberOf, member, groupMembership
```

**Verification:** `stella doctor --check check.auth.ldap.groups`

---
#### check.tls.certificates.expiry

| Property | Value |
|----------|-------|
| **CheckId** | `check.tls.certificates.expiry` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Warn (expires within 30 days), Fail (expires within 7 days) |
| **Tags** | `quick`, `security`, `tls` |
| **What it verifies** | TLS certificates are not expiring soon |
| **Evidence collected** | Certificate path, subject, expiry date, days remaining |
| **Failure modes** | Certificate expired, expiring within threshold |

**Remediation:**
```bash
# 1. Check certificate expiry
openssl x509 -in /etc/ssl/certs/stellaops.crt -noout -enddate

# 2. Renew with certbot (if using Let's Encrypt)
sudo certbot renew --cert-name stellaops.example.com

# 3. Renew manually (if self-signed or enterprise CA)
# Generate new CSR
openssl req -new -key /etc/ssl/private/stellaops.key \
  -out /tmp/stellaops.csr -subj "/CN=stellaops.example.com"

# Submit CSR to CA and install new certificate

# 4. Restart services to pick up new certificate
sudo systemctl restart stellaops-gateway
```
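The 30-day and 7-day severity thresholds can be reproduced locally from the `-enddate` output in step 1. The sketch below is illustrative: `classify_cert_expiry` and `cert_days_remaining` are hypothetical helper names, treating the boundary days themselves (exactly 30 or 7 days left) as already inside the threshold is an assumption, and `date -d` requires GNU date.

```bash
# Sketch: map days until expiry to pass/warn/fail.
# ASSUMPTION: thresholds are inclusive (7 days or fewer fails, 30 or fewer warns).
classify_cert_expiry() {
  local days="$1"
  if [ "$days" -le 7 ]; then
    echo "fail"
  elif [ "$days" -le 30 ]; then
    echo "warn"
  else
    echo "pass"
  fi
}

# Print whole days until a PEM certificate expires (zero or negative once expired).
cert_days_remaining() {
  local end_date end_epoch now_epoch
  # openssl prints "notAfter=<date>"; keep only the date part
  end_date="$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)"
  end_epoch="$(date -d "$end_date" +%s)"
  now_epoch="$(date +%s)"
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Example:
#   classify_cert_expiry "$(cert_days_remaining /etc/ssl/certs/stellaops.crt)"
```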
**Verification:** `stella doctor --check check.tls.certificates.expiry`

---
#### check.tls.certificates.chain

| Property | Value |
|----------|-------|
| **CheckId** | `check.tls.certificates.chain` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `tls` |
| **What it verifies** | TLS certificate chain is complete and valid |
| **Evidence collected** | Certificate chain, validation errors |
| **Failure modes** | Missing intermediate, self-signed not trusted, chain broken |

**Remediation:**
```bash
# 1. Verify certificate chain
openssl verify -CAfile /etc/ssl/certs/ca-certificates.crt \
  /etc/ssl/certs/stellaops.crt

# 2. Check chain with openssl
openssl s_client -connect stellaops.example.com:443 \
  -servername stellaops.example.com -showcerts

# 3. Download missing intermediate certificates
# From your CA's website

# 4. Concatenate certificates in correct order
cat stellaops.crt intermediate.crt > stellaops-fullchain.crt
```

**Verification:** `stella doctor --check check.tls.certificates.chain`

---
#### check.secrets.vault.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.secrets.vault.connectivity` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `vault` |
| **What it verifies** | Vault service is reachable |
| **Evidence collected** | Vault address, seal status, version |
| **Failure modes** | Vault unreachable, sealed, version mismatch |

**Remediation:**
```bash
# 1. Check Vault status
vault status

# 2. If sealed, unseal Vault
vault operator unseal {UNSEAL_KEY_1}
vault operator unseal {UNSEAL_KEY_2}
vault operator unseal {UNSEAL_KEY_3}

# 3. Check network connectivity
curl -s ${VAULT_ADDR}/v1/sys/health | jq

# 4. Verify VAULT_ADDR environment variable
echo $VAULT_ADDR
```
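The `/v1/sys/health` endpoint used in step 3 also encodes Vault's state in its HTTP status code, which helps when the JSON body is empty or `jq` is unavailable. The mapping below follows Vault's documented default status codes; the helper names `vault_health_meaning` and `vault_health_code` are hypothetical and not part of the Vault or `stella` CLIs.

```bash
# Sketch: interpret the default HTTP status codes of Vault's sys/health endpoint.
vault_health_meaning() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    472) echo "disaster recovery secondary" ;;
    473) echo "performance standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Print only the HTTP status code returned by the health endpoint.
vault_health_code() {
  curl -s -o /dev/null -w '%{http_code}' "${VAULT_ADDR}/v1/sys/health"
}

# Example:
#   vault_health_meaning "$(vault_health_code)"
```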
**Verification:** `stella doctor --check check.secrets.vault.connectivity`

---
#### check.secrets.vault.auth

| Property | Value |
|----------|-------|
| **CheckId** | `check.secrets.vault.auth` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `vault` |
| **What it verifies** | Vault authentication is successful |
| **Evidence collected** | Auth method, token TTL, policies |
| **Failure modes** | Invalid token, expired token, wrong auth method |

**Remediation:**
```bash
# 1. Check current token
vault token lookup

# 2. If token expired, authenticate again
# Token auth:
vault login {TOKEN}

# AppRole auth:
vault write auth/approle/login role_id={ROLE_ID} secret_id={SECRET_ID}

# Kubernetes auth:
vault write auth/kubernetes/login role=stellaops jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token

# 3. Verify authentication worked
vault token lookup
```

**Verification:** `stella doctor --check check.secrets.vault.auth`

---
#### check.secrets.vault.paths

| Property | Value |
|----------|-------|
| **CheckId** | `check.secrets.vault.paths` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `vault` |
| **What it verifies** | Required secret paths are accessible |
| **Evidence collected** | Checked paths, accessible paths, denied paths |
| **Failure modes** | Permission denied, path not found |

**Remediation:**
```bash
# 1. Test reading required secrets
# (KV v2: `vault kv get` inserts the data/ segment itself, so omit it here)
vault kv get secret/stellaops/api-keys

# 2. Check policy permissions
vault token lookup -format=json | jq '.data.policies'

# 3. Review policy rules
vault policy read stellaops

# 4. Update policy if needed
vault policy write stellaops - <<EOF
path "secret/data/stellaops/*" {
  capabilities = ["read", "list"]
}
EOF
```

**Verification:** `stella doctor --check check.secrets.vault.paths`

---
#### check.security.evidence.integrity

| Property | Value |
|----------|-------|
| **CheckId** | `check.security.evidence.integrity` |
| **Plugin** | `stellaops.doctor.security` |
| **Category** | Security |
| **Severity** | Fail |
| **Tags** | `security`, `evidence`, `integrity`, `dsse`, `rekor`, `offline` |
| **What it verifies** | Evidence files have valid DSSE signatures, Rekor inclusion proofs, and consistent hashes |
| **Evidence collected** | Evidence locker path, total files, valid/invalid/skipped counts, specific issues |
| **Failure modes** | Empty DSSE payload, missing signatures, invalid base64, missing Rekor UUID, missing inclusion proof hashes, digest mismatch |

**What it checks:**
1. **DSSE Envelope Structure**: Validates `payloadType`, `payload` (base64), and `signatures` array
2. **Signature Completeness**: Each signature has `keyid` and valid base64 `sig`
3. **Payload Digest Consistency**: If the `payloadDigest` field is present, recomputes the SHA-256 and compares it
4. **Evidence Bundle Structure**: Validates `bundleId`, `manifest.version`, and optional `contentDigest`
5. **Rekor Receipt Validity**: If present, validates `uuid`, `logIndex`, and `inclusionProof.hashes`

**Remediation:**
```bash
# 1. List evidence files with issues
stella doctor --check check.security.evidence.integrity --output json \
  | jq '.evidence.issues[]'

# 2. Re-sign affected evidence bundles
stella evidence resign --bundle-id {BUNDLE_ID}

# 3. Verify Rekor inclusion manually (if online)
rekor-cli get --uuid {REKOR_UUID} --format json | jq

# 4. For offline environments, verify against local ledger
stella evidence verify --offline --bundle-id {BUNDLE_ID}

# 5. Re-generate evidence pack from source
stella export evidence-pack --artifact {ARTIFACT_DIGEST} --force
```

**Configuration:**
```yaml
# etc/appsettings.yaml
EvidenceLocker:
  LocalPath: /var/lib/stellaops/evidence
  # Alternatively, set the Evidence:BasePath key instead
```
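The payload digest consistency step (item 3 under "What it checks") can be reproduced by hand for a single envelope. This sketch assumes that `payloadDigest` covers the base64-decoded payload bytes and may carry a `sha256:` prefix; neither detail is stated in this section, and both helper names are hypothetical.

```bash
# Sketch: recompute the SHA-256 of a DSSE payload and compare it to a recorded digest.
# ASSUMPTION: the digest is taken over the base64-decoded payload bytes.

# Print the lowercase hex SHA-256 of a base64-encoded payload.
dsse_payload_sha256() {
  printf '%s' "$1" | base64 -d | sha256sum | cut -d' ' -f1
}

# Return 0 when the recomputed digest equals the recorded one.
# ASSUMPTION: an optional "sha256:" prefix on the recorded value is ignored.
dsse_payload_digest_matches() {
  local payload_b64="$1" recorded="${2#sha256:}"
  [ "$(dsse_payload_sha256 "$payload_b64")" = "$recorded" ]
}

# Example (hypothetical file name; field names as listed above):
#   dsse_payload_digest_matches "$(jq -r '.payload' envelope.json)" \
#     "$(jq -r '.payloadDigest' envelope.json)" || echo "digest mismatch"
```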
**Verification:** `stella doctor --check check.security.evidence.integrity`

---
### 9.5 Integration Plugins - SCM (`stellaops.doctor.integration.scm.*`)

#### check.integration.scm.github.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.github.connectivity` |
| **Plugin** | `stellaops.doctor.integration.scm.github` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `github` |
| **What it verifies** | GitHub API is reachable |
| **Evidence collected** | API endpoint, response status, latency |
| **Failure modes** | API unreachable, DNS resolution failed, TLS error |

**Remediation:**
```bash
# 1. Test GitHub API connectivity
curl -s https://api.github.com/zen

# 2. Check DNS resolution
nslookup api.github.com

# 3. Test with authentication
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/user

# 4. Check proxy settings if behind firewall
echo $HTTPS_PROXY
```

**Verification:** `stella doctor --check check.integration.scm.github.connectivity`

---
#### check.integration.scm.github.auth

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.github.auth` |
| **Plugin** | `stellaops.doctor.integration.scm.github` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `github`, `auth` |
| **What it verifies** | GitHub authentication is successful |
| **Evidence collected** | Auth type (PAT/App/OAuth), user/app info |
| **Failure modes** | Invalid token, expired token, wrong app credentials |

**Remediation:**
```bash
# For Personal Access Token:
# 1. Verify token is valid
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/user | jq '.login'

# 2. Generate new token if expired
# Visit: https://github.com/settings/tokens

# For GitHub App:
# 1. Check app installation
curl -s -H "Authorization: Bearer {JWT}" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/app

# 2. Verify app is installed on repository
curl -s -H "Authorization: Bearer {INSTALLATION_TOKEN}" \
  https://api.github.com/installation/repositories
```

**Verification:** `stella doctor --check check.integration.scm.github.auth`

---
#### check.integration.scm.github.permissions

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.github.permissions` |
| **Plugin** | `stellaops.doctor.integration.scm.github` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `github` |
| **What it verifies** | Token/App has required scopes/permissions |
| **Evidence collected** | Current scopes, required scopes, missing scopes |
| **Failure modes** | Missing `repo` scope, missing `write:packages` |

**Remediation:**
```bash
# 1. Check current token scopes
# (match case-insensitively: header names are lowercase over HTTP/2)
curl -sI -H "Authorization: Bearer {TOKEN}" https://api.github.com/user | grep -i x-oauth-scopes

# Required scopes for Stella Ops:
# - repo (full repository access)
# - read:org (organization membership)
# - write:packages (container registry)

# 2. Generate new token with correct scopes
# Visit: https://github.com/settings/tokens/new
# Select: repo, read:org, write:packages

# 3. Update token in Stella Ops
stella integrations update --id {INTEGRATION_ID} --secret {NEW_TOKEN}
```
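The "missing scopes" evidence can be derived from the `x-oauth-scopes` header value returned in step 1. The sketch below uses the three required scopes listed above; `missing_github_scopes` is a hypothetical helper, and it compares names literally, so a broader parent scope that implies a required one (for example `admin:org` for `read:org`) is still reported as missing.

```bash
# Sketch: report which required scopes are absent from an x-oauth-scopes value.
REQUIRED_SCOPES="repo read:org write:packages"

# Print the missing scopes, space-separated, for a comma-separated header value.
missing_github_scopes() {
  local granted missing="" scope
  # Normalise "repo, read:org\r" into " repo read:org " for whole-word matching
  granted=" $(printf '%s' "$1" | tr -d '\r' | tr ',' ' ' | tr -s ' ') "
  for scope in $REQUIRED_SCOPES; do
    case "$granted" in
      *" $scope "*) ;;                               # scope is granted
      *) missing="${missing:+$missing }$scope" ;;    # scope is missing
    esac
  done
  printf '%s\n' "$missing"
}

# Example: missing_github_scopes "repo, read:org"   # prints "write:packages"
```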
**Verification:** `stella doctor --check check.integration.scm.github.permissions`

---
#### check.integration.scm.github.ratelimit

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.github.ratelimit` |
| **Plugin** | `stellaops.doctor.integration.scm.github` |
| **Category** | Integration |
| **Severity** | Warn |
| **Tags** | `integration`, `scm`, `github` |
| **What it verifies** | GitHub API rate limit is not exhausted |
| **Evidence collected** | Limit, remaining, reset time |
| **Failure modes** | Rate limit exhausted, near threshold |

**Remediation:**
```bash
# 1. Check current rate limit status
curl -s -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit | jq

# 2. If exhausted, wait for reset
# The "reset" field shows Unix timestamp when limit resets

# 3. Consider using GitHub App instead of PAT for higher limits
# PAT: 5000 requests/hour
# GitHub App installation: at least 5000 requests/hour, scaling with repositories and users
# (15000 requests/hour for installations on GitHub Enterprise Cloud organizations)

# 4. Implement request caching in your application
```
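The `limit`, `remaining`, and `reset` values from step 1 are enough to decide whether to wait or to proceed. In this sketch the 10% "near threshold" cut-off is an assumption (the section does not define it), and `rate_limit_status` and `seconds_until_reset` are hypothetical helper names.

```bash
# Sketch: classify GitHub rate limit headroom and compute the wait until reset.
# ASSUMPTION: "near threshold" means 10% or less of the limit remains.
rate_limit_status() {
  local limit="$1" remaining="$2"
  if [ "$remaining" -le 0 ]; then
    echo "exhausted"
  elif [ $(( remaining * 100 / limit )) -le 10 ]; then
    echo "near-threshold"
  else
    echo "ok"
  fi
}

# Print seconds until the "reset" Unix timestamp; never negative.
# The second argument overrides the current time (defaults to now).
seconds_until_reset() {
  local reset="$1" now="${2:-$(date +%s)}" delta
  delta=$(( reset - now ))
  if [ "$delta" -lt 0 ]; then
    delta=0
  fi
  echo "$delta"
}

# Example: rate_limit_status 5000 120   # prints "near-threshold"
```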
**Verification:** `stella doctor --check check.integration.scm.github.ratelimit`

---
#### check.integration.scm.gitlab.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.gitlab.connectivity` |
| **Plugin** | `stellaops.doctor.integration.scm.gitlab` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `gitlab` |
| **What it verifies** | GitLab API is reachable |
| **Evidence collected** | API endpoint, response status, version |
| **Failure modes** | API unreachable, self-hosted instance down |

**Remediation:**
```bash
# 1. Test GitLab API connectivity
curl -s https://{GITLAB_HOST}/api/v4/version

# 2. For self-hosted GitLab, check service status
sudo gitlab-ctl status

# 3. Check firewall/proxy
curl -v https://{GITLAB_HOST}/api/v4/version

# 4. Verify URL configuration
stella integrations show --id {INTEGRATION_ID}
```

**Verification:** `stella doctor --check check.integration.scm.gitlab.connectivity`

---
#### check.integration.scm.gitlab.auth

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.scm.gitlab.auth` |
| **Plugin** | `stellaops.doctor.integration.scm.gitlab` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `scm`, `gitlab`, `auth` |
| **What it verifies** | GitLab authentication is successful |
| **Evidence collected** | Auth type, user info, token expiry |
| **Failure modes** | Invalid token, expired token, revoked access |

**Remediation:**
```bash
# 1. Test token authentication
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/user | jq '.username'

# 2. Check token expiry
curl -s -H "PRIVATE-TOKEN: {TOKEN}" https://{GITLAB_HOST}/api/v4/personal_access_tokens/self | jq '.expires_at'

# 3. Generate new token if expired
# Visit: https://{GITLAB_HOST}/-/profile/personal_access_tokens

# 4. Update token in Stella Ops
stella integrations update --id {INTEGRATION_ID} --secret {NEW_TOKEN}
```

**Verification:** `stella doctor --check check.integration.scm.gitlab.auth`

---
### 9.6 Integration Plugins - Registry (`stellaops.doctor.integration.registry.*`)

#### check.integration.registry.harbor.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.harbor.connectivity` |
| **Plugin** | `stellaops.doctor.integration.registry.harbor` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `harbor` |
| **What it verifies** | Harbor registry is reachable |
| **Evidence collected** | Registry URL, health status, version |
| **Failure modes** | Registry unreachable, components unhealthy |

**Remediation:**
```bash
# 1. Check Harbor health endpoint
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq

# 2. Check individual components
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq '.components'

# 3. For self-hosted Harbor, check services
docker compose -f /opt/harbor/docker-compose.yml ps

# 4. Check Harbor logs
docker compose -f /opt/harbor/docker-compose.yml logs --tail=50 core
```

**Verification:** `stella doctor --check check.integration.registry.harbor.connectivity`

---
#### check.integration.registry.harbor.auth

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.harbor.auth` |
| **Plugin** | `stellaops.doctor.integration.registry.harbor` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `harbor`, `auth` |
| **What it verifies** | Harbor authentication is successful |
| **Evidence collected** | Auth type, user info, project access |
| **Failure modes** | Invalid credentials, LDAP sync issue |

**Remediation:**
```bash
# 1. Test Docker login (pass the password on stdin rather than as an argument)
echo "{PASSWORD}" | docker login {HARBOR_HOST} -u {USERNAME} --password-stdin

# 2. Test API authentication
curl -s -u {USERNAME}:{PASSWORD} https://{HARBOR_HOST}/api/v2.0/users/current | jq

# 3. Check if user exists (quote the URL so the shell does not expand "?")
curl -s -u admin:{ADMIN_PASSWORD} "https://{HARBOR_HOST}/api/v2.0/users?username={USERNAME}" | jq

# 4. Reset password if needed
# Via Harbor UI: https://{HARBOR_HOST}/harbor/users
```

**Verification:** `stella doctor --check check.integration.registry.harbor.auth`

---
#### check.integration.registry.harbor.pull
|
|
|
|
| Property | Value |
|
|
|----------|-------|
|
|
| **CheckId** | `check.integration.registry.harbor.pull` |
|
|
| **Plugin** | `stellaops.doctor.integration.registry.harbor` |
|
|
| **Category** | Integration |
|
|
| **Severity** | Fail |
|
|
| **Tags** | `integration`, `registry`, `harbor` |
|
|
| **What it verifies** | Can pull images from configured repositories |
|
|
| **Evidence collected** | Test image, pull result, error message |
|
|
| **Failure modes** | Permission denied, repository not found |
|
|
|
|
**Remediation:**
|
|
```bash
|
|
# 1. Test image pull
|
|
docker pull {HARBOR_HOST}/{PROJECT}/{IMAGE}:{TAG}
|
|
|
|
# 2. Check project membership
|
|
curl -s -u {USERNAME}:{PASSWORD} \
|
|
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/members | jq
|
|
|
|
# 3. Add user to project if needed
|
|
curl -X POST -u admin:{ADMIN_PASSWORD} \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"role_id": 2, "member_user": {"username": "{USERNAME}"}}' \
|
|
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/members
|
|
|
|
# 4. Verify repository exists
|
|
curl -s -u {USERNAME}:{PASSWORD} \
|
|
https://{HARBOR_HOST}/api/v2.0/projects/{PROJECT}/repositories | jq
|
|
```
|
|
|
|
**Verification:** `stella doctor --check check.integration.registry.harbor.pull`
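
Step 3 of the remediation hard-codes `role_id` 2. As a minimal sketch, assuming Harbor's standard project role IDs (1 project admin, 2 developer, 3 guest, 4 maintainer; worth confirming against your Harbor version), a small helper could build the request body from a role name. The function name `harbor_member_payload` and the user `ci-bot` are hypothetical and not part of the `stella` CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for POST /projects/{PROJECT}/members.
# Role IDs are assumed from Harbor's standard roles; verify them for your version.
harbor_member_payload() {
  local username="$1" role="$2" role_id
  case "$role" in
    admin)      role_id=1 ;;
    developer)  role_id=2 ;;
    guest)      role_id=3 ;;
    maintainer) role_id=4 ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  # Emit the same payload shape used in step 3 of the remediation.
  printf '{"role_id": %d, "member_user": {"username": "%s"}}\n' "$role_id" "$username"
}

harbor_member_payload "ci-bot" developer
```

The output can be passed to `curl -d` in place of the inline JSON, which avoids editing the payload by hand when a guest or maintainer role is more appropriate than developer.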

---

#### check.integration.registry.ecr.connectivity

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.ecr.connectivity` |
| **Plugin** | `stellaops.doctor.integration.registry.ecr` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `ecr`, `aws` |
| **What it verifies** | AWS ECR is reachable |
| **Evidence collected** | Registry URL, AWS region, endpoint status |
| **Failure modes** | AWS credentials invalid, region mismatch |

**Remediation:**
```bash
# 1. Verify AWS credentials
aws sts get-caller-identity

# 2. Test ECR describe repositories
aws ecr describe-repositories --region {REGION}

# 3. Get ECR login token
aws ecr get-login-password --region {REGION} | docker login --username AWS --password-stdin {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com

# 4. Check AWS credentials configuration (keys are masked in this output)
aws configure list
```

**Verification:** `stella doctor --check check.integration.registry.ecr.connectivity`
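
Region mismatch is one of the listed failure modes. The sketch below shows one way to detect it, assuming the registry hostname follows the `{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com` form used above (other AWS partitions use different suffixes and are not handled). The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the region from an ECR registry hostname of the
# form {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com.
ecr_region_from_host() {
  local host="$1"
  if [[ "$host" =~ ^[0-9]{12}\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com$ ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Compare the registry's region with the configured region.
# Returns 0 on match, 1 on mismatch, 2 if the hostname cannot be parsed.
ecr_region_matches() {
  local host="$1" configured="$2" actual
  actual="$(ecr_region_from_host "$host")" || return 2
  [ "$actual" = "$configured" ]
}

if ecr_region_matches "123456789012.dkr.ecr.eu-west-1.amazonaws.com" "eu-west-1"; then
  echo "region ok"
else
  echo "region mismatch"
fi
```

In practice the second argument would come from `aws configure get region` or the `--region` value passed to the commands above.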

---

#### check.integration.registry.ecr.pull

| Property | Value |
|----------|-------|
| **CheckId** | `check.integration.registry.ecr.pull` |
| **Plugin** | `stellaops.doctor.integration.registry.ecr` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `integration`, `registry`, `ecr`, `aws` |
| **What it verifies** | Can pull images from ECR repositories |
| **Evidence collected** | Repository, IAM permissions, error |
| **Failure modes** | ecr:GetAuthorizationToken denied, ecr:BatchGetImage denied |

**Remediation:**
```bash
# 1. Check IAM permissions
aws iam simulate-principal-policy \
  --policy-source-arn {ROLE_ARN} \
  --action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer

# 2. Add required IAM policy
aws iam put-role-policy --role-name {ROLE_NAME} --policy-name ECRPullAccess --policy-document '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage"
    ],
    "Resource": "*"
  }]
}'

# 3. Test pull
docker pull {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{REPO}:{TAG}
```

**Verification:** `stella doctor --check check.integration.registry.ecr.pull`

---

### 9.7 Observability Plugin (`stellaops.doctor.observability`)

#### check.telemetry.otlp.endpoint

| Property | Value |
|----------|-------|
| **CheckId** | `check.telemetry.otlp.endpoint` |
| **Plugin** | `stellaops.doctor.observability` |
| **Category** | Observability |
| **Severity** | Warn |
| **Tags** | `observability`, `telemetry` |
| **What it verifies** | OTLP collector endpoint is reachable |
| **Evidence collected** | Endpoint URL, response status, protocol |
| **Failure modes** | Collector unreachable, wrong protocol (gRPC vs HTTP) |

**Remediation:**
```bash
# 1. Check OTLP endpoint configuration
echo "$OTEL_EXPORTER_OTLP_ENDPOINT"

# 2. Test HTTP endpoint
# (any HTTP response, even an error status for this GET, shows the collector is reachable)
curl -v "${OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces"

# 3. Test gRPC endpoint
grpcurl -plaintext {COLLECTOR_HOST}:4317 list

# 4. Check collector is running
# If using OpenTelemetry Collector:
docker logs otel-collector

# 5. Verify collector configuration
cat /etc/otel-collector/config.yaml
```

**Verification:** `stella doctor --check check.telemetry.otlp.endpoint`
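
The "wrong protocol" failure mode often comes down to the port. As a rough sketch, assuming the conventional OTLP defaults (4317 for gRPC, 4318 for HTTP), the endpoint URL can be inspected before probing it. The function name `otlp_protocol_hint` is hypothetical, and a collector on a non-default port will simply report `unknown`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess the OTLP protocol from the endpoint's port.
# Assumes the conventional defaults: 4317 = gRPC, 4318 = HTTP.
otlp_protocol_hint() {
  local endpoint="$1" hostport port
  hostport="${endpoint#*://}"   # strip the scheme, if any
  hostport="${hostport%%/*}"    # strip any path
  port="${hostport##*:}"        # text after the last colon
  if [ "$port" = "$hostport" ]; then
    port=""                     # no explicit port in the URL
  fi
  case "$port" in
    4317) echo "grpc" ;;
    4318) echo "http" ;;
    *)    echo "unknown" ;;
  esac
}

otlp_protocol_hint "http://otel-collector:4317"
```

If the hint is `grpc`, step 3 (`grpcurl`) is the relevant probe; if it is `http`, step 2 (`curl`) applies.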

---

#### check.logs.directory.writable

| Property | Value |
|----------|-------|
| **CheckId** | `check.logs.directory.writable` |
| **Plugin** | `stellaops.doctor.observability` |
| **Category** | Observability |
| **Severity** | Fail |
| **Tags** | `quick`, `observability`, `logs` |
| **What it verifies** | Log directory is writable |
| **Evidence collected** | Log path, permissions, owner |
| **Failure modes** | Directory not writable, disk full |

**Remediation:**
```bash
# 1. Check log directory permissions
ls -la /var/log/stellaops

# 2. Fix ownership
sudo chown -R stellaops:stellaops /var/log/stellaops

# 3. Fix permissions
sudo chmod 755 /var/log/stellaops

# 4. Check disk space
df -h /var/log/stellaops
```

**Verification:** `stella doctor --check check.logs.directory.writable`
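
Permission bits alone do not always tell whether a write will succeed (a full disk or a read-only mount also fails). A minimal sketch of a direct probe follows, which creates and removes a temporary file in the directory. This illustrates the idea only and is not necessarily how the Doctor check is implemented; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: probe writability by creating and removing a temp file.
check_log_dir_writable() {
  local dir="$1" probe
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # mktemp fails if the directory is not writable, read-only, or out of space.
  if ! probe="$(mktemp "$dir/.doctor-probe.XXXXXX" 2>/dev/null)"; then
    echo "not writable: $dir"
    return 1
  fi
  rm -f "$probe"
  echo "writable: $dir"
}
```

Run it as the service user to get a meaningful result, for example `sudo -u stellaops bash -c 'source ./probe.sh; check_log_dir_writable /var/log/stellaops'`, where `probe.sh` is whatever file holds the function.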

---

#### check.logs.rotation.configured

| Property | Value |
|----------|-------|
| **CheckId** | `check.logs.rotation.configured` |
| **Plugin** | `stellaops.doctor.observability` |
| **Category** | Observability |
| **Severity** | Warn |
| **Tags** | `observability`, `logs` |
| **What it verifies** | Log rotation is configured |
| **Evidence collected** | Rotation config path, settings |
| **Failure modes** | No rotation configured, invalid config |

**Remediation:**
```bash
# 1. Check if logrotate config exists
ls -la /etc/logrotate.d/stellaops

# 2. Create logrotate configuration
# (uses sudo tee: with "sudo cat > file" the redirection runs as the calling user and fails)
sudo tee /etc/logrotate.d/stellaops > /dev/null << 'EOF'
/var/log/stellaops/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 640 stellaops stellaops
    postrotate
        systemctl reload stellaops-gateway > /dev/null 2>&1 || true
    endscript
}
EOF

# 3. Test logrotate configuration (debug mode, no files are rotated)
sudo logrotate -d /etc/logrotate.d/stellaops
```

**Verification:** `stella doctor --check check.logs.rotation.configured`

---

#### check.metrics.prometheus.scrape

| Property | Value |
|----------|-------|
| **CheckId** | `check.metrics.prometheus.scrape` |
| **Plugin** | `stellaops.doctor.observability` |
| **Category** | Observability |
| **Severity** | Warn |
| **Tags** | `observability`, `metrics` |
| **What it verifies** | Prometheus metrics endpoint is accessible |
| **Evidence collected** | Metrics endpoint, sample metrics count |
| **Failure modes** | Endpoint not exposed, auth required |

**Remediation:**
```bash
# 1. Check metrics endpoint
curl -s http://localhost:{PORT}/metrics | head -20

# 2. Verify metrics are being scraped
curl -s http://{PROMETHEUS_HOST}:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job == "stellaops")'

# 3. Add Prometheus scrape config
# In prometheus.yml:
#   scrape_configs:
#     - job_name: 'stellaops'
#       static_configs:
#         - targets: ['stellaops-gateway:8080', 'stellaops-concelier:8081']

# 4. Reload Prometheus (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://{PROMETHEUS_HOST}:9090/-/reload
```

**Verification:** `stella doctor --check check.metrics.prometheus.scrape`
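
The evidence for this check includes a sample metrics count. One way to approximate that figure by hand, assuming the endpoint serves the Prometheus text exposition format, is to count lines that are neither blank nor comments (`# HELP`, `# TYPE`). The function name `count_metric_samples` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count sample lines in Prometheus text exposition format
# read from stdin, skipping blank lines and "#" comment lines.
count_metric_samples() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the status clean.
  grep -cvE '^[[:space:]]*(#|$)' || true
}

printf '%s\n' \
  '# HELP http_requests_total Total requests' \
  '# TYPE http_requests_total counter' \
  'http_requests_total{code="200"} 42' \
  '' \
  'process_start_time_seconds 1700000000' | count_metric_samples
```

Against a live service this becomes `curl -s http://localhost:{PORT}/metrics | count_metric_samples`; a result of `0` suggests the endpoint responds but exports nothing.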

---

### 9.8 Release Orchestrator Plugin (`stellaops.doctor.releaseorch`)

#### check.releaseorch.environments.configured

| Property | Value |
|----------|-------|
| **CheckId** | `check.releaseorch.environments.configured` |
| **Plugin** | `stellaops.doctor.releaseorch` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `release`, `environments` |
| **What it verifies** | At least one environment is configured |
| **Evidence collected** | Environment count, environment names |
| **Failure modes** | No environments configured |

**Remediation:**
```bash
# 1. List current environments
stella environments list

# 2. Create development environment
stella environments create \
  --name development \
  --type development \
  --promotion-target staging

# 3. Create staging environment
stella environments create \
  --name staging \
  --type staging \
  --promotion-target production \
  --requires-approval

# 4. Create production environment
stella environments create \
  --name production \
  --type production \
  --requires-approval
```

**Verification:** `stella doctor --check check.releaseorch.environments.configured`

---

#### check.releaseorch.deployments.targets

| Property | Value |
|----------|-------|
| **CheckId** | `check.releaseorch.deployments.targets` |
| **Plugin** | `stellaops.doctor.releaseorch` |
| **Category** | Integration |
| **Severity** | Fail |
| **Tags** | `release`, `deployments` |
| **What it verifies** | Deployment targets are reachable |
| **Evidence collected** | Target type, connectivity status, last heartbeat |
| **Failure modes** | Agent offline, target unreachable |

**Remediation:**
```bash
# 1. List deployment targets
stella deployments targets list

# 2. Check agent status
stella deployments targets health --target {TARGET_ID}

# 3. Restart agent if needed
# On target host:
sudo systemctl restart stellaops-agent

# 4. Re-register target if agent was reinstalled
stella deployments targets register \
  --name {TARGET_NAME} \
  --type docker-compose \
  --endpoint ssh://user@host
```

**Verification:** `stella doctor --check check.releaseorch.deployments.targets`
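
The check collects the last heartbeat as evidence, and "agent offline" is typically inferred from its age. The sketch below illustrates that comparison under stated assumptions: timestamps are Unix epoch seconds, and the 120-second default threshold is hypothetical (the specification does not define one). The function name is also hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether an agent heartbeat is stale.
# Arguments: last heartbeat (epoch seconds), current time (epoch seconds),
# and an optional maximum age in seconds (the default of 120 is an assumption).
heartbeat_is_stale() {
  local last="$1" now="$2" max_age="${3:-120}"
  [ $(( now - last )) -gt "$max_age" ]
}

# Example: a heartbeat from five minutes ago.
now="$(date +%s)"
if heartbeat_is_stale "$(( now - 300 ))" "$now"; then
  echo "agent appears offline"
else
  echo "agent heartbeat is recent"
fi
```

A stale heartbeat points toward step 3 (restart the agent), whereas a recent heartbeat with failed connectivity suggests a network or endpoint problem instead.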

---

## 10. Plugin Implementation Details

### 10.1 Core Platform Plugin

**Location:** `src/__Libraries/StellaOps.Doctor/Plugins/Core/`

Provides foundational checks for configuration, runtime, and platform health.

**Checks Provided:**
- `check.config.required`
- `check.config.syntax`
- `check.config.deprecated`
- `check.runtime.dotnet`
- `check.runtime.memory`
- `check.runtime.disk.space`
- `check.runtime.disk.permissions`
- `check.time.sync`
- `check.crypto.profiles`

**Dependencies:** None (core plugin)

---

### 10.2 Database & Migrations Plugin

**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Database/`

Provides database connectivity and migration state checks.

**References:**
- `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationRunner.cs`
- `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationStatusService.cs`

**Checks Provided:**
- `check.database.connectivity`
- `check.database.version`
- `check.database.migrations.pending`
- `check.database.migrations.checksum`
- `check.database.migrations.lock`
- `check.database.schema.{schema}` (dynamic per schema)
- `check.database.connections.pool`

**Configuration:**
```yaml
Doctor:
  Plugins:
    Database:
      Enabled: true
      ConnectionTimeout: 10s
      Schemas:
        - auth
        - vuln
        - scanner
        - orchestrator
```
```
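
The `check.database.schema.{schema}` entry is a template that yields one check per configured schema. As a small illustration, the sketch below expands the `Schemas` list from the configuration above into concrete check IDs. The function name `expand_schema_checks` is hypothetical, and the schema names are passed as arguments rather than parsed from YAML.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand schema names into concrete Doctor check IDs.
expand_schema_checks() {
  local schema
  for schema in "$@"; do
    printf 'check.database.schema.%s\n' "$schema"
  done
}

expand_schema_checks auth vuln scanner orchestrator
```

Each resulting ID can then be run individually, for example `stella doctor --check check.database.schema.auth`.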

---

### 10.3 Service Graph Plugin

**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ServiceGraph/`

Validates inter-service connectivity via Gateway and Router.

**References:**
- `src/Gateway/StellaOps.Gateway.WebService/Middleware/RequestRoutingMiddleware.cs`
- `src/Router/__Libraries/StellaOps.Router.Gateway/Services/ConnectionManager.cs`

**Checks Provided:**
- `check.services.gateway.running`
- `check.services.gateway.routing`
- `check.services.{service}.health` (dynamic per service)
- `check.services.{service}.connectivity` (dynamic per service)
- `check.services.authority.connectivity`

**Configuration:**
```yaml
Doctor:
  Plugins:
    ServiceGraph:
      Enabled: true
      HealthEndpointTimeout: 5s
      Services:
        - name: concelier
          port: 8081
        - name: scanner
          port: 8082
        - name: attestor
          port: 8083
```

---

### 10.4 Security Plugin

**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Security/`

Validates authentication, authorization, TLS, and secrets management.

**References:**
- `src/Authority/StellaOps.Authority/StellaOps.Authority.Plugin.Ldap/`
- `src/ReleaseOrchestrator/__Libraries/.../Connectors/Vault/HashiCorpVaultConnector.cs`

**Checks Provided:**
- `check.auth.oidc.discovery`
- `check.auth.oidc.jwks`
- `check.auth.ldap.bind`
- `check.auth.ldap.search`
- `check.auth.ldap.groups`
- `check.tls.certificates.expiry`
- `check.tls.certificates.chain`
- `check.secrets.vault.connectivity`
- `check.secrets.vault.auth`
- `check.secrets.vault.paths`

---

### 10.5 SCM Integration Plugins

**GitHub Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitHub/`
**GitLab Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scm.GitLab/`

**References:**
- `src/Integrations/__Plugins/StellaOps.Integrations.Plugin.GitHubApp/`
- `etc/scm-connectors/github.yaml`

**GitHub Checks:**
- `check.integration.scm.github.connectivity`
- `check.integration.scm.github.auth`
- `check.integration.scm.github.permissions`
- `check.integration.scm.github.ratelimit`

**GitLab Checks:**
- `check.integration.scm.gitlab.connectivity`
- `check.integration.scm.gitlab.auth`
- `check.integration.scm.gitlab.permissions`

---

### 10.6 Registry Integration Plugins

**Harbor Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.Harbor/`
**ECR Plugin Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Registry.ECR/`

**References:**
- `src/Integrations/__Plugins/StellaOps.Integrations.Plugin.Harbor/`

**Harbor Checks:**
- `check.integration.registry.harbor.connectivity`
- `check.integration.registry.harbor.auth`
- `check.integration.registry.harbor.pull`

**ECR Checks:**
- `check.integration.registry.ecr.connectivity`
- `check.integration.registry.ecr.pull`

---

### 10.7 Observability Plugin

**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Observability/`

**References:**
- `devops/telemetry/otel-collector.yaml`

**Checks Provided:**
- `check.telemetry.otlp.endpoint`
- `check.logs.directory.writable`
- `check.logs.rotation.configured`
- `check.metrics.prometheus.scrape`

---

### 10.8 Release Orchestrator Plugin

**Location:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.ReleaseOrch/`

**References:**
- `src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/Doctor/`

**Checks Provided:**
- `check.releaseorch.environments.configured`
- `check.releaseorch.deployments.targets`

---

## Appendix A: Complete Check ID Reference

| CheckId | Plugin | Category | Default Severity |
|---------|--------|----------|------------------|
| `check.config.required` | core | Core | Fail |
| `check.config.syntax` | core | Core | Fail |
| `check.config.deprecated` | core | Core | Warn |
| `check.runtime.dotnet` | core | Core | Fail |
| `check.runtime.memory` | core | Core | Warn |
| `check.runtime.disk.space` | core | Core | Warn |
| `check.runtime.disk.permissions` | core | Core | Fail |
| `check.time.sync` | core | Core | Warn |
| `check.crypto.profiles` | core | Core | Fail |
| `check.database.connectivity` | database | Database | Fail |
| `check.database.version` | database | Database | Warn |
| `check.database.migrations.pending` | database | Database | Fail |
| `check.database.migrations.checksum` | database | Database | Fail |
| `check.database.migrations.lock` | database | Database | Warn |
| `check.database.schema.{schema}` | database | Database | Fail |
| `check.database.connections.pool` | database | Database | Warn |
| `check.services.gateway.running` | servicegraph | ServiceGraph | Fail |
| `check.services.gateway.routing` | servicegraph | ServiceGraph | Fail |
| `check.services.{service}.health` | servicegraph | ServiceGraph | Fail |
| `check.services.{service}.connectivity` | servicegraph | ServiceGraph | Fail |
| `check.services.authority.connectivity` | servicegraph | ServiceGraph | Fail |
| `check.auth.oidc.discovery` | security | Security | Fail |
| `check.auth.oidc.jwks` | security | Security | Fail |
| `check.auth.ldap.bind` | security | Security | Fail |
| `check.auth.ldap.search` | security | Security | Fail |
| `check.auth.ldap.groups` | security | Security | Warn |
| `check.tls.certificates.expiry` | security | Security | Warn/Fail |
| `check.tls.certificates.chain` | security | Security | Fail |
| `check.secrets.vault.connectivity` | security | Security | Fail |
| `check.secrets.vault.auth` | security | Security | Fail |
| `check.secrets.vault.paths` | security | Security | Fail |
| `check.integration.scm.github.connectivity` | scm.github | Integration | Fail |
| `check.integration.scm.github.auth` | scm.github | Integration | Fail |
| `check.integration.scm.github.permissions` | scm.github | Integration | Fail |
| `check.integration.scm.github.ratelimit` | scm.github | Integration | Warn |
| `check.integration.scm.gitlab.connectivity` | scm.gitlab | Integration | Fail |
| `check.integration.scm.gitlab.auth` | scm.gitlab | Integration | Fail |
| `check.integration.registry.harbor.connectivity` | registry.harbor | Integration | Fail |
| `check.integration.registry.harbor.auth` | registry.harbor | Integration | Fail |
| `check.integration.registry.harbor.pull` | registry.harbor | Integration | Fail |
| `check.integration.registry.ecr.connectivity` | registry.ecr | Integration | Fail |
| `check.integration.registry.ecr.pull` | registry.ecr | Integration | Fail |
| `check.telemetry.otlp.endpoint` | observability | Observability | Warn |
| `check.logs.directory.writable` | observability | Observability | Fail |
| `check.logs.rotation.configured` | observability | Observability | Warn |
| `check.metrics.prometheus.scrape` | observability | Observability | Warn |
| `check.releaseorch.environments.configured` | releaseorch | Integration | Fail |
| `check.releaseorch.deployments.targets` | releaseorch | Integration | Fail |

---

## Appendix B: Quick Reference - Common Issues
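
### Database Issues

```bash
# Connection refused
sudo systemctl start postgresql
stella doctor --check check.database.connectivity

# Pending migrations
stella system migrations-run --category release
stella doctor --check check.database.migrations.pending

# Migration lock stuck
# pg_advisory_unlock_all() only releases locks held by the calling session,
# so find the session holding the advisory lock and terminate it instead
psql -d stellaops -c "SELECT pid, objid FROM pg_locks WHERE locktype = 'advisory';"
psql -d stellaops -c "SELECT pg_terminate_backend({PID});"
stella doctor --check check.database.migrations.lock
```
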
### Authentication Issues

```bash
# OIDC discovery fails
curl -s "${STELLAOPS_AUTHORITY_URL}/.well-known/openid-configuration"
sudo systemctl restart stellaops-authority

# LDAP bind fails
ldapsearch -x -H ldaps://{HOST}:636 -D "{BIND_DN}" -w "{PASSWORD}" -b "" -s base
```

### Integration Issues

```bash
# GitHub rate limit
curl -H "Authorization: Bearer {TOKEN}" https://api.github.com/rate_limit

# Harbor connectivity
curl -s https://{HARBOR_HOST}/api/v2.0/health | jq
```

---

*Document generated: 2026-01-12*
*Stella Ops Doctor Capability Specification v1.0.0-draft*