git.stella-ops.org/docs/modules/doctor/architecture.md
2026-01-28 02:30:48 +02:00

Doctor Architecture

Module: Doctor
Sprint: SPRINT_0127_001_0002_oci_registry_compatibility

Stella Doctor is a diagnostic framework for validating system health, configuration, and integration connectivity across the StellaOps platform.

1) Overview

Doctor provides a plugin-based diagnostic system that enables:

  • Health checks for all platform components
  • Integration validation for external systems (registries, SCM, CI, secrets)
  • Configuration verification before deployment
  • Capability probing for feature compatibility
  • Evidence collection for troubleshooting and compliance

2) Plugin Architecture

Core Interfaces

public interface IDoctorPlugin
{
    string PluginId { get; }
    string DisplayName { get; }
    string Category { get; }
    Version Version { get; }

    IEnumerable<IDoctorCheck> GetChecks();
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public interface IDoctorCheck
{
    string CheckId { get; }
    string Name { get; }
    string Description { get; }
    DoctorSeverity DefaultSeverity { get; }
    IReadOnlyList<string> Tags { get; }
    TimeSpan EstimatedDuration { get; }

    bool CanRun(DoctorPluginContext context);
    Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}

Plugin Context

public sealed class DoctorPluginContext
{
    public IServiceProvider Services { get; }
    public IConfiguration Configuration { get; }
    public TimeProvider TimeProvider { get; }
    public ILogger Logger { get; }
    public string EnvironmentName { get; }
    public IReadOnlyDictionary<string, object> PluginConfig { get; }
}

Check Results

public sealed record CheckResult
{
    public DoctorSeverity Severity { get; init; }
    public string Diagnosis { get; init; }
    public Evidence Evidence { get; init; }
    public IReadOnlyList<string> LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum DoctorSeverity
{
    Pass,    // Check succeeded
    Info,    // Informational (no action needed)
    Warn,    // Warning (degraded but functional)
    Fail,    // Failure (requires action)
    Skip     // Check skipped (preconditions not met)
}
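
How a runner might combine `CanRun`, `RunAsync`, and `DoctorSeverity` is not spelled out above, so the following is only a minimal sketch under stated assumptions. `SketchCheck`, `SketchResult`, and `SketchRunner` are hypothetical, trimmed-down stand-ins for `IDoctorCheck`, `CheckResult`, and the real runner. Reporting a check as `Skip` when `CanRun` returns false is inferred from the enum comment, and sequential execution is an assumed policy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public enum DoctorSeverity { Pass, Info, Warn, Fail, Skip }

// Hypothetical, trimmed-down stand-ins for CheckResult and IDoctorCheck.
public sealed record SketchResult(string CheckId, DoctorSeverity Severity, string Diagnosis);

public sealed record SketchCheck(
    string CheckId,
    Func<bool> CanRun,
    Func<CancellationToken, Task<SketchResult>> RunAsync);

public static class SketchRunner
{
    // Runs checks one after another (assumed policy, not Doctor's actual scheduler).
    public static async Task<IReadOnlyList<SketchResult>> RunAllAsync(
        IEnumerable<SketchCheck> checks, CancellationToken ct)
    {
        var results = new List<SketchResult>();
        foreach (var check in checks)
        {
            ct.ThrowIfCancellationRequested();

            // Preconditions not met: record a Skip instead of running the check.
            if (!check.CanRun())
            {
                results.Add(new SketchResult(check.CheckId, DoctorSeverity.Skip, "Preconditions not met"));
                continue;
            }

            try
            {
                results.Add(await check.RunAsync(ct));
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // An unexpected exception becomes a Fail result instead of aborting the run.
                results.Add(new SketchResult(check.CheckId, DoctorSeverity.Fail, ex.Message));
            }
        }
        return results;
    }
}
```

Letting `OperationCanceledException` propagate keeps a cancelled run distinguishable from a failed check.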

3) Built-in Plugins

IntegrationPlugin

Validates external system connectivity and capabilities.

Check Catalog:

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| check.integration.oci.credentials | OCI Registry Credentials | Fail | Validate registry authentication |
| check.integration.oci.pull | OCI Registry Pull Authorization | Fail | Verify pull permissions |
| check.integration.oci.push | OCI Registry Push Authorization | Fail | Verify push permissions |
| check.integration.oci.referrers | OCI Registry Referrers API | Warn | Check OCI 1.1 referrers support |
| check.integration.oci.capabilities | OCI Registry Capability Matrix | Info | Probe all registry capabilities |

See Registry Diagnostic Checks for detailed documentation.

ConfigurationPlugin

Validates platform configuration.

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| check.config.database | Database Connection | Fail | Verify database connectivity |
| check.config.secrets | Secrets Provider | Fail | Verify secrets access |
| check.config.tls | TLS Configuration | Warn | Validate TLS certificates |

HealthPlugin

Validates platform component health.

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| check.health.api | API Health | Fail | Verify API endpoints |
| check.health.worker | Worker Health | Fail | Verify background workers |
| check.health.storage | Storage Health | Fail | Verify storage backends |

4) Check Patterns

Non-Destructive Probing

Registry checks verify permissions with non-destructive operations that transfer no image data and leave no blobs behind:

// Pull check: HEAD request only (no manifest body is transferred)
using var headRequest = new HttpRequestMessage(HttpMethod.Head, manifestUrl);
using var response = await client.SendAsync(headRequest, ct);

// Push check: start a blob upload session, then cancel it immediately
using var uploadResponse = await client.PostAsync(uploadsUrl, null, ct);
if (uploadResponse.StatusCode == HttpStatusCode.Accepted
    && uploadResponse.Headers.Location is { } location)
{
    // DELETE on the session URL cancels the upload, so no blob is stored
    using var cancelResponse = await client.DeleteAsync(location, ct);
}

Capability Detection

Registry capability probing sequence:

1. GET /v2/ → Extract OCI-Distribution-API-Version header
2. GET /v2/{repo}/referrers/{digest} → Check referrers API support
3. POST /v2/{repo}/blobs/uploads/ → Check chunked upload support
   └─ DELETE {location} → Cancel upload session
4. POST /v2/{repo}/blobs/uploads/?mount=...&from=... → Check cross-repo mount
5. OPTIONS /v2/{repo}/manifests/{ref} → Check delete support (Allow header)
6. OPTIONS /v2/{repo}/blobs/{digest} → Check blob delete support
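
The probe responses from the sequence above still have to be turned into capability answers. The following is a minimal sketch of that step, not Doctor's actual logic: `CapabilityStatus` and `CapabilityInterpreter` are hypothetical names, and the decision to report `Unknown` for anything other than a clear answer is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public enum CapabilityStatus { Supported, Unsupported, Unknown }

// Hypothetical helper: maps raw probe responses to capability answers.
public static class CapabilityInterpreter
{
    // Step 2: a registry implementing the OCI 1.1 referrers API answers 200;
    // a 404 indicates the endpoint is not implemented.
    public static CapabilityStatus FromReferrersStatus(HttpStatusCode status) => status switch
    {
        HttpStatusCode.OK => CapabilityStatus.Supported,
        HttpStatusCode.NotFound => CapabilityStatus.Unsupported,
        _ => CapabilityStatus.Unknown // e.g. 401/403 says nothing about the capability itself
    };

    // Steps 5-6: DELETE counts as supported only when the Allow header lists it.
    public static CapabilityStatus DeleteFromAllowHeader(IEnumerable<string>? allowValues)
    {
        if (allowValues is null) return CapabilityStatus.Unknown;

        // Header values may arrive as separate entries or as one comma-separated string.
        var methods = allowValues
            .SelectMany(v => v.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
            .ToList();

        if (methods.Count == 0) return CapabilityStatus.Unknown;

        return methods.Contains("DELETE", StringComparer.OrdinalIgnoreCase)
            ? CapabilityStatus.Supported
            : CapabilityStatus.Unsupported;
    }
}
```

With `HttpClient`, the `Allow` values are exposed on `response.Content.Headers.Allow`, which can be passed straight to `DeleteFromAllowHeader`.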

Evidence Collection

All checks collect structured evidence:

var result = CheckResultBuilder.Create(check)
    .Pass("Registry authentication successful")
    .WithEvidence(eb => eb
        .Add("registry_url", registryUrl)
        .Add("auth_method", "bearer")
        .Add("response_time_ms", elapsed.TotalMilliseconds.ToString("F0"))
        .AddSensitive("token_preview", RedactToken(token)))
    .Build();
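
The builder passed to `WithEvidence` is not defined in this document. As a rough illustration of why `Add` and `AddSensitive` are separate calls, here is a minimal sketch of one possible design. `SketchEvidenceBuilder`, its `Build(bool includeSensitive)` parameter, and the `****` mask are all hypothetical; the real builder may behave differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the real builder behind WithEvidence(...) may differ.
public sealed class SketchEvidenceBuilder
{
    private readonly Dictionary<string, string> _data = new(StringComparer.Ordinal);
    private readonly HashSet<string> _sensitiveKeys = new(StringComparer.Ordinal);

    public SketchEvidenceBuilder Add(string key, string value)
    {
        _data[key] = value;
        return this;
    }

    // Stores the value and remembers the key so output can mask it by default.
    public SketchEvidenceBuilder AddSensitive(string key, string value)
    {
        _data[key] = value;
        _sensitiveKeys.Add(key);
        return this;
    }

    // Returns a copy of the evidence; sensitive entries are masked unless explicitly requested.
    public IReadOnlyDictionary<string, string> Build(bool includeSensitive = false)
    {
        var output = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var (key, value) in _data)
        {
            var mask = !includeSensitive && _sensitiveKeys.Contains(key);
            output[key] = mask ? "****" : value;
        }
        return output;
    }
}
```

Tracking sensitive keys separately lets each output format decide how much to reveal without re-inspecting the values.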

Credential Redaction

Sensitive values are automatically redacted:

// Redact to first 2 + last 2 characters
private static string Redact(string? value)
{
    if (string.IsNullOrEmpty(value) || value.Length <= 4)
        return "****";
    return $"{value[..2]}...{value[^2..]}";
}
// "mysecretpassword" → "my...rd"

5) CLI Integration

# Run all checks
stella doctor

# Run checks by tag
stella doctor --tag registry
stella doctor --tag configuration

# Run specific check
stella doctor --check check.integration.oci.referrers

# Output formats
stella doctor --format table    # Default: human-readable
stella doctor --format json     # Machine-readable
stella doctor --format sarif    # SARIF for CI integration

# Verbosity
stella doctor --verbose         # Include evidence details
stella doctor --quiet           # Only show failures

# Filtering by severity
stella doctor --min-severity warn  # Skip info/pass
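
`--min-severity` implies an ordering over `DoctorSeverity`, but the enum declares `Skip` after `Fail`, so comparing the raw enum values would rank skipped checks above failures. The sketch below uses an explicit ranking instead. `SeverityFilter` is a hypothetical helper, and ranking `Skip` alongside `Info` is an assumption; the CLI's actual rule is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum DoctorSeverity { Pass, Info, Warn, Fail, Skip }

// Hypothetical helper illustrating severity filtering.
public static class SeverityFilter
{
    // Assumed ranking: Skip is treated like Info rather than as worse than Fail.
    public static int Rank(DoctorSeverity severity) => severity switch
    {
        DoctorSeverity.Pass => 0,
        DoctorSeverity.Info => 1,
        DoctorSeverity.Skip => 1,
        DoctorSeverity.Warn => 2,
        DoctorSeverity.Fail => 3,
        _ => throw new ArgumentOutOfRangeException(nameof(severity))
    };

    // Keeps only the items whose severity ranks at or above the minimum.
    public static IEnumerable<T> AtLeast<T>(
        IEnumerable<T> results, Func<T, DoctorSeverity> severityOf, DoctorSeverity minimum) =>
        results.Where(r => Rank(severityOf(r)) >= Rank(minimum));
}
```

Under this ranking, a minimum of `Warn` keeps only `Warn` and `Fail` results, which matches the "Skip info/pass" comment above.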

6) Extensibility

Creating a Custom Check

public sealed class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom integration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => ["custom", "integration"];
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(5);

    public bool CanRun(DoctorPluginContext context)
    {
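        // Return false if preconditions are not met.
        // bool.TryParse accepts "true", "True", etc. and treats a missing key as disabled.
        return bool.TryParse(context.Configuration["Custom:Enabled"], out var enabled) && enabled;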
    }

    public async Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct)
    {
        var builder = CheckResultBuilder.Create(this);

        try
        {
            // Perform check logic (ValidateAsync stands in for your own implementation)
            var result = await ValidateAsync(context, ct);

            if (result.Success)
            {
                return builder
                    .Pass("Custom validation successful")
                    .WithEvidence(eb => eb.Add("detail", result.Detail))
                    .Build();
            }

            return builder
                .Fail("Custom validation failed")
                .WithLikelyCause("Configuration is invalid")
                .WithRemediation(rb => rb
                    .AddManualStep(1, "Check configuration", "Verify Custom:Setting is correct")
                    .WithRunbookUrl("https://docs.stella-ops.org/runbooks/custom-check"))
                .Build();
        }
        catch (Exception ex) when (ex is not OperationCanceledException) // let cancellation propagate
        {
            return builder
                .Fail($"Check failed with error: {ex.Message}")
                .WithEvidence(eb => eb.Add("exception_type", ex.GetType().Name))
                .Build();
        }
    }
}

Creating a Custom Plugin

public sealed class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "custom";
    public string DisplayName => "Custom Checks";
    public string Category => "Integration";
    public Version Version => new(1, 0, 0);

    public IEnumerable<IDoctorCheck> GetChecks()
    {
        yield return new MyCustomCheck();
        yield return new AnotherCustomCheck(); // a second check, implemented like MyCustomCheck
    }

    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
    {
        // Optional initialization
        return Task.CompletedTask;
    }
}

7) Telemetry

Doctor emits metrics and traces for observability:

Metrics:

  • doctor_check_duration_seconds{check_id, severity} - Check execution time
  • doctor_check_results_total{check_id, severity} - Result counts
  • doctor_plugin_load_duration_seconds{plugin_id} - Plugin initialization time

Traces:

  • doctor.run - Full doctor run span
  • doctor.check.{check_id} - Individual check spans with evidence as attributes
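
The metrics and spans listed above could be emitted with the standard `System.Diagnostics.Metrics` and `System.Diagnostics.ActivitySource` APIs. The sketch below is an illustration under assumptions, not Doctor's instrumentation: `DoctorTelemetrySketch`, the meter and source name `StellaOps.Doctor.Sketch`, and passing the severity as a plain string are all hypothetical choices.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical sketch of per-check instrumentation.
public static class DoctorTelemetrySketch
{
    // Meter and ActivitySource names are assumptions for illustration.
    public static readonly Meter DoctorMeter = new("StellaOps.Doctor.Sketch");
    public static readonly ActivitySource DoctorSource = new("StellaOps.Doctor.Sketch");

    private static readonly Histogram<double> Duration =
        DoctorMeter.CreateHistogram<double>("doctor_check_duration_seconds", unit: "s");
    private static readonly Counter<long> Results =
        DoctorMeter.CreateCounter<long>("doctor_check_results_total");

    // Wraps one check execution in a span and records both metrics
    // with the check_id and severity labels.
    public static string Measure(string checkId, Func<string> runCheck)
    {
        // StartActivity returns null when no listener is attached, hence the null-conditional below.
        using var activity = DoctorSource.StartActivity($"doctor.check.{checkId}");
        var stopwatch = Stopwatch.StartNew();
        var severity = runCheck();
        stopwatch.Stop();

        var checkTag = new KeyValuePair<string, object?>("check_id", checkId);
        var severityTag = new KeyValuePair<string, object?>("severity", severity);
        Duration.Record(stopwatch.Elapsed.TotalSeconds, checkTag, severityTag);
        Results.Add(1, checkTag, severityTag);

        activity?.SetTag("severity", severity);
        return severity;
    }
}
```

A metrics exporter or a `MeterListener` subscribed to the meter name receives one histogram sample and one counter increment per call.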