Doctor Architecture
Module: Doctor
Sprint: SPRINT_0127_001_0002_oci_registry_compatibility
Stella Doctor is a diagnostic framework for validating system health, configuration, and integration connectivity across the StellaOps platform.
1) Overview
Doctor provides a plugin-based diagnostic system that enables:
- Health checks for all platform components
- Integration validation for external systems (registries, SCM, CI, secrets)
- Configuration verification before deployment
- Capability probing for feature compatibility
- Evidence collection for troubleshooting and compliance
2) Plugin Architecture
Core Interfaces
public interface IDoctorPlugin
{
    string PluginId { get; }
    string DisplayName { get; }
    string Category { get; }
    Version Version { get; }
    IEnumerable<IDoctorCheck> GetChecks();
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public interface IDoctorCheck
{
    string CheckId { get; }
    string Name { get; }
    string Description { get; }
    DoctorSeverity DefaultSeverity { get; }
    IReadOnlyList<string> Tags { get; }
    TimeSpan EstimatedDuration { get; }
    bool CanRun(DoctorPluginContext context);
    Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}
Plugin Context
public sealed class DoctorPluginContext
{
    public IServiceProvider Services { get; }
    public IConfiguration Configuration { get; }
    public TimeProvider TimeProvider { get; }
    public ILogger Logger { get; }
    public string EnvironmentName { get; }
    public IReadOnlyDictionary<string, object> PluginConfig { get; }
}
Check Results
public sealed record CheckResult
{
    public DoctorSeverity Severity { get; init; }
    public string Diagnosis { get; init; }
    public Evidence Evidence { get; init; }
    public IReadOnlyList<string> LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum DoctorSeverity
{
    Pass, // Check succeeded
    Info, // Informational (no action needed)
    Warn, // Warning (degraded but functional)
    Fail, // Failure (requires action)
    Skip  // Check skipped (preconditions not met)
}
3) Built-in Plugins
IntegrationPlugin
Validates external system connectivity and capabilities.
Check Catalog:
| Check ID | Name | Severity | Description |
|---|---|---|---|
| `check.integration.oci.credentials` | OCI Registry Credentials | Fail | Validate registry authentication |
| `check.integration.oci.pull` | OCI Registry Pull Authorization | Fail | Verify pull permissions |
| `check.integration.oci.push` | OCI Registry Push Authorization | Fail | Verify push permissions |
| `check.integration.oci.referrers` | OCI Registry Referrers API | Warn | Check OCI 1.1 referrers support |
| `check.integration.oci.capabilities` | OCI Registry Capability Matrix | Info | Probe all registry capabilities |
See Registry Diagnostic Checks for detailed documentation.
ConfigurationPlugin
Validates platform configuration.
| Check ID | Name | Severity | Description |
|---|---|---|---|
| `check.config.database` | Database Connection | Fail | Verify database connectivity |
| `check.config.secrets` | Secrets Provider | Fail | Verify secrets access |
| `check.config.tls` | TLS Configuration | Warn | Validate TLS certificates |
HealthPlugin
Validates platform component health.
| Check ID | Name | Severity | Description |
|---|---|---|---|
| `check.health.api` | API Health | Fail | Verify API endpoints |
| `check.health.worker` | Worker Health | Fail | Verify background workers |
| `check.health.storage` | Storage Health | Fail | Verify storage backends |
4) Check Patterns
Non-Destructive Probing
Registry checks use non-destructive operations:
// Pull check: HEAD request only (no manifest body is transferred)
var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Head, manifestUrl), ct);

// Push check: start an upload session, then cancel it immediately
var uploadResponse = await client.PostAsync(uploadsUrl, null, ct);
if (uploadResponse.StatusCode == HttpStatusCode.Accepted)
{
    // Location may be relative; this relies on client.BaseAddress being set in that case
    var location = uploadResponse.Headers.Location;
    await client.DeleteAsync(location, ct); // Cancel upload
}
Capability Detection
Registry capability probing sequence:
1. GET /v2/ → Extract OCI-Distribution-API-Version header
2. GET /v2/{repo}/referrers/{digest} → Check referrers API support
3. POST /v2/{repo}/blobs/uploads/ → Check chunked upload support
└─ DELETE {location} → Cancel upload session
4. POST /v2/{repo}/blobs/uploads/?mount=...&from=... → Check cross-repo mount
5. OPTIONS /v2/{repo}/manifests/{ref} → Check delete support (Allow header)
6. OPTIONS /v2/{repo}/blobs/{digest} → Check blob delete support
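The responses from the sequence above still have to be interpreted. The following is a minimal sketch of that interpretation as pure functions; `CapabilityProbeSketch` and its method names are hypothetical. The status-code rules reflect a reading of the OCI Distribution Specification 1.1, not Doctor's confirmed behaviour: the referrers endpoint answers `200 OK` with an image index when supported, and a mount request answers `201 Created` on success or `202 Accepted` when the registry falls back to a regular upload session.

```csharp
using System;
using System.Linq;
using System.Net;

public static class CapabilityProbeSketch
{
    // Steps 5 and 6: true if the comma-separated Allow header lists DELETE.
    public static bool AllowsDelete(string? allowHeader) =>
        !string.IsNullOrWhiteSpace(allowHeader) &&
        allowHeader
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Any(method => method.Equals("DELETE", StringComparison.OrdinalIgnoreCase));

    // Step 2: supported only when the registry returns 200 with an OCI image index.
    public static bool SupportsReferrers(HttpStatusCode status, string? contentType) =>
        status == HttpStatusCode.OK &&
        contentType is not null &&
        contentType.StartsWith("application/vnd.oci.image.index.v1+json",
            StringComparison.OrdinalIgnoreCase);

    // Step 4: 201 means the blob was mounted; 202 means an upload session was
    // opened instead, which the caller should cancel as in step 3.
    public static bool MountSucceeded(HttpStatusCode status) =>
        status == HttpStatusCode.Created;
}
```

Keeping the interpretation separate from the HTTP calls makes the rules easy to unit test against recorded registry responses.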
Evidence Collection
All checks collect structured evidence:
var result = CheckResultBuilder.Create(check)
    .Pass("Registry authentication successful")
    .WithEvidence(eb => eb
        .Add("registry_url", registryUrl)
        .Add("auth_method", "bearer")
        .Add("response_time_ms", elapsed.TotalMilliseconds.ToString("F0"))
        .AddSensitive("token_preview", RedactToken(token)))
    .Build();
Credential Redaction
Sensitive values are automatically redacted:
// Redact to first 2 + last 2 characters
private static string Redact(string? value)
{
    if (string.IsNullOrEmpty(value) || value.Length <= 4)
        return "****";
    return $"{value[..2]}...{value[^2..]}";
}
// "mysecretpassword" → "my...rd"
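One property of the helper above is worth noting: a five-character value such as "abcde" becomes "ab...de", which reveals four of its five characters. The following is a hedged sketch of a variant with a configurable minimum length. `RedactionSketch`, `visible`, and `minLength` are hypothetical names, not part of Doctor, and the defaults reproduce the original behaviour.

```csharp
using System;

public static class RedactionSketch
{
    // Masks the value entirely unless it is longer than both minLength and
    // twice the number of visible characters; otherwise keeps `visible`
    // characters at each end.
    public static string Redact(string? value, int visible = 2, int minLength = 4)
    {
        if (visible < 0)
            throw new ArgumentOutOfRangeException(nameof(visible));

        var threshold = Math.Max(minLength, 2 * visible);
        if (string.IsNullOrEmpty(value) || value.Length <= threshold)
            return "****";

        return $"{value[..visible]}...{value[^visible..]}";
    }
}
```

With the defaults, `Redact("mysecretpassword")` yields `my...rd` and `Redact("abcde")` yields `ab...de`. Passing `minLength: 8` masks the five-character value completely.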
5) CLI Integration
# Run all checks
stella doctor
# Run checks by tag
stella doctor --tag registry
stella doctor --tag configuration
# Run specific check
stella doctor --check check.integration.oci.referrers
# Output formats
stella doctor --format table # Default: human-readable
stella doctor --format json # Machine-readable
stella doctor --format sarif # SARIF for CI integration
# Verbosity
stella doctor --verbose # Include evidence details
stella doctor --quiet # Only show failures
# Filtering by severity
stella doctor --min-severity warn # Skip info/pass
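The `--min-severity` filter implies an ordering of severities. Because `Skip` is declared last in `DoctorSeverity`, comparing raw enum values would rank it above `Fail`. The sketch below uses an explicit rank instead. `SeverityFilterSketch`, `Rank`, `AtLeast`, and `Worst` are hypothetical names, and treating `Skip` as the lowest rank is an assumption, not documented CLI behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Redeclared here so the sketch is self-contained; mirrors the enum in section 2.
public enum DoctorSeverity { Pass, Info, Warn, Fail, Skip }

public static class SeverityFilterSketch
{
    // Explicit ordering: Skip < Pass < Info < Warn < Fail (Skip lowest is an assumption).
    public static int Rank(DoctorSeverity severity) => severity switch
    {
        DoctorSeverity.Skip => 0,
        DoctorSeverity.Pass => 1,
        DoctorSeverity.Info => 2,
        DoctorSeverity.Warn => 3,
        DoctorSeverity.Fail => 4,
        _ => throw new ArgumentOutOfRangeException(nameof(severity))
    };

    // Keeps only results at or above the requested minimum severity.
    public static IEnumerable<(string CheckId, DoctorSeverity Severity)> AtLeast(
        IEnumerable<(string CheckId, DoctorSeverity Severity)> results, DoctorSeverity min) =>
        results.Where(r => Rank(r.Severity) >= Rank(min));

    // Worst severity across a run; an empty run is reported as Skip.
    public static DoctorSeverity Worst(IEnumerable<DoctorSeverity> severities) =>
        severities.DefaultIfEmpty(DoctorSeverity.Skip).MaxBy(Rank);
}
```

Under these assumptions, filtering with `DoctorSeverity.Warn` keeps only `Warn` and `Fail` results, which matches the "Skip info/pass" comment above.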
6) Extensibility
Creating a Custom Check
public sealed class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom integration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => ["custom", "integration"];
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(5);

    public bool CanRun(DoctorPluginContext context)
    {
        // Return false if preconditions not met
        return context.Configuration["Custom:Enabled"] == "true";
    }

    public async Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct)
    {
        var builder = CheckResultBuilder.Create(this);
        try
        {
            // Perform check logic
            var result = await ValidateAsync(context, ct);
            if (result.Success)
            {
                return builder
                    .Pass("Custom validation successful")
                    .WithEvidence(eb => eb.Add("detail", result.Detail))
                    .Build();
            }

            return builder
                .Fail("Custom validation failed")
                .WithLikelyCause("Configuration is invalid")
                .WithRemediation(rb => rb
                    .AddManualStep(1, "Check configuration", "Verify Custom:Setting is correct")
                    .WithRunbookUrl("https://docs.stella-ops.org/runbooks/custom-check"))
                .Build();
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Cancellation propagates to the caller; any other error becomes a Fail result
            return builder
                .Fail($"Check failed with error: {ex.Message}")
                .WithEvidence(eb => eb.Add("exception_type", ex.GetType().Name))
                .Build();
        }
    }
}
Creating a Custom Plugin
public sealed class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "custom";
    public string DisplayName => "Custom Checks";
    public string Category => "Integration";
    public Version Version => new(1, 0, 0);

    public IEnumerable<IDoctorCheck> GetChecks()
    {
        yield return new MyCustomCheck();
        yield return new AnotherCustomCheck();
    }

    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
    {
        // Optional initialization
        return Task.CompletedTask;
    }
}
7) Telemetry
Doctor emits metrics and traces for observability:
Metrics:
- `doctor_check_duration_seconds{check_id, severity}` - Check execution time
- `doctor_check_results_total{check_id, severity}` - Result counts
- `doctor_plugin_load_duration_seconds{plugin_id}` - Plugin initialization time
Traces:
- `doctor.run` - Full doctor run span
- `doctor.check.{check_id}` - Individual check spans with evidence as attributes
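As an illustration of how one check execution could be instrumented, here is a minimal sketch using the standard .NET `System.Diagnostics.Metrics` and `ActivitySource` APIs. The class name `DoctorTelemetrySketch`, the meter and source name `StellaOps.Doctor`, the `doctor.severity` span tag, and the use of a plain string for severity are all assumptions. Only the metric and span names come from the lists above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

public static class DoctorTelemetrySketch
{
    // Meter and ActivitySource names are assumptions for this sketch.
    private static readonly Meter DoctorMeter = new("StellaOps.Doctor");
    private static readonly ActivitySource DoctorSource = new("StellaOps.Doctor");

    private static readonly Histogram<double> CheckDuration =
        DoctorMeter.CreateHistogram<double>("doctor_check_duration_seconds", unit: "s");
    private static readonly Counter<long> CheckResults =
        DoctorMeter.CreateCounter<long>("doctor_check_results_total");

    // Wraps one check execution; runCheck returns the resulting severity as a string.
    public static async Task<string> MeasureAsync(string checkId, Func<Task<string>> runCheck)
    {
        // StartActivity returns null when no listener is attached, hence the null-conditional below.
        using var activity = DoctorSource.StartActivity($"doctor.check.{checkId}");
        var stopwatch = Stopwatch.StartNew();

        var severity = await runCheck();

        stopwatch.Stop();
        var tags = new[]
        {
            new KeyValuePair<string, object?>("check_id", checkId),
            new KeyValuePair<string, object?>("severity", severity),
        };
        CheckDuration.Record(stopwatch.Elapsed.TotalSeconds, tags);
        CheckResults.Add(1, tags);
        activity?.SetTag("doctor.severity", severity);
        return severity;
    }
}
```

Nothing is exported until a listener or an OpenTelemetry exporter subscribes to the meter and activity source, so the wrapper costs little when telemetry is disabled.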