# Doctor Architecture

> Module: Doctor

> Sprint: SPRINT_0127_001_0002_oci_registry_compatibility

Stella Doctor is a diagnostic framework for validating system health, configuration, and integration connectivity across the StellaOps platform.

## 1) Overview

Doctor provides a plugin-based diagnostic system that enables:

- **Health checks** for all platform components
- **Integration validation** for external systems (registries, SCM, CI, secrets)
- **Configuration verification** before deployment
- **Capability probing** for feature compatibility
- **Evidence collection** for troubleshooting and compliance

## 2) Plugin Architecture

### Core Interfaces

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public interface IDoctorPlugin
{
    string PluginId { get; }
    string DisplayName { get; }
    string Category { get; }
    Version Version { get; }

    IEnumerable<IDoctorCheck> GetChecks();
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public interface IDoctorCheck
{
    string CheckId { get; }
    string Name { get; }
    string Description { get; }
    DoctorSeverity DefaultSeverity { get; }
    IReadOnlyList<string> Tags { get; }
    TimeSpan EstimatedDuration { get; }

    bool CanRun(DoctorPluginContext context);
    Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}
```
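
The split between `CanRun` and `RunAsync` suggests that a runner evaluates preconditions first and reports `Skip` for checks that cannot run, which matches the meaning of `DoctorSeverity.Skip`. The sketch below shows that control flow with deliberately simplified, hypothetical stand-ins (`ISketchCheck`, `SketchSeverity`, and a plain string dictionary in place of `DoctorPluginContext`). It runs checks sequentially for clarity and is not the actual Doctor runner.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified mirrors of DoctorSeverity / IDoctorCheck (illustration only).
public enum SketchSeverity { Pass, Info, Warn, Fail, Skip }

public interface ISketchCheck
{
    string CheckId { get; }
    bool CanRun(IReadOnlyDictionary<string, string> config);
    Task<SketchSeverity> RunAsync(IReadOnlyDictionary<string, string> config, CancellationToken ct);
}

public static class SketchRunner
{
    // Runs checks one after another. A check whose preconditions fail is
    // reported as Skip and its RunAsync is never called.
    public static async Task<IReadOnlyDictionary<string, SketchSeverity>> RunAllAsync(
        IEnumerable<ISketchCheck> checks,
        IReadOnlyDictionary<string, string> config,
        CancellationToken ct)
    {
        var results = new Dictionary<string, SketchSeverity>();
        foreach (var check in checks)
        {
            ct.ThrowIfCancellationRequested();
            results[check.CheckId] = check.CanRun(config)
                ? await check.RunAsync(config, ct)
                : SketchSeverity.Skip;
        }
        return results;
    }
}

// Example check that only runs when "Custom:Enabled" is "true".
public sealed class ToggleCheck : ISketchCheck
{
    public string CheckId => "check.sketch.toggle";

    public bool CanRun(IReadOnlyDictionary<string, string> config) =>
        config.TryGetValue("Custom:Enabled", out var value) && value == "true";

    public Task<SketchSeverity> RunAsync(IReadOnlyDictionary<string, string> config, CancellationToken ct) =>
        Task.FromResult(SketchSeverity.Pass);
}
```

A real runner would also need to decide on parallelism and per-check timeouts (for example based on `EstimatedDuration`); those choices are left out of this sketch.
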
### Plugin Context

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;

public sealed class DoctorPluginContext
{
    public IServiceProvider Services { get; }
    public IConfiguration Configuration { get; }
    public TimeProvider TimeProvider { get; }
    public ILogger Logger { get; }
    public string EnvironmentName { get; }
    public IReadOnlyDictionary<string, object> PluginConfig { get; }
}
```
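
`PluginConfig` is typed as `IReadOnlyDictionary<string, object>`, so a check has to convert values itself. One defensive approach, sketched below with a hypothetical `GetOption<T>` extension method (not part of the documented `DoctorPluginContext` API), accepts an already-typed value, falls back to `Convert.ChangeType` with the invariant culture for strings such as `"30"` or `"true"`, and otherwise returns a caller-supplied default.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for reading typed values from PluginConfig.
public static class PluginConfigExtensions
{
    public static T GetOption<T>(this IReadOnlyDictionary<string, object> config, string key, T fallback)
    {
        if (!config.TryGetValue(key, out var raw) || raw is null)
            return fallback;

        // Value was already bound with the requested type (e.g. a bool from JSON).
        if (raw is T typed)
            return typed;

        try
        {
            // Convert strings and other IConvertible values using the invariant culture.
            return (T)Convert.ChangeType(raw, typeof(T), CultureInfo.InvariantCulture);
        }
        catch (Exception ex) when (ex is InvalidCastException or FormatException or OverflowException)
        {
            // Unconvertible values fall back instead of failing the whole check.
            return fallback;
        }
    }
}
```

Keys such as `timeoutSeconds` in the usage below are illustrative; real keys are plugin-specific.
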
### Check Results

```csharp
using System.Collections.Generic;

public sealed record CheckResult
{
    public DoctorSeverity Severity { get; init; }
    public string Diagnosis { get; init; }
    public Evidence Evidence { get; init; }
    public IReadOnlyList<string> LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum DoctorSeverity
{
    Pass, // Check succeeded
    Info, // Informational (no action needed)
    Warn, // Warning (degraded but functional)
    Fail, // Failure (requires action)
    Skip  // Check skipped (preconditions not met)
}
```
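
Because `Skip` is declared after `Fail`, comparing `DoctorSeverity` values by their underlying integers would rank a skipped check as worse than a failure. A report that needs one overall outcome should therefore exclude `Skip` explicitly. The sketch below copies the enum so it compiles on its own and uses a hypothetical `SeveritySummary.Overall` helper; treating an all-skipped run as `Skip` is an assumption, not documented Doctor behaviour.

```csharp
using System.Collections.Generic;
using System.Linq;

// Copy of the enum above so the sketch is self-contained.
public enum DoctorSeverity { Pass, Info, Warn, Fail, Skip }

public static class SeveritySummary
{
    // Most severe outcome among checks that actually ran.
    // Pass < Info < Warn < Fail follows declaration order; Skip is ignored.
    public static DoctorSeverity Overall(IEnumerable<DoctorSeverity> results)
    {
        var ran = results.Where(s => s != DoctorSeverity.Skip).ToList();
        return ran.Count == 0 ? DoctorSeverity.Skip : ran.Max();
    }
}
```
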
## 3) Built-in Plugins

### IntegrationPlugin

Validates external system connectivity and capabilities.

**Check Catalog:**

| Check ID | Name | Default Severity | Description |
|----------|------|------------------|-------------|
| `check.integration.oci.credentials` | OCI Registry Credentials | Fail | Validate registry authentication |
| `check.integration.oci.pull` | OCI Registry Pull Authorization | Fail | Verify pull permissions |
| `check.integration.oci.push` | OCI Registry Push Authorization | Fail | Verify push permissions |
| `check.integration.oci.referrers` | OCI Registry Referrers API | Warn | Check OCI 1.1 referrers support |
| `check.integration.oci.capabilities` | OCI Registry Capability Matrix | Info | Probe all registry capabilities |

See [Registry Diagnostic Checks](./registry-checks.md) for detailed documentation.

### ConfigurationPlugin

Validates platform configuration.

| Check ID | Name | Default Severity | Description |
|----------|------|------------------|-------------|
| `check.config.database` | Database Connection | Fail | Verify database connectivity |
| `check.config.secrets` | Secrets Provider | Fail | Verify secrets access |
| `check.config.tls` | TLS Configuration | Warn | Validate TLS certificates |

### HealthPlugin

Validates platform component health.

| Check ID | Name | Default Severity | Description |
|----------|------|------------------|-------------|
| `check.health.api` | API Health | Fail | Verify API endpoints |
| `check.health.worker` | Worker Health | Fail | Verify background workers |
| `check.health.storage` | Storage Health | Fail | Verify storage backends |

## 4) Check Patterns

### Non-Destructive Probing

Registry checks use only non-destructive operations:

```csharp
// Pull check: HEAD request only (no manifest body is transferred)
using var headRequest = new HttpRequestMessage(HttpMethod.Head, manifestUrl);
using var headResponse = await client.SendAsync(headRequest, ct);

// Push check: start a blob upload session, then immediately cancel it
using var uploadResponse = await client.PostAsync(uploadsUrl, null, ct);
if (uploadResponse.StatusCode == HttpStatusCode.Accepted &&
    uploadResponse.Headers.Location is { } location)
{
    // Location may be relative; resolve it against the request URI before cancelling
    var sessionUrl = location.IsAbsoluteUri
        ? location
        : new Uri(uploadResponse.RequestMessage!.RequestUri!, location);
    using var cancelResponse = await client.DeleteAsync(sessionUrl, ct); // Cancel upload
}
```

### Capability Detection

Registry capability probing sequence:

```
1. GET /v2/ → Extract OCI-Distribution-API-Version header
2. GET /v2/{repo}/referrers/{digest} → Check referrers API support
3. POST /v2/{repo}/blobs/uploads/ → Check chunked upload support
   └─ DELETE {location} → Cancel upload session
4. POST /v2/{repo}/blobs/uploads/?mount=...&from=... → Check cross-repo mount
5. OPTIONS /v2/{repo}/manifests/{ref} → Check manifest delete support (Allow header)
6. OPTIONS /v2/{repo}/blobs/{digest} → Check blob delete support
```
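
Several of these probes reduce to reading a status code or a header. The sketch below interprets three of them. The referrers and mount rules follow the OCI Distribution Specification: a registry that implements the referrers API must not answer the referrers endpoint with 404, and a cross-repository mount returns `201 Created` when the blob was mounted or `202 Accepted` when the registry opened an ordinary upload session instead (which the probe must then cancel). The `Allow` parsing follows HTTP semantics, where the header is a comma-separated list of case-sensitive method names. `CapabilityRules` and `MountOutcome` are hypothetical names, not Doctor's implementation.

```csharp
using System;
using System.Linq;
using System.Net;

public enum MountOutcome { Mounted, UploadSessionOpened, Failed }

// Hypothetical helpers for interpreting capability-probe responses.
public static class CapabilityRules
{
    // 200 = referrers API supported; 404 = unsupported (clients fall back to the
    // referrers tag schema); anything else (e.g. 401/403) is inconclusive.
    public static bool? SupportsReferrers(HttpStatusCode status) => status switch
    {
        HttpStatusCode.OK => true,
        HttpStatusCode.NotFound => false,
        _ => null,
    };

    // 201 = blob mounted; 202 = registry opened an upload session instead,
    // so the probe should DELETE the returned Location to cancel it.
    public static MountOutcome InterpretMount(HttpStatusCode status) => status switch
    {
        HttpStatusCode.Created => MountOutcome.Mounted,
        HttpStatusCode.Accepted => MountOutcome.UploadSessionOpened,
        _ => MountOutcome.Failed,
    };

    // Allow header such as "GET, HEAD, PUT, DELETE"; method names are case-sensitive.
    public static bool AllowsDelete(string? allowHeader) =>
        allowHeader is not null &&
        allowHeader.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries)
                   .Contains("DELETE", StringComparer.Ordinal);
}
```

Reporting authentication failures on the referrers endpoint as inconclusive keeps a credentials problem from being misreported as missing referrers support.
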
### Evidence Collection

All checks collect structured evidence:

```csharp
var result = CheckResultBuilder.Create(check)
    .Pass("Registry authentication successful")
    .WithEvidence(eb => eb
        .Add("registry_url", registryUrl)
        .Add("auth_method", "bearer")
        .Add("response_time_ms", elapsed.TotalMilliseconds.ToString("F0"))
        .AddSensitive("token_preview", RedactToken(token)))
    .Build();
```

### Credential Redaction

Sensitive values are automatically redacted:

```csharp
// Redact to first 2 + last 2 characters
private static string Redact(string? value)
{
    if (string.IsNullOrEmpty(value) || value.Length <= 4)
        return "****";

    return $"{value[..2]}...{value[^2..]}";
}

// "mysecretpassword" → "my...rd"
```
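
A minimal sketch of how an evidence collector could apply this rule at insertion time, so that a raw secret never reaches report output, logs, or trace attributes. `EvidenceSketch` is hypothetical and assumes the same first-2/last-2 rule; it is not the actual `CheckResultBuilder` evidence API.

```csharp
using System.Collections.Generic;

// Hypothetical evidence collector; not the real Doctor API.
public sealed class EvidenceSketch
{
    private readonly Dictionary<string, string> _entries = new();

    public IReadOnlyDictionary<string, string> Entries => _entries;

    public EvidenceSketch Add(string key, string value)
    {
        _entries[key] = value;
        return this;
    }

    // Sensitive values are redacted before they are stored.
    public EvidenceSketch AddSensitive(string key, string? value)
    {
        _entries[key] = Redact(value);
        return this;
    }

    // Same rule as above: keep the first 2 and last 2 characters.
    private static string Redact(string? value)
    {
        if (string.IsNullOrEmpty(value) || value.Length <= 4)
            return "****";

        return $"{value[..2]}...{value[^2..]}";
    }
}
```

Note that for 5- or 6-character values this rule still reveals four characters, so a higher length threshold may be worth considering for short secrets.
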
## 5) CLI Integration

```bash
# Run all checks
stella doctor

# Run checks by tag
stella doctor --tag registry
stella doctor --tag configuration

# Run a specific check
stella doctor --check check.integration.oci.referrers

# Output formats
stella doctor --format table   # Default: human-readable
stella doctor --format json    # Machine-readable
stella doctor --format sarif   # SARIF for CI integration

# Verbosity
stella doctor --verbose        # Include evidence details
stella doctor --quiet          # Only show failures

# Filtering by severity
stella doctor --min-severity warn   # Hide info/pass results
```
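
The `--check`, `--tag`, and `--min-severity` options imply a selection step before checks run and a filtering step before results are printed. The sketch below models both under explicit assumptions: `--check` takes precedence over tags, tags match case-insensitively and any matching tag selects a check, and `Skip` results are hidden when a minimum severity is applied. These rules and the `CheckInfo`, `Severity`, and `Selection` names are hypothetical; the tag values in the usage are illustrative too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local copy of the severity order (Pass < Info < Warn < Fail), plus Skip.
public enum Severity { Pass, Info, Warn, Fail, Skip }

// Hypothetical descriptor holding only what selection needs.
public sealed record CheckInfo(string CheckId, IReadOnlyList<string> Tags);

public static class Selection
{
    // --check selects one check by ID; otherwise any matching --tag selects a check;
    // with neither option, every check is selected.
    public static IEnumerable<CheckInfo> Select(
        IEnumerable<CheckInfo> checks, string? checkId, IReadOnlyCollection<string> tags)
    {
        if (checkId is not null)
            return checks.Where(c => c.CheckId == checkId);
        if (tags.Count == 0)
            return checks;
        return checks.Where(c => c.Tags.Any(t => tags.Contains(t, StringComparer.OrdinalIgnoreCase)));
    }

    // --min-severity: report results at or above the threshold (Skip is not reported).
    public static bool IsReported(Severity result, Severity minimum) =>
        result != Severity.Skip && result >= minimum;
}
```
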
## 6) Extensibility

### Creating a Custom Check

```csharp
public sealed class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom integration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => ["custom", "integration"];
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(5);

    public bool CanRun(DoctorPluginContext context)
    {
        // Return false when preconditions are not met (the check is reported as Skip)
        return bool.TryParse(context.Configuration["Custom:Enabled"], out var enabled) && enabled;
    }

    public async Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct)
    {
        var builder = CheckResultBuilder.Create(this);

        try
        {
            // Perform check logic (ValidateAsync is your integration-specific code, not shown)
            var result = await ValidateAsync(context, ct);

            if (result.Success)
            {
                return builder
                    .Pass("Custom validation successful")
                    .WithEvidence(eb => eb.Add("detail", result.Detail))
                    .Build();
            }

            return builder
                .Fail("Custom validation failed")
                .WithLikelyCause("Configuration is invalid")
                .WithRemediation(rb => rb
                    .AddManualStep(1, "Check configuration", "Verify Custom:Setting is correct")
                    .WithRunbookUrl("https://docs.stella-ops.org/runbooks/custom-check"))
                .Build();
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Let cancellation propagate; report any other error as a failed check
            return builder
                .Fail($"Check failed with error: {ex.Message}")
                .WithEvidence(eb => eb.Add("exception_type", ex.GetType().Name))
                .Build();
        }
    }
}
```

### Creating a Custom Plugin

```csharp
public sealed class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "custom";
    public string DisplayName => "Custom Checks";
    public string Category => "Integration";
    public Version Version => new(1, 0, 0);

    public IEnumerable<IDoctorCheck> GetChecks()
    {
        yield return new MyCustomCheck();
        yield return new AnotherCustomCheck(); // any other IDoctorCheck implementation
    }

    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
    {
        // Optional initialization
        return Task.CompletedTask;
    }
}
```

## 7) Telemetry

Doctor emits metrics and traces for observability:

**Metrics:**

- `doctor_check_duration_seconds{check_id, severity}` - Check execution time
- `doctor_check_results_total{check_id, severity}` - Result counts
- `doctor_plugin_load_duration_seconds{plugin_id}` - Plugin initialization time

**Traces:**

- `doctor.run` - Full doctor run span
- `doctor.check.{check_id}` - Individual check spans with evidence as attributes
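
A sketch of how a runner could emit the per-check signals with the .NET `System.Diagnostics` APIs (`Meter`, `Histogram<T>`, `Counter<T>`, `ActivitySource`; .NET 8 assumed). The meter and source name `StellaOps.Doctor` is an assumption, the instrument names reuse the metric names listed above, and how they surface in a backend depends on the exporter. Severity is passed as a string only to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public static class DoctorTelemetrySketch
{
    // Hypothetical meter and activity-source name.
    public const string SourceName = "StellaOps.Doctor";

    private static readonly Meter DoctorMeter = new(SourceName);
    private static readonly ActivitySource DoctorSource = new(SourceName);

    private static readonly Histogram<double> CheckDuration =
        DoctorMeter.CreateHistogram<double>("doctor_check_duration_seconds", "s", "Check execution time");
    private static readonly Counter<long> CheckResults =
        DoctorMeter.CreateCounter<long>("doctor_check_results_total", null, "Result counts");

    // Runs one check inside a "doctor.check.{check_id}" span and records both
    // metrics tagged with check_id and the resulting severity.
    public static string RunInstrumented(string checkId, Func<string> runCheck)
    {
        // StartActivity returns null when nothing listens to the source.
        using var activity = DoctorSource.StartActivity($"doctor.check.{checkId}");
        long started = Stopwatch.GetTimestamp();

        string severity = runCheck();

        double seconds = Stopwatch.GetElapsedTime(started).TotalSeconds;
        var checkTag = new KeyValuePair<string, object?>("check_id", checkId);
        var severityTag = new KeyValuePair<string, object?>("severity", severity);

        CheckDuration.Record(seconds, checkTag, severityTag);
        CheckResults.Add(1, checkTag, severityTag);
        activity?.SetTag("severity", severity);

        return severity;
    }
}
```
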
## 8) Related Documentation

- [Registry Diagnostic Checks](./registry-checks.md)
- [Registry Compatibility Runbook](../../runbooks/registry-compatibility.md)
- [Registry Referrer Troubleshooting](../../runbooks/registry-referrer-troubleshooting.md)