git.stella-ops.org/docs/modules/doctor/architecture.md
2026-01-28 02:30:48 +02:00

# Doctor Architecture
> Module: Doctor
> Sprint: SPRINT_0127_001_0002_oci_registry_compatibility

Stella Doctor is a diagnostic framework for validating system health, configuration, and integration connectivity across the StellaOps platform.
## 1) Overview
Doctor provides a plugin-based diagnostic system that enables:
- **Health checks** for all platform components
- **Integration validation** for external systems (registries, SCM, CI, secrets)
- **Configuration verification** before deployment
- **Capability probing** for feature compatibility
- **Evidence collection** for troubleshooting and compliance
## 2) Plugin Architecture
### Core Interfaces
```csharp
public interface IDoctorPlugin
{
    string PluginId { get; }
    string DisplayName { get; }
    string Category { get; }
    Version Version { get; }
    IEnumerable<IDoctorCheck> GetChecks();
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public interface IDoctorCheck
{
    string CheckId { get; }
    string Name { get; }
    string Description { get; }
    DoctorSeverity DefaultSeverity { get; }
    IReadOnlyList<string> Tags { get; }
    TimeSpan EstimatedDuration { get; }
    bool CanRun(DoctorPluginContext context);
    Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}
```
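To illustrate how a host might drive these interfaces, here is a minimal sketch of a sequential runner. It is not the Doctor engine: `SketchSeverity`, `SketchContext`, `ISketchCheck`, `SketchRunner`, and the two sample checks are hypothetical, trimmed-down stand-ins so the example compiles on its own. The one behaviour it assumes, based on the `Skip` severity described in this document, is that a check whose `CanRun` returns `false` is reported as skipped rather than executed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Demo: one check that runs and one whose preconditions are not met.
var checks = new ISketchCheck[] { new AlwaysPassCheck(), new DisabledCheck() };
var results = await SketchRunner.RunAllAsync(checks, new SketchContext("dev"), CancellationToken.None);
foreach (var (checkId, severity) in results)
{
    Console.WriteLine($"{checkId}: {severity}");
}

// Hypothetical, simplified stand-ins for DoctorSeverity, DoctorPluginContext and IDoctorCheck.
public enum SketchSeverity { Pass, Info, Warn, Fail, Skip }

public sealed record SketchContext(string EnvironmentName);

public interface ISketchCheck
{
    string CheckId { get; }
    bool CanRun(SketchContext context);
    Task<SketchSeverity> RunAsync(SketchContext context, CancellationToken ct);
}

public static class SketchRunner
{
    // Runs checks one after another and records one severity per check.
    public static async Task<IReadOnlyList<(string CheckId, SketchSeverity Severity)>> RunAllAsync(
        IEnumerable<ISketchCheck> checks, SketchContext context, CancellationToken ct)
    {
        var results = new List<(string CheckId, SketchSeverity Severity)>();
        foreach (var check in checks)
        {
            ct.ThrowIfCancellationRequested();

            // Assumption: unmet preconditions map to Skip instead of calling RunAsync.
            var severity = check.CanRun(context)
                ? await check.RunAsync(context, ct)
                : SketchSeverity.Skip;
            results.Add((check.CheckId, severity));
        }
        return results;
    }
}

public sealed class AlwaysPassCheck : ISketchCheck
{
    public string CheckId => "check.sketch.pass";
    public bool CanRun(SketchContext context) => true;
    public Task<SketchSeverity> RunAsync(SketchContext context, CancellationToken ct) =>
        Task.FromResult(SketchSeverity.Pass);
}

public sealed class DisabledCheck : ISketchCheck
{
    public string CheckId => "check.sketch.disabled";
    public bool CanRun(SketchContext context) => false;
    public Task<SketchSeverity> RunAsync(SketchContext context, CancellationToken ct) =>
        throw new InvalidOperationException("RunAsync must not be called when CanRun is false.");
}
```

Running checks sequentially keeps the sketch short; a real runner could also run them in parallel or apply per-check timeouts.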
### Plugin Context
```csharp
public sealed class DoctorPluginContext
{
    public IServiceProvider Services { get; }
    public IConfiguration Configuration { get; }
    public TimeProvider TimeProvider { get; }
    public ILogger Logger { get; }
    public string EnvironmentName { get; }
    public IReadOnlyDictionary<string, object> PluginConfig { get; }
}
```
### Check Results
```csharp
public sealed record CheckResult
{
    public DoctorSeverity Severity { get; init; }
    public string Diagnosis { get; init; }
    public Evidence Evidence { get; init; }
    public IReadOnlyList<string> LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum DoctorSeverity
{
    Pass, // Check succeeded
    Info, // Informational (no action needed)
    Warn, // Warning (degraded but functional)
    Fail, // Failure (requires action)
    Skip  // Check skipped (preconditions not met)
}
```
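One detail worth noting when aggregating results: `Skip` is declared last in the enum, so it has the highest numeric value, and a plain numeric maximum would treat a skipped check as worse than a failure. The sketch below shows one way to rank severities by urgency instead. The `SeverityRanking` helper and its ordering (Skip lowest, Fail highest) are assumptions made for illustration, not something the Doctor module is known to ship.

```csharp
using System;
using System.Collections.Generic;

var worst = SeverityRanking.Worst(new[] { DoctorSeverity.Pass, DoctorSeverity.Skip, DoctorSeverity.Warn });
Console.WriteLine(worst); // prints "Warn"

// Copied from the enum above so the sketch compiles on its own.
public enum DoctorSeverity { Pass, Info, Warn, Fail, Skip }

// Hypothetical helper: ranks severities by urgency rather than by enum value.
public static class SeverityRanking
{
    public static int Rank(DoctorSeverity severity) => severity switch
    {
        DoctorSeverity.Skip => 0, // Assumption: nothing was evaluated, so least urgent
        DoctorSeverity.Pass => 1,
        DoctorSeverity.Info => 2,
        DoctorSeverity.Warn => 3,
        DoctorSeverity.Fail => 4,
        _ => throw new ArgumentOutOfRangeException(nameof(severity)),
    };

    // Returns the most urgent severity, or Skip when the sequence is empty.
    public static DoctorSeverity Worst(IEnumerable<DoctorSeverity> severities)
    {
        var worst = DoctorSeverity.Skip;
        foreach (var severity in severities)
        {
            if (Rank(severity) > Rank(worst))
            {
                worst = severity;
            }
        }
        return worst;
    }

    // True when a result is at least as urgent as the requested minimum.
    public static bool MeetsMinimum(DoctorSeverity severity, DoctorSeverity minimum) =>
        Rank(severity) >= Rank(minimum);
}
```

A helper like `MeetsMinimum` is the kind of comparison a severity filter would need; whether the CLI ranks `Skip` this way is not stated in this document.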
## 3) Built-in Plugins
### IntegrationPlugin
Validates external system connectivity and capabilities.
**Check Catalog:**

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| `check.integration.oci.credentials` | OCI Registry Credentials | Fail | Validate registry authentication |
| `check.integration.oci.pull` | OCI Registry Pull Authorization | Fail | Verify pull permissions |
| `check.integration.oci.push` | OCI Registry Push Authorization | Fail | Verify push permissions |
| `check.integration.oci.referrers` | OCI Registry Referrers API | Warn | Check OCI 1.1 referrers support |
| `check.integration.oci.capabilities` | OCI Registry Capability Matrix | Info | Probe all registry capabilities |

See [Registry Diagnostic Checks](./registry-checks.md) for detailed documentation.
### ConfigurationPlugin
Validates platform configuration.

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| `check.config.database` | Database Connection | Fail | Verify database connectivity |
| `check.config.secrets` | Secrets Provider | Fail | Verify secrets access |
| `check.config.tls` | TLS Configuration | Warn | Validate TLS certificates |
### HealthPlugin
Validates platform component health.

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| `check.health.api` | API Health | Fail | Verify API endpoints |
| `check.health.worker` | Worker Health | Fail | Verify background workers |
| `check.health.storage` | Storage Health | Fail | Verify storage backends |
## 4) Check Patterns
### Non-Destructive Probing
Registry checks use non-destructive operations:
```csharp
// Pull check: HEAD request only (headers are returned, no manifest body is transferred)
using var request = new HttpRequestMessage(HttpMethod.Head, manifestUrl);
using var response = await client.SendAsync(request, ct);

// Push check: start an upload session, then cancel it immediately
using var uploadResponse = await client.PostAsync(uploadsUrl, null, ct);
if (uploadResponse.StatusCode == HttpStatusCode.Accepted
    && uploadResponse.Headers.Location is { } location)
{
    // Location may be relative; HttpClient resolves it against BaseAddress when one is set
    await client.DeleteAsync(location, ct); // Cancel upload
}
```
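Registries may return the upload `Location` header as a relative reference, so the cancel request can fail if the client has no base address configured. The sketch below shows one way to turn the header into an absolute URL before issuing the `DELETE`. `UploadLocation.Resolve` and the `registry.example.test` host are hypothetical names used only for illustration.

```csharp
using System;

var uploadsUrl = new Uri("https://registry.example.test/v2/demo/app/blobs/uploads/");
var relative = new Uri("/v2/demo/app/blobs/uploads/1234?_state=abc", UriKind.Relative);
Console.WriteLine(UploadLocation.Resolve(uploadsUrl, relative));
// prints "https://registry.example.test/v2/demo/app/blobs/uploads/1234?_state=abc"

// Hypothetical helper: makes an upload Location header absolute.
public static class UploadLocation
{
    // requestUri is the absolute URL the POST was sent to;
    // location is the (possibly relative, possibly missing) Location header value.
    public static Uri? Resolve(Uri requestUri, Uri? location)
    {
        if (location is null)
        {
            return null; // No session URL was returned, so there is nothing to cancel.
        }

        // Absolute values are used as-is; relative ones are resolved against the request URL.
        return location.IsAbsoluteUri ? location : new Uri(requestUri, location);
    }
}
```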
### Capability Detection
Registry capability probing sequence:
```
1. GET /v2/ → Extract OCI-Distribution-API-Version header
2. GET /v2/{repo}/referrers/{digest} → Check referrers API support
3. POST /v2/{repo}/blobs/uploads/ → Check chunked upload support
   └─ DELETE {location} → Cancel upload session
4. POST /v2/{repo}/blobs/uploads/?mount=...&from=... → Check cross-repo mount
5. OPTIONS /v2/{repo}/manifests/{ref} → Check delete support (Allow header)
6. OPTIONS /v2/{repo}/blobs/{digest} → Check blob delete support
```
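As an illustration of step 2, the sketch below classifies a referrers probe response. It assumes the behaviour described by the OCI Distribution Specification 1.1: a supporting registry answers with `200 OK` and an image index (`application/vnd.oci.image.index.v1+json`), while `404 Not Found` signals that the API is unavailable. Other statuses (authentication errors, server errors) are treated as inconclusive. `ReferrersSupport` and `ReferrersProbe` are hypothetical names, and how the actual check maps these outcomes to severities is not specified here.

```csharp
using System;
using System.Net;

Console.WriteLine(ReferrersProbe.Classify(HttpStatusCode.OK, "application/vnd.oci.image.index.v1+json")); // prints "Supported"
Console.WriteLine(ReferrersProbe.Classify(HttpStatusCode.NotFound, null));                                // prints "NotSupported"
Console.WriteLine(ReferrersProbe.Classify(HttpStatusCode.Unauthorized, null));                            // prints "Inconclusive"

public enum ReferrersSupport { Supported, NotSupported, Inconclusive }

// Hypothetical helper: interprets the response to GET /v2/{repo}/referrers/{digest}.
public static class ReferrersProbe
{
    public const string OciIndexMediaType = "application/vnd.oci.image.index.v1+json";

    public static ReferrersSupport Classify(HttpStatusCode status, string? contentType)
    {
        // StartsWith tolerates parameters such as "; charset=utf-8".
        if (status == HttpStatusCode.OK
            && contentType is not null
            && contentType.StartsWith(OciIndexMediaType, StringComparison.OrdinalIgnoreCase))
        {
            return ReferrersSupport.Supported;
        }

        if (status == HttpStatusCode.NotFound)
        {
            return ReferrersSupport.NotSupported;
        }

        // 401/403/5xx or an unexpected body: the probe cannot tell either way.
        return ReferrersSupport.Inconclusive;
    }
}
```

Keeping the classification separate from the HTTP call makes it easy to unit-test without a live registry.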
### Evidence Collection
All checks collect structured evidence:
```csharp
var result = CheckResultBuilder.Create(check)
    .Pass("Registry authentication successful")
    .WithEvidence(eb => eb
        .Add("registry_url", registryUrl)
        .Add("auth_method", "bearer")
        .Add("response_time_ms", elapsed.TotalMilliseconds.ToString("F0"))
        .AddSensitive("token_preview", RedactToken(token)))
    .Build();
```
### Credential Redaction
Sensitive values are automatically redacted:
```csharp
// Keep only the first 2 and last 2 characters.
// Null, empty, or very short values (4 characters or fewer) are fully masked.
private static string Redact(string? value)
{
    if (string.IsNullOrEmpty(value) || value.Length <= 4)
        return "****";
    return $"{value[..2]}...{value[^2..]}";
}
// "mysecretpassword" → "my...rd"
```
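The edge cases of this rule are easy to check in isolation. The sketch below wraps the same logic in a hypothetical `Redaction` class (the wrapper name and `public` visibility are assumptions made so the example runs on its own) and prints the result for a long value, a five-character value, a four-character value, and `null`.

```csharp
using System;

Console.WriteLine(Redaction.Redact("mysecretpassword")); // prints "my...rd"
Console.WriteLine(Redaction.Redact("abcde"));            // prints "ab...de"
Console.WriteLine(Redaction.Redact("abcd"));             // prints "****"
Console.WriteLine(Redaction.Redact(null));               // prints "****"

// Hypothetical wrapper around the redaction rule shown above.
public static class Redaction
{
    public static string Redact(string? value)
    {
        // Null, empty, or short values reveal nothing.
        if (string.IsNullOrEmpty(value) || value.Length <= 4)
            return "****";

        // Otherwise keep the first two and last two characters.
        return $"{value[..2]}...{value[^2..]}";
    }
}
```

Note that a five-character value still exposes four of its five characters, so a stricter minimum length may be worth considering for short secrets.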
## 5) CLI Integration
```bash
# Run all checks
stella doctor
# Run checks by tag
stella doctor --tag registry
stella doctor --tag configuration
# Run specific check
stella doctor --check check.integration.oci.referrers
# Output formats
stella doctor --format table # Default: human-readable
stella doctor --format json # Machine-readable
stella doctor --format sarif # SARIF for CI integration
# Verbosity
stella doctor --verbose # Include evidence details
stella doctor --quiet # Only show failures
# Filtering by severity
stella doctor --min-severity warn # Omit info and pass results
```
## 6) Extensibility
### Creating a Custom Check
```csharp
public sealed class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom integration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => ["custom", "integration"];
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(5);

    public bool CanRun(DoctorPluginContext context)
    {
        // Return false if preconditions not met
        return context.Configuration["Custom:Enabled"] == "true";
    }

    public async Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct)
    {
        var builder = CheckResultBuilder.Create(this);
        try
        {
            // Perform check logic (ValidateAsync is a placeholder for your own implementation)
            var result = await ValidateAsync(context, ct);
            if (result.Success)
            {
                return builder
                    .Pass("Custom validation successful")
                    .WithEvidence(eb => eb.Add("detail", result.Detail))
                    .Build();
            }

            return builder
                .Fail("Custom validation failed")
                .WithLikelyCause("Configuration is invalid")
                .WithRemediation(rb => rb
                    .AddManualStep(1, "Check configuration", "Verify Custom:Setting is correct")
                    .WithRunbookUrl("https://docs.stella-ops.org/runbooks/custom-check"))
                .Build();
        }
        // Let cancellation propagate instead of reporting it as a failed check
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return builder
                .Fail($"Check failed with error: {ex.Message}")
                .WithEvidence(eb => eb.Add("exception_type", ex.GetType().Name))
                .Build();
        }
    }
}
```
### Creating a Custom Plugin
```csharp
public sealed class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "custom";
    public string DisplayName => "Custom Checks";
    public string Category => "Integration";
    public Version Version => new(1, 0, 0);

    public IEnumerable<IDoctorCheck> GetChecks()
    {
        yield return new MyCustomCheck();
        yield return new AnotherCustomCheck(); // Any additional IDoctorCheck implementation
    }

    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
    {
        // Optional initialization
        return Task.CompletedTask;
    }
}
```
## 7) Telemetry
Doctor emits metrics and traces for observability:

**Metrics:**

- `doctor_check_duration_seconds{check_id, severity}` - Check execution time
- `doctor_check_results_total{check_id, severity}` - Result counts
- `doctor_plugin_load_duration_seconds{plugin_id}` - Plugin initialization time

**Traces:**

- `doctor.run` - Full doctor run span
- `doctor.check.{check_id}` - Individual check spans with evidence as attributes
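
For readers who want to emit comparable signals from their own code, the sketch below records the two per-check metrics and opens a per-check span using the standard .NET `System.Diagnostics.Metrics` and `System.Diagnostics.ActivitySource` APIs. The instrument names and tag keys come from the list above; the `DoctorTelemetrySketch` class, the `StellaOps.Doctor.Sketch` meter and source name, and the `RecordCheck` method are assumptions made for illustration and do not describe how Doctor itself wires up telemetry.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Demo: record one finished check. Nothing is exported unless a listener or exporter is attached.
DoctorTelemetrySketch.RecordCheck("check.health.api", "Pass", TimeSpan.FromMilliseconds(120));

// Hypothetical helper: emits the per-check metrics and span named in this section.
public static class DoctorTelemetrySketch
{
    // Assumed meter and activity source name.
    public const string SourceName = "StellaOps.Doctor.Sketch";

    public static readonly Meter Meter = new(SourceName);
    public static readonly ActivitySource Tracing = new(SourceName);

    private static readonly Histogram<double> CheckDuration =
        Meter.CreateHistogram<double>("doctor_check_duration_seconds", unit: "s");

    private static readonly Counter<long> CheckResults =
        Meter.CreateCounter<long>("doctor_check_results_total");

    public static void RecordCheck(string checkId, string severity, TimeSpan duration)
    {
        // StartActivity returns null when no listener is sampling this source.
        using var activity = Tracing.StartActivity($"doctor.check.{checkId}");
        activity?.SetTag("severity", severity);

        var tags = new TagList { { "check_id", checkId }, { "severity", severity } };
        CheckDuration.Record(duration.TotalSeconds, tags);
        CheckResults.Add(1, tags);
    }
}
```

An exporter such as OpenTelemetry can subscribe to the meter and activity source by name; without a subscriber the calls are cheap no-ops.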
## 8) Related Documentation
- [Registry Diagnostic Checks](./registry-checks.md)
- [Registry Compatibility Runbook](../../runbooks/registry-compatibility.md)
- [Registry Referrer Troubleshooting](../../runbooks/registry-referrer-troubleshooting.md)