test fixes and new product advisories work
308
docs/modules/doctor/architecture.md
Normal file
@@ -0,0 +1,308 @@
# Doctor Architecture

> Module: Doctor
> Sprint: SPRINT_0127_001_0002_oci_registry_compatibility

Stella Doctor is a diagnostic framework for validating system health, configuration, and integration connectivity across the StellaOps platform.

## 1) Overview

Doctor provides a plugin-based diagnostic system that enables:
- **Health checks** for all platform components
- **Integration validation** for external systems (registries, SCM, CI, secrets)
- **Configuration verification** before deployment
- **Capability probing** for feature compatibility
- **Evidence collection** for troubleshooting and compliance

## 2) Plugin Architecture

### Core Interfaces

```csharp
public interface IDoctorPlugin
{
    string PluginId { get; }
    string DisplayName { get; }
    string Category { get; }
    Version Version { get; }

    IEnumerable<IDoctorCheck> GetChecks();
    Task InitializeAsync(DoctorPluginContext context, CancellationToken ct);
}

public interface IDoctorCheck
{
    string CheckId { get; }
    string Name { get; }
    string Description { get; }
    DoctorSeverity DefaultSeverity { get; }
    IReadOnlyList<string> Tags { get; }
    TimeSpan EstimatedDuration { get; }

    bool CanRun(DoctorPluginContext context);
    Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct);
}
```

### Plugin Context

```csharp
public sealed class DoctorPluginContext
{
    public IServiceProvider Services { get; }
    public IConfiguration Configuration { get; }
    public TimeProvider TimeProvider { get; }
    public ILogger Logger { get; }
    public string EnvironmentName { get; }
    public IReadOnlyDictionary<string, object> PluginConfig { get; }
}
```

### Check Results

```csharp
public sealed record CheckResult
{
    public DoctorSeverity Severity { get; init; }
    public string Diagnosis { get; init; }
    public Evidence Evidence { get; init; }
    public IReadOnlyList<string> LikelyCauses { get; init; }
    public Remediation? Remediation { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum DoctorSeverity
{
    Pass,   // Check succeeded
    Info,   // Informational (no action needed)
    Warn,   // Warning (degraded but functional)
    Fail,   // Failure (requires action)
    Skip    // Check skipped (preconditions not met)
}
```

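The relationship between `CanRun`, `RunAsync`, and `DoctorSeverity.Skip` can be sketched as a small run loop. This is a minimal illustration, not the Doctor engine itself: `MiniCheck`, `MiniResult`, `Severity`, and `MiniRunner` are hypothetical, trimmed-down stand-ins for the types above so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for DoctorSeverity, CheckResult, and IDoctorCheck.
public enum Severity { Pass, Info, Warn, Fail, Skip }

public sealed record MiniResult(string CheckId, Severity Severity, string Diagnosis);

public sealed record MiniCheck(
    string CheckId,
    Func<bool> CanRun,
    Func<CancellationToken, Task<MiniResult>> RunAsync);

public static class MiniRunner
{
    public static async Task<IReadOnlyList<MiniResult>> RunAllAsync(
        IEnumerable<MiniCheck> checks, CancellationToken ct)
    {
        var results = new List<MiniResult>();
        foreach (var check in checks)
        {
            ct.ThrowIfCancellationRequested();

            // Preconditions not met: record a Skip instead of running the check.
            if (!check.CanRun())
            {
                results.Add(new MiniResult(check.CheckId, Severity.Skip, "Preconditions not met"));
                continue;
            }

            try
            {
                results.Add(await check.RunAsync(ct));
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // An unexpected exception becomes a Fail result; cancellation propagates.
                results.Add(new MiniResult(check.CheckId, Severity.Fail, $"Unhandled error: {ex.Message}"));
            }
        }
        return results;
    }
}
```

In this sketch a check whose `CanRun` returns `false` never executes and is reported as `Skip`, which keeps "could not run" distinct from "ran and failed".
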
## 3) Built-in Plugins

### IntegrationPlugin

Validates external system connectivity and capabilities.

**Check Catalog:**

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| `check.integration.oci.credentials` | OCI Registry Credentials | Fail | Validate registry authentication |
| `check.integration.oci.pull` | OCI Registry Pull Authorization | Fail | Verify pull permissions |
| `check.integration.oci.push` | OCI Registry Push Authorization | Fail | Verify push permissions |
| `check.integration.oci.referrers` | OCI Registry Referrers API | Warn | Check OCI 1.1 referrers support |
| `check.integration.oci.capabilities` | OCI Registry Capability Matrix | Info | Probe all registry capabilities |

See [Registry Diagnostic Checks](./registry-checks.md) for detailed documentation.

### ConfigurationPlugin

Validates platform configuration.

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| `check.config.database` | Database Connection | Fail | Verify database connectivity |
| `check.config.secrets` | Secrets Provider | Fail | Verify secrets access |
| `check.config.tls` | TLS Configuration | Warn | Validate TLS certificates |

### HealthPlugin

Validates platform component health.

| Check ID | Name | Severity | Description |
|----------|------|----------|-------------|
| `check.health.api` | API Health | Fail | Verify API endpoints |
| `check.health.worker` | Worker Health | Fail | Verify background workers |
| `check.health.storage` | Storage Health | Fail | Verify storage backends |

## 4) Check Patterns

### Non-Destructive Probing

Registry checks use non-destructive operations:

```csharp
// Pull check: HEAD request only (no response body is transferred)
var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Head, manifestUrl), ct);

// Push check: Start upload then immediately cancel
var uploadResponse = await client.PostAsync(uploadsUrl, null, ct);
if (uploadResponse.StatusCode == HttpStatusCode.Accepted)
{
    // The registry returns the upload session URL in the Location header
    var location = uploadResponse.Headers.Location;
    if (location is not null)
    {
        await client.DeleteAsync(location, ct); // Cancel upload
    }
}
```

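How the HEAD response is turned into a check outcome is not spelled out here. One plausible mapping, shown purely as an assumption, is sketched below; `ProbeOutcome` and `PullProbe.Classify` are hypothetical names, and the real pull check may classify status codes differently.

```csharp
using System.Net;

// Hypothetical outcome type for this sketch only.
public enum ProbeOutcome { Pass, Warn, Fail }

public static class PullProbe
{
    // Assumed mapping from the HEAD manifest response to an outcome.
    public static ProbeOutcome Classify(HttpStatusCode status) => status switch
    {
        // Manifest is readable with the supplied credentials.
        HttpStatusCode.OK => ProbeOutcome.Pass,
        // Credentials are missing, invalid, or lack pull permission.
        HttpStatusCode.Unauthorized or HttpStatusCode.Forbidden => ProbeOutcome.Fail,
        // The probe reference does not exist, so authorization is inconclusive.
        HttpStatusCode.NotFound => ProbeOutcome.Warn,
        // Anything else is treated as a failure in this sketch.
        _ => ProbeOutcome.Fail,
    };
}
```
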
### Capability Detection

Registry capability probing sequence:

```
1. GET /v2/ → Extract OCI-Distribution-API-Version header
2. GET /v2/{repo}/referrers/{digest} → Check referrers API support
3. POST /v2/{repo}/blobs/uploads/ → Check chunked upload support
   └─ DELETE {location} → Cancel upload session
4. POST /v2/{repo}/blobs/uploads/?mount=...&from=... → Check cross-repo mount
5. OPTIONS /v2/{repo}/manifests/{ref} → Check delete support (Allow header)
6. OPTIONS /v2/{repo}/blobs/{digest} → Check blob delete support
```

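Steps 5 and 6 depend on reading the `Allow` header from the `OPTIONS` response. A minimal sketch of that interpretation follows; `AllowHeader.PermitsDelete` is a hypothetical helper, not part of Doctor, and it takes the raw header values so it stays independent of any HTTP client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AllowHeader
{
    // Returns true when any Allow header value lists DELETE.
    // A header may carry one comma-separated value ("GET, HEAD, DELETE")
    // or be repeated, so every value is split and trimmed before comparing.
    public static bool PermitsDelete(IEnumerable<string> allowValues) =>
        allowValues
            .SelectMany(v => v.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
            .Any(method => string.Equals(method, "DELETE", StringComparison.Ordinal));
}
```

A response with no `Allow` header yields `false` here. A caller may prefer to report that case as "unknown" rather than "unsupported", since some registries might not answer `OPTIONS` at all.
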
### Evidence Collection

All checks collect structured evidence:

```csharp
var result = CheckResultBuilder.Create(check)
    .Pass("Registry authentication successful")
    .WithEvidence(eb => eb
        .Add("registry_url", registryUrl)
        .Add("auth_method", "bearer")
        .Add("response_time_ms", elapsed.TotalMilliseconds.ToString("F0"))
        .AddSensitive("token_preview", RedactToken(token)))
    .Build();
```

### Credential Redaction

Sensitive values are automatically redacted:

```csharp
// Redact to first 2 + last 2 characters
private static string Redact(string? value)
{
    if (string.IsNullOrEmpty(value) || value.Length <= 4)
        return "****";
    return $"{value[..2]}...{value[^2..]}";
}
// "mysecretpassword" → "my...rd"
```

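The boundary cases of the redaction rule are easier to see with a few inputs. The sketch below copies the `Redact` logic into a hypothetical `RedactionDemo` class (made public only so it can be called from `Main`) and prints the result for null, short, and longer values.

```csharp
using System;

public static class RedactionDemo
{
    // Same logic as the Redact helper above.
    public static string Redact(string? value)
    {
        if (string.IsNullOrEmpty(value) || value.Length <= 4)
            return "****";
        return $"{value[..2]}...{value[^2..]}";
    }

    public static void Main()
    {
        Console.WriteLine(Redact(null));               // ****
        Console.WriteLine(Redact("abcd"));             // ****
        Console.WriteLine(Redact("abcde"));            // ab...de
        Console.WriteLine(Redact("mysecretpassword")); // my...rd
    }
}
```

Note that a five-character value still exposes four of its five characters, so callers handling short secrets may want a higher length threshold than the one shown.
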
## 5) CLI Integration

```bash
# Run all checks
stella doctor

# Run checks by tag
stella doctor --tag registry
stella doctor --tag configuration

# Run specific check
stella doctor --check check.integration.oci.referrers

# Output formats
stella doctor --format table    # Default: human-readable
stella doctor --format json     # Machine-readable
stella doctor --format sarif    # SARIF for CI integration

# Verbosity
stella doctor --verbose         # Include evidence details
stella doctor --quiet           # Only show failures

# Filtering by severity
stella doctor --min-severity warn   # Skip info/pass
```

## 6) Extensibility

### Creating a Custom Check

```csharp
public sealed class MyCustomCheck : IDoctorCheck
{
    public string CheckId => "check.custom.mycheck";
    public string Name => "My Custom Check";
    public string Description => "Validates custom integration";
    public DoctorSeverity DefaultSeverity => DoctorSeverity.Fail;
    public IReadOnlyList<string> Tags => ["custom", "integration"];
    public TimeSpan EstimatedDuration => TimeSpan.FromSeconds(5);

    public bool CanRun(DoctorPluginContext context)
    {
        // Return false if preconditions not met
        return context.Configuration["Custom:Enabled"] == "true";
    }

    public async Task<CheckResult> RunAsync(DoctorPluginContext context, CancellationToken ct)
    {
        var builder = CheckResultBuilder.Create(this);

        try
        {
            // Perform check logic (ValidateAsync stands in for your own validation)
            var result = await ValidateAsync(context, ct);

            if (result.Success)
            {
                return builder
                    .Pass("Custom validation successful")
                    .WithEvidence(eb => eb.Add("detail", result.Detail))
                    .Build();
            }

            return builder
                .Fail("Custom validation failed")
                .WithLikelyCause("Configuration is invalid")
                .WithRemediation(rb => rb
                    .AddManualStep(1, "Check configuration", "Verify Custom:Setting is correct")
                    .WithRunbookUrl("https://docs.stella-ops.org/runbooks/custom-check"))
                .Build();
        }
        // Let cancellation propagate instead of reporting it as a check failure
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return builder
                .Fail($"Check failed with error: {ex.Message}")
                .WithEvidence(eb => eb.Add("exception_type", ex.GetType().Name))
                .Build();
        }
    }
}
```

### Creating a Custom Plugin

```csharp
public sealed class MyCustomPlugin : IDoctorPlugin
{
    public string PluginId => "custom";
    public string DisplayName => "Custom Checks";
    public string Category => "Integration";
    public Version Version => new(1, 0, 0);

    public IEnumerable<IDoctorCheck> GetChecks()
    {
        yield return new MyCustomCheck();
        yield return new AnotherCustomCheck();
    }

    public Task InitializeAsync(DoctorPluginContext context, CancellationToken ct)
    {
        // Optional initialization
        return Task.CompletedTask;
    }
}
```

## 7) Telemetry

Doctor emits metrics and traces for observability:

**Metrics:**
- `doctor_check_duration_seconds{check_id, severity}` - Check execution time
- `doctor_check_results_total{check_id, severity}` - Result counts
- `doctor_plugin_load_duration_seconds{plugin_id}` - Plugin initialization time

**Traces:**
- `doctor.run` - Full doctor run span
- `doctor.check.{check_id}` - Individual check spans with evidence as attributes

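Instruments with these names could be emitted through the standard .NET `System.Diagnostics.Metrics` and `ActivitySource` APIs, roughly as sketched below. The meter and source name `StellaOps.Doctor` and the `DoctorTelemetrySketch` class are assumptions for illustration; the actual instrumentation may be wired differently.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public static class DoctorTelemetrySketch
{
    // "StellaOps.Doctor" is an assumed name, not taken from the source.
    private static readonly Meter DoctorMeter = new("StellaOps.Doctor");
    private static readonly ActivitySource DoctorSource = new("StellaOps.Doctor");

    private static readonly Histogram<double> CheckDuration =
        DoctorMeter.CreateHistogram<double>("doctor_check_duration_seconds", unit: "s");

    private static readonly Counter<long> CheckResults =
        DoctorMeter.CreateCounter<long>("doctor_check_results_total");

    // Records one finished check under the check_id and severity labels.
    public static void RecordCheck(string checkId, string severity, TimeSpan elapsed)
    {
        var tags = new TagList
        {
            { "check_id", checkId },
            { "severity", severity },
        };
        CheckDuration.Record(elapsed.TotalSeconds, tags);
        CheckResults.Add(1, tags);
    }

    // Starts a span named doctor.check.{check_id}.
    // Returns null when no listener is subscribed to the source.
    public static Activity? StartCheckSpan(string checkId) =>
        DoctorSource.StartActivity($"doctor.check.{checkId}");
}
```

A caller would typically wrap a check in `using var span = DoctorTelemetrySketch.StartCheckSpan(id);` and call `RecordCheck` once the result is known.
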
## 8) Related Documentation

- [Registry Diagnostic Checks](./registry-checks.md)
- [Registry Compatibility Runbook](../../runbooks/registry-compatibility.md)
- [Registry Referrer Troubleshooting](../../runbooks/registry-referrer-troubleshooting.md)