# Setup Wizard - Doctor Integration Contract

This document defines how the Setup Wizard integrates with the Doctor diagnostic system to validate each step and provide actionable remediation guidance.

## 1. Overview

The Setup Wizard relies on Doctor checks to:
1. **Validate** each configuration step
2. **Detect** existing configuration (for resume/reconfigure)
3. **Generate** runtime-specific fix commands
4. **Verify** that fixes were applied correctly

---

## 2. Step-to-Check Mapping

### 2.1 Required Steps

| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---------|-----------------|----------|-------------------|
| `database` | `check.database.connectivity` | Critical | Yes |
| `database` | `check.database.permissions` | Critical | Yes |
| `database` | `check.database.version` | Warning | No |
| `valkey` | `check.services.valkey.connectivity` | Critical | Yes |
| `valkey` | `check.services.valkey.ping` | Critical | Yes |
| `migrations` | `check.database.migrations.applied` | Critical | Yes |
| `migrations` | `check.database.migrations.checksums` | Critical | Yes |
| `migrations` | `check.database.schema.version` | Info | No |
| `admin` | `check.auth.admin.exists` | Critical | Yes |
| `admin` | `check.auth.password.policy` | Warning | No |
| `crypto` | `check.crypto.profile.valid` | Critical | Yes |
| `crypto` | `check.crypto.signing.test` | Warning | No |

### 2.2 Optional Steps

| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---------|-----------------|----------|-------------------|
| `vault` | `check.integration.vault.connected` | Warning | No |
| `vault` | `check.integration.vault.auth` | Warning | No |
| `vault` | `check.integration.vault.secrets.access` | Info | No |
| `scm` | `check.integration.scm.github.auth` | Info | No |
| `scm` | `check.integration.scm.github.permissions` | Info | No |
| `scm` | `check.integration.scm.gitlab.auth` | Info | No |
| `registry` | `check.integration.registry.connected` | Info | No |
| `notifications` | `check.notify.channel.configured` | Info | No |
| `notifications` | `check.notify.slack.webhook` | Info | No |
| `notifications` | `check.notify.email.smtp` | Info | No |
| `identity` | `check.security.identity.configured` | Info | No |
| `identity` | `check.security.oidc.provider` | Info | No |
| `environments` | `check.orchestrator.environment.exists` | Info | No |
| `environments` | `check.orchestrator.environment.valid` | Info | No |
| `agents` | `check.orchestrator.agent.registered` | Info | No |
| `agents` | `check.orchestrator.agent.healthy` | Info | No |
| `feeds` | `check.feeds.sync.enabled` | Info | No |

---

## 3. Check Output Model

### 3.1 CheckResult Schema

```csharp
public sealed record CheckResult
{
    public required string CheckId { get; init; }
    public required CheckStatus Status { get; init; }  // Pass, Warn, Fail
    public required string Message { get; init; }
    public required TimeSpan Duration { get; init; }
    public ImmutableDictionary<string, object> Evidence { get; init; }
    public ImmutableArray<LikelyCause> LikelyCauses { get; init; }
    public ImmutableArray<RemediationCommand> Remediations { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum CheckStatus { Pass, Warn, Fail }

public sealed record LikelyCause
{
    public required int Priority { get; init; }  // 1 = most likely
    public required string Description { get; init; }
    public string? DocumentationUrl { get; init; }
}

public sealed record RemediationCommand
{
    public required RuntimeEnvironment Runtime { get; init; }
    public required string Command { get; init; }
    public required string Description { get; init; }
    public bool RequiresSudo { get; init; }
    public bool IsDangerous { get; init; }  // Requires confirmation
    public ImmutableDictionary<string, string> Placeholders { get; init; }
}

public enum RuntimeEnvironment
{
    DockerCompose,
    Kubernetes,
    Systemd,
    WindowsService,
    Bare,
    Any
}
```

### 3.2 Evidence Dictionary

The `Evidence` dictionary contains check-specific data:

| Check Category | Evidence Keys |
|----------------|---------------|
| **Database** | `host`, `port`, `database`, `version`, `user`, `sslMode` |
| **Valkey** | `host`, `port`, `version`, `usedMemory`, `maxMemory` |
| **Migrations** | `pendingCount`, `appliedCount`, `lastMigration`, `failedMigrations` |
| **Auth** | `adminCount`, `adminUsername`, `passwordLastChanged` |
| **Vault** | `provider`, `version`, `mountPoints`, `authMethod` |
| **SCM** | `provider`, `rateLimit`, `remainingCalls`, `organization` |

---

## 4. Remediation Command Generation

### 4.1 Runtime Detection

The wizard detects the runtime environment via:

```csharp
public interface IRuntimeDetector
{
    RuntimeEnvironment Detect();
    bool IsDockerAvailable();
    bool IsKubernetesContext();
    bool IsSystemdManaged(string serviceName);
    string GetComposeProjectPath();
    string GetKubernetesNamespace();
}
```

Detection logic:
1. Check for the `/.dockerenv` file → Docker container
2. Check for the `KUBERNETES_SERVICE_HOST` environment variable → Kubernetes
3. Check for the `docker compose` command → Docker Compose
4. Check for the `systemctl` command → systemd
5. Check for Windows services → Windows Service
6. Default → Bare (manual)
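
The detection order above can be sketched as a pure function over a set of probe results. This is a minimal illustration, not the wizard's actual detector: `RuntimeProbes` and `RuntimeDetectionSketch` are hypothetical names, and mapping a `/.dockerenv` hit to `DockerCompose` is an assumption made only because `RuntimeEnvironment` has no separate member for a plain Docker container.

```csharp
public enum RuntimeEnvironment { DockerCompose, Kubernetes, Systemd, WindowsService, Bare, Any }

// Hypothetical snapshot of the host probes; gathering them is left to the caller.
public sealed record RuntimeProbes(
    bool DockerEnvFileExists,       // /.dockerenv is present
    bool KubernetesServiceHostSet,  // KUBERNETES_SERVICE_HOST is set
    bool DockerComposeAvailable,    // `docker compose` can be invoked
    bool SystemctlAvailable,        // `systemctl` can be invoked
    bool WindowsServicesAvailable); // Windows service control is present

public static class RuntimeDetectionSketch
{
    // Applies the documented order: the first matching probe decides the result.
    public static RuntimeEnvironment Detect(RuntimeProbes probes)
    {
        // Assumption: a Docker container is reported as DockerCompose,
        // since the enum has no dedicated "Docker" member.
        if (probes.DockerEnvFileExists) return RuntimeEnvironment.DockerCompose;
        if (probes.KubernetesServiceHostSet) return RuntimeEnvironment.Kubernetes;
        if (probes.DockerComposeAvailable) return RuntimeEnvironment.DockerCompose;
        if (probes.SystemctlAvailable) return RuntimeEnvironment.Systemd;
        if (probes.WindowsServicesAvailable) return RuntimeEnvironment.WindowsService;
        return RuntimeEnvironment.Bare; // manual installation
    }
}
```

Keeping the probes separate from the decision makes the ordering testable without touching the file system or spawning processes.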

### 4.2 Command Templates

#### Database Connection Failure

```yaml
check.database.connectivity:
  likelyCauses:
    - priority: 1
      description: "PostgreSQL is not running"
    - priority: 2
      description: "Firewall blocking port 5432"
    - priority: 3
      description: "Incorrect host or port"
    - priority: 4
      description: "Network connectivity issue"

  remediations:
    - runtime: DockerCompose
      description: "Start PostgreSQL container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d postgres"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"

    - runtime: Kubernetes
      description: "Check PostgreSQL pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=postgres"
      placeholders:
        NAMESPACE: "stellaops"

    - runtime: Systemd
      description: "Start PostgreSQL service"
      command: "sudo systemctl start postgresql"
      requiresSudo: true

    - runtime: Any
      description: "Verify PostgreSQL is listening"
      command: "pg_isready -h {{HOST}} -p {{PORT}}"
      placeholders:
        HOST: "localhost"
        PORT: "5432"

  verificationCommand: "pg_isready -h {{HOST}} -p {{PORT}}"
```

#### Valkey Connection Failure

```yaml
check.services.valkey.connectivity:
  likelyCauses:
    - priority: 1
      description: "Valkey/Redis is not running"
    - priority: 2
      description: "Firewall blocking port 6379"
    - priority: 3
      description: "Authentication required but not configured"

  remediations:
    - runtime: DockerCompose
      description: "Start Valkey container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d valkey"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"

    - runtime: Kubernetes
      description: "Check Valkey pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=valkey"
      placeholders:
        NAMESPACE: "stellaops"

    - runtime: Systemd
      description: "Start Valkey service"
      command: "sudo systemctl start valkey"
      requiresSudo: true

    - runtime: Any
      description: "Test Valkey connection"
      command: "redis-cli -h {{HOST}} -p {{PORT}} PING"
      placeholders:
        HOST: "localhost"
        PORT: "6379"

  verificationCommand: "redis-cli -h {{HOST}} -p {{PORT}} PING"
```

#### Pending Migrations

```yaml
check.database.migrations.applied:
  likelyCauses:
    - priority: 1
      description: "Pending release migrations require manual execution"
    - priority: 2
      description: "Startup migrations not yet applied"

  remediations:
    - runtime: Any
      description: "Run pending migrations (dry-run first)"
      command: "stella migrations-run --module all --dry-run"

    - runtime: Any
      description: "Apply all pending migrations"
      command: "stella migrations-run --module all"
      isDangerous: true

    - runtime: DockerCompose
      description: "Run migrations in container"
      command: "docker compose exec api stella migrations-run --module all"

    - runtime: Kubernetes
      description: "Run migrations job"
      command: "kubectl apply -f devops/k8s/jobs/migrations.yaml"

  verificationCommand: "stella migrations-run --module all --dry-run"
```

#### Vault Authentication Failure

```yaml
check.integration.vault.auth:
  likelyCauses:
    - priority: 1
      description: "Vault token expired or revoked"
    - priority: 2
      description: "AppRole credentials invalid"
    - priority: 3
      description: "Kubernetes service account not configured"
    - priority: 4
      description: "Vault server unreachable"

  remediations:
    - runtime: Any
      description: "Test Vault connectivity"
      command: "curl -s {{VAULT_ADDR}}/v1/sys/health"
      placeholders:
        VAULT_ADDR: "https://vault.example.com:8200"

    - runtime: Any
      description: "Verify token validity"
      command: "vault token lookup"

    - runtime: Kubernetes
      description: "Check Kubernetes auth configuration"
      command: "kubectl get serviceaccount -n {{NAMESPACE}} stellaops-vault-auth"
      placeholders:
        NAMESPACE: "stellaops"

  verificationCommand: "vault token lookup"
```

---

## 5. Placeholder Resolution

### 5.1 Placeholder Sources

Placeholders in commands are resolved from:

| Source | Priority | Example |
|--------|----------|---------|
| User input | 1 (highest) | `{{HOST}}` from form field |
| Environment | 2 | `{{VAULT_ADDR}}` from env |
| Detection | 3 | `{{NAMESPACE}}` from context |
| Default | 4 (lowest) | Fallback value |

### 5.2 Placeholder Syntax

```
{{PLACEHOLDER_NAME}}
{{PLACEHOLDER_NAME:-default_value}}
```

Examples:
- `{{HOST}}` - Required placeholder
- `{{PORT:-5432}}` - Optional with default
- `{{COMPOSE_FILE:-docker-compose.yml}}` - File path default
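
The priority order and the `{{NAME:-default}}` syntax can be combined in a small resolver. The following is a sketch under stated assumptions rather than the wizard's real implementation: `PlaceholderResolverSketch` is a hypothetical name, sources are passed from highest to lowest priority, and a required placeholder with no value is left untouched so the user can see what is still missing.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderResolverSketch
{
    // Matches {{NAME}} and {{NAME:-default}}; group 1 is the name, group 2 the optional default.
    private static readonly Regex Pattern =
        new(@"\{\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}\}", RegexOptions.Compiled);

    // 'sources' must be ordered from highest priority (user input) to lowest (defaults).
    public static string Resolve(string command, params IReadOnlyDictionary<string, string>[] sources)
    {
        return Pattern.Replace(command, match =>
        {
            string name = match.Groups[1].Value;

            // The first source that knows the placeholder wins.
            foreach (var source in sources)
            {
                if (source.TryGetValue(name, out var value)) return value;
            }

            // Otherwise fall back to the inline default, if one was written.
            if (match.Groups[2].Success) return match.Groups[2].Value;

            // Assumption: unresolved required placeholders stay visible as-is.
            return match.Value;
        });
    }
}
```

For example, resolving `pg_isready -h {{HOST}} -p {{PORT:-5432}}` with a user-input dictionary that contains only `HOST` fills in the host from user input and the port from the inline default.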

### 5.3 Secret Redaction

Commands containing secrets are never displayed with actual values:

| Placeholder | Display | Actual |
|-------------|---------|--------|
| `{{PASSWORD}}` | `{{PASSWORD}}` | Never resolved in display |
| `{{TOKEN}}` | `{{TOKEN}}` | Never resolved in display |
| `{{SECRET_KEY}}` | `{{SECRET_KEY}}` | Never resolved in display |

The user must copy and manually substitute secrets.
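
One way to honor this rule when rendering a command for display is to skip secret placeholders even when a value happens to be available. The sketch below is illustrative only: `SecretRedactionSketch` is a hypothetical name, and the hard-coded set of secret names simply mirrors the table above (a real list would likely be configurable).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SecretRedactionSketch
{
    // Assumption: secret placeholders are identified by name; these three come from the table above.
    private static readonly HashSet<string> SecretNames = new() { "PASSWORD", "TOKEN", "SECRET_KEY" };

    private static readonly Regex Pattern = new(@"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}");

    // Returns the command as shown to the user: ordinary placeholders are filled in,
    // secret placeholders remain literal {{NAME}} tokens.
    public static string RenderForDisplay(string command, IReadOnlyDictionary<string, string> values)
    {
        return Pattern.Replace(command, match =>
        {
            string name = match.Groups[1].Value;
            if (SecretNames.Contains(name)) return match.Value; // never resolved in display
            return values.TryGetValue(name, out var value) ? value : match.Value;
        });
    }
}
```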

---

## 6. Verification Flow

### 6.1 Post-Fix Verification

After the user applies a fix, the wizard performs the following steps:

1. **Wait** - Pause for user confirmation ("I've run this command")
2. **Verify** - Run the verification command
3. **Re-check** - Run the original Doctor check
4. **Report** - Show success or next steps

### 6.2 Verification Command Execution

```csharp
public interface IVerificationExecutor
{
    Task<VerificationResult> ExecuteAsync(
        string command,
        TimeSpan timeout,
        CancellationToken ct);
}

public sealed record VerificationResult
{
    public required bool Success { get; init; }
    public required int ExitCode { get; init; }
    public required string Output { get; init; }
    public required TimeSpan Duration { get; init; }
}
```
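
A possible implementation of this interface, shown only as a sketch: `ShellVerificationExecutor` is a hypothetical class, it assumes a POSIX shell at `/bin/sh` (Windows hosts would need a different launcher), and it reports a timeout as a failed result with exit code `-1`, which is a convention chosen for this example. The two types from above are repeated so the block compiles on its own.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public interface IVerificationExecutor
{
    Task<VerificationResult> ExecuteAsync(string command, TimeSpan timeout, CancellationToken ct);
}

public sealed record VerificationResult
{
    public required bool Success { get; init; }
    public required int ExitCode { get; init; }
    public required string Output { get; init; }
    public required TimeSpan Duration { get; init; }
}

public sealed class ShellVerificationExecutor : IVerificationExecutor
{
    public async Task<VerificationResult> ExecuteAsync(string command, TimeSpan timeout, CancellationToken ct)
    {
        var stopwatch = Stopwatch.StartNew();

        // Assumption: the command is run through /bin/sh -c.
        var startInfo = new ProcessStartInfo("/bin/sh")
        {
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
        };
        startInfo.ArgumentList.Add("-c");
        startInfo.ArgumentList.Add(command);

        using var process = new Process { StartInfo = startInfo };
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        timeoutCts.CancelAfter(timeout);

        process.Start();

        // Read both streams concurrently so a full pipe cannot stall the child process.
        Task<string> stdout = process.StandardOutput.ReadToEndAsync();
        Task<string> stderr = process.StandardError.ReadToEndAsync();

        try
        {
            await process.WaitForExitAsync(timeoutCts.Token);
        }
        catch (OperationCanceledException)
        {
            process.Kill(entireProcessTree: true);
            if (ct.IsCancellationRequested) throw; // caller cancelled: propagate

            // Otherwise the timeout elapsed: report a failed verification.
            return new VerificationResult
            {
                Success = false, ExitCode = -1, Output = "Verification timed out", Duration = stopwatch.Elapsed,
            };
        }

        string output = (await stdout) + (await stderr);
        return new VerificationResult
        {
            Success = process.ExitCode == 0, ExitCode = process.ExitCode, Output = output, Duration = stopwatch.Elapsed,
        };
    }
}
```

Because the command string is handed to a shell, this approach is only appropriate for commands generated from trusted templates; placeholder values supplied by users would need escaping before they are substituted.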

### 6.3 Re-Check Behavior

```
[FAIL] check.database.connectivity

Suggested fix applied. Verifying...

[RUN] pg_isready -h localhost -p 5432
localhost:5432 - accepting connections

Re-running check...

[PASS] check.database.connectivity
PostgreSQL connection successful
```

---

## 7. Check Aggregation

### 7.1 Step Completion Criteria

A step is complete when:
- All **Critical** checks pass
- No **Fail** status on any check
- User has acknowledged all **Warning** checks

### 7.2 Aggregated Status

```csharp
public enum StepValidationStatus
{
    NotStarted,       // No checks run
    InProgress,       // Checks running
    Passed,           // All critical pass, no failures
    PassedWithWarns,  // All critical pass, some warnings
    Failed,           // Any critical failure
    Skipped           // User explicitly skipped
}
```
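
The rollup from individual check results to a step status might look like the sketch below. It is an illustration with explicit assumptions: `StepStatusSketch` is a hypothetical helper, it follows the stricter reading of the completion criteria (any `Fail` result marks the step `Failed`, regardless of severity), and `InProgress` is assumed to be tracked by the caller while checks are still running.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum CheckStatus { Pass, Warn, Fail }

public enum StepValidationStatus { NotStarted, InProgress, Passed, PassedWithWarns, Failed, Skipped }

public static class StepStatusSketch
{
    // Rolls up the statuses of all completed checks for one step.
    public static StepValidationStatus Aggregate(
        IReadOnlyCollection<CheckStatus> results,
        bool skippedByUser = false)
    {
        if (skippedByUser) return StepValidationStatus.Skipped;
        if (results.Count == 0) return StepValidationStatus.NotStarted;

        // Assumption: any Fail blocks completion, per "No Fail status on any check".
        if (results.Contains(CheckStatus.Fail)) return StepValidationStatus.Failed;

        // Warnings do not block, but they must still be acknowledged by the user.
        if (results.Contains(CheckStatus.Warn)) return StepValidationStatus.PassedWithWarns;

        return StepValidationStatus.Passed;
    }
}
```

If only Critical failures should block, as the `Failed` comment in the enum suggests, the function would need each check's severity as an additional input.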

### 7.3 Status Rollup for Thresholds

```
Operational Threshold:
  [x] check.database.connectivity         PASS
  [x] check.database.permissions          PASS
  [x] check.database.migrations.applied   PASS
  [x] check.services.valkey.connectivity  PASS
  [x] check.auth.admin.exists             PASS
  [x] check.crypto.profile.valid          PASS

Status: OPERATIONAL (6/6 required checks passed)

Production-Ready Threshold:
  [x] check.security.identity.configured  PASS
  [x] check.integration.vault.connected   PASS
  [x] check.integration.scm.connected     PASS
  [x] check.notify.channel.configured     PASS
  [ ] check.orchestrator.agent.healthy    SKIP
  [ ] check.feeds.sync.enabled            SKIP

Status: NOT PRODUCTION-READY (4/6 recommended, 2 skipped)
```

---

## 8. Doctor Engine Integration

### 8.1 Wizard-Specific Check Context

The wizard provides context to Doctor checks:

```csharp
public sealed record WizardCheckContext
{
    public required string StepId { get; init; }
    public required RuntimeEnvironment DetectedRuntime { get; init; }
    public required ImmutableDictionary<string, string> UserInputs { get; init; }
    public bool GenerateRemediations { get; init; } = true;
    public bool IncludePlaceholders { get; init; } = true;
}
```

### 8.2 Check Invocation

```csharp
public interface IWizardDoctorClient
{
    Task<ImmutableArray<CheckResult>> RunStepChecksAsync(
        string stepId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<CheckResult> RunSingleCheckAsync(
        string checkId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<VerificationResult> RunVerificationAsync(
        string command,
        WizardCheckContext context,
        CancellationToken ct);
}
```

### 8.3 Check Timeout

| Check Category | Default Timeout | Max Timeout |
|----------------|-----------------|-------------|
| Connectivity | 10 seconds | 30 seconds |
| Authentication | 15 seconds | 60 seconds |
| Migrations | 60 seconds | 300 seconds |
| Full validation | 30 seconds | 120 seconds |
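
The table can be read as a default plus an upper bound per category. A minimal sketch of that rule follows, assuming a hypothetical `CheckTimeoutSketch` helper keyed by the category names above; how a caller requests a custom timeout is not specified in this document, so the `requested` parameter is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class CheckTimeoutSketch
{
    // Values copied from the table above.
    private static readonly Dictionary<string, (TimeSpan Default, TimeSpan Max)> Limits = new()
    {
        ["Connectivity"] = (TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(30)),
        ["Authentication"] = (TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60)),
        ["Migrations"] = (TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(300)),
        ["Full validation"] = (TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(120)),
    };

    // Returns the default when nothing valid is requested, otherwise the request capped at the maximum.
    public static TimeSpan Resolve(string category, TimeSpan? requested = null)
    {
        var (defaultTimeout, maxTimeout) = Limits[category];
        if (requested is null || requested.Value <= TimeSpan.Zero) return defaultTimeout;
        return requested.Value > maxTimeout ? maxTimeout : requested.Value;
    }
}
```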

---

## 9. Remediation Safety

### 9.1 Dangerous Commands

Commands marked `isDangerous: true` require user confirmation:

```
WARNING: This command will modify your database schema.

Command:
  stella migrations-run --module all

This action:
  - Applies 5 pending migrations
  - Cannot be automatically rolled back
  - May take several minutes

Type 'apply' to confirm: _
```

### 9.2 Sudo Requirements

Commands requiring `sudo` show a notice:

```
This command requires administrator privileges.

Command:
  sudo systemctl start postgresql

[Copy Command]

Note: You may be prompted for your password.
```

### 9.3 Secret Substitution Notice

```
This command contains placeholders for sensitive values.

Command:
  vault write auth/approle/login role_id={{ROLE_ID}} secret_id={{SECRET_ID}}

Before running:
  1. Replace {{ROLE_ID}} with your AppRole Role ID
  2. Replace {{SECRET_ID}} with your AppRole Secret ID

[Copy Command]
```

---

## 10. Check Plugin Requirements

### 10.1 New Checks for Setup Wizard

The following checks may need to be added to existing plugins:

| Plugin | New Check ID | Purpose |
|--------|--------------|---------|
| Core | `check.auth.admin.exists` | Verify admin user exists |
| Core | `check.auth.password.policy` | Verify password complexity |
| Core | `check.crypto.signing.test` | Test signing operation |
| Database | `check.database.migrations.checksums` | Verify migration integrity |
| Integration | `check.integration.vault.secrets.access` | Test secret retrieval |
| Integration | `check.orchestrator.environment.valid` | Validate environment config |
| Notify | `check.notify.delivery.test` | Test notification delivery |

### 10.2 Check Implementation Contract

Each check must implement:

```csharp
public interface ISetupWizardAwareCheck : IDoctorCheck
{
    // Standard check execution
    Task<CheckResult> ExecuteAsync(CheckContext context, CancellationToken ct);

    // Generate runtime-specific remediations
    ImmutableArray<RemediationCommand> GetRemediations(
        CheckResult result,
        RuntimeEnvironment runtime);

    // Verification command for this check
    string? GetVerificationCommand(RuntimeEnvironment runtime);
}
```

---

## 11. Audit Trail

### 11.1 Setup Event Logging

All wizard actions are logged to the Timeline service:

```csharp
public sealed record SetupWizardEvent
{
    public required string EventType { get; init; }  // step.started, step.completed, check.failed, etc.
    public required string StepId { get; init; }
    public required string? CheckId { get; init; }
    public required CheckStatus? Status { get; init; }
    public required DateTimeOffset OccurredAt { get; init; }
    public required string? UserId { get; init; }
    public ImmutableDictionary<string, string> Metadata { get; init; }
}
```

### 11.2 Event Types

| Event Type | Description |
|------------|-------------|
| `setup.started` | Wizard initiated |
| `setup.completed` | Wizard finished successfully |
| `setup.aborted` | Wizard cancelled |
| `step.started` | Step configuration began |
| `step.completed` | Step passed all checks |
| `step.failed` | Step failed validation |
| `step.skipped` | User skipped optional step |
| `check.passed` | Individual check passed |
| `check.failed` | Individual check failed |
| `check.warned` | Individual check warned |
| `remediation.copied` | User copied fix command |
| `remediation.verified` | Fix verification succeeded |
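
The three `check.*` event types correspond one-to-one with `CheckStatus` values, so the event type for a finished check can be derived rather than chosen by hand. A small sketch, assuming a hypothetical `SetupEventTypeSketch` helper:

```csharp
using System;

public enum CheckStatus { Pass, Warn, Fail }

public static class SetupEventTypeSketch
{
    // Maps a check outcome to the matching event type from the table above.
    public static string ForCheck(CheckStatus status) => status switch
    {
        CheckStatus.Pass => "check.passed",
        CheckStatus.Warn => "check.warned",
        CheckStatus.Fail => "check.failed",
        _ => throw new ArgumentOutOfRangeException(nameof(status), status, "Unknown check status"),
    };
}
```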