# Setup Wizard - Doctor Integration Contract

This document defines how the Setup Wizard integrates with the Doctor diagnostic system to validate each step and provide actionable remediation guidance.

## 1. Overview

The Setup Wizard relies on Doctor checks to:

1. **Validate** each configuration step
2. **Detect** existing configuration (for resume/reconfigure)
3. **Generate** runtime-specific fix commands
4. **Verify** that fixes were applied correctly

---

## 2. Step-to-Check Mapping

### 2.1 Required Steps

| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---------|-----------------|----------|-------------------|
| `database` | `check.database.connectivity` | Critical | Yes |
| `database` | `check.database.permissions` | Critical | Yes |
| `database` | `check.database.version` | Warning | No |
| `valkey` | `check.services.valkey.connectivity` | Critical | Yes |
| `valkey` | `check.services.valkey.ping` | Critical | Yes |
| `migrations` | `check.database.migrations.applied` | Critical | Yes |
| `migrations` | `check.database.migrations.checksums` | Critical | Yes |
| `migrations` | `check.database.schema.version` | Info | No |
| `admin` | `check.auth.admin.exists` | Critical | Yes |
| `admin` | `check.auth.password.policy` | Warning | No |
| `crypto` | `check.crypto.profile.valid` | Critical | Yes |
| `crypto` | `check.crypto.signing.test` | Warning | No |

### 2.2 Optional Steps

| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---------|-----------------|----------|-------------------|
| `vault` | `check.integration.vault.connected` | Warning | No |
| `vault` | `check.integration.vault.auth` | Warning | No |
| `vault` | `check.integration.vault.secrets.access` | Info | No |
| `scm` | `check.integration.scm.github.auth` | Info | No |
| `scm` | `check.integration.scm.github.permissions` | Info | No |
| `scm` | `check.integration.scm.gitlab.auth` | Info | No |
| `registry` | `check.integration.registry.connected` | Info | No |
| `notifications` | `check.notify.channel.configured` | Info | No |
| `notifications` | `check.notify.slack.webhook` | Info | No |
| `notifications` | `check.notify.email.smtp` | Info | No |
| `identity` | `check.security.identity.configured` | Info | No |
| `identity` | `check.security.oidc.provider` | Info | No |
| `environments` | `check.orchestrator.environment.exists` | Info | No |
| `environments` | `check.orchestrator.environment.valid` | Info | No |
| `agents` | `check.orchestrator.agent.registered` | Info | No |
| `agents` | `check.orchestrator.agent.healthy` | Info | No |
| `feeds` | `check.feeds.sync.enabled` | Info | No |

---

## 3. Check Output Model

### 3.1 CheckResult Schema

```csharp
using System;
using System.Collections.Immutable;

public sealed record CheckResult
{
    public required string CheckId { get; init; }
    public required CheckStatus Status { get; init; } // Pass, Warn, Fail
    public required string Message { get; init; }
    public required TimeSpan Duration { get; init; }
    public ImmutableDictionary<string, string> Evidence { get; init; }
    public ImmutableArray<LikelyCause> LikelyCauses { get; init; }
    public ImmutableArray<RemediationCommand> Remediations { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum CheckStatus { Pass, Warn, Fail }

public sealed record LikelyCause
{
    public required int Priority { get; init; } // 1 = most likely
    public required string Description { get; init; }
    public string? DocumentationUrl { get; init; }
}

public sealed record RemediationCommand
{
    public required RuntimeEnvironment Runtime { get; init; }
    public required string Command { get; init; }
    public required string Description { get; init; }
    public bool RequiresSudo { get; init; }
    public bool IsDangerous { get; init; } // Requires confirmation
    public ImmutableDictionary<string, string> Placeholders { get; init; }
}

public enum RuntimeEnvironment
{
    DockerCompose,
    Kubernetes,
    Systemd,
    WindowsService,
    Bare,
    Any
}
```

### 3.2 Evidence Dictionary

The `Evidence` dictionary contains check-specific data:

| Check Category | Evidence Keys |
|----------------|---------------|
| **Database** | `host`, `port`, `database`, `version`, `user`, `sslMode` |
| **Valkey** | `host`, `port`, `version`, `usedMemory`, `maxMemory` |
| **Migrations** | `pendingCount`, `appliedCount`, `lastMigration`, `failedMigrations` |
| **Auth** | `adminCount`, `adminUsername`, `passwordLastChanged` |
| **Vault** | `provider`, `version`, `mountPoints`, `authMethod` |
| **SCM** | `provider`, `rateLimit`, `remainingCalls`, `organization` |

---

## 4. Remediation Command Generation

### 4.1 Runtime Detection

The wizard detects the runtime environment via:

```csharp
public interface IRuntimeDetector
{
    RuntimeEnvironment Detect();
    bool IsDockerAvailable();
    bool IsKubernetesContext();
    bool IsSystemdManaged(string serviceName);
    string GetComposeProjectPath();
    string GetKubernetesNamespace();
}
```

Detection logic, evaluated in order:

1. Check for `/.dockerenv` file → Docker container
2. Check for `KUBERNETES_SERVICE_HOST` → Kubernetes
3. Check for `docker compose` command → Docker Compose
4. Check for `systemctl` command → systemd
5. Check for Windows services → Windows Service
6. Default → Bare (manual)

### 4.2 Command Templates

#### Database Connection Failure

```yaml
check.database.connectivity:
  likelyCauses:
    - priority: 1
      description: "PostgreSQL is not running"
    - priority: 2
      description: "Firewall blocking port 5432"
    - priority: 3
      description: "Incorrect host or port"
    - priority: 4
      description: "Network connectivity issue"

  remediations:
    - runtime: DockerCompose
      description: "Start PostgreSQL container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d postgres"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"

    - runtime: Kubernetes
      description: "Check PostgreSQL pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=postgres"
      placeholders:
        NAMESPACE: "stellaops"

    - runtime: Systemd
      description: "Start PostgreSQL service"
      command: "sudo systemctl start postgresql"
      requiresSudo: true

    - runtime: Any
      description: "Verify PostgreSQL is listening"
      command: "pg_isready -h {{HOST}} -p {{PORT}}"
      placeholders:
        HOST: "localhost"
        PORT: "5432"

  verificationCommand: "pg_isready -h {{HOST}} -p {{PORT}}"
```

#### Valkey Connection Failure

```yaml
check.services.valkey.connectivity:
  likelyCauses:
    - priority: 1
      description: "Valkey/Redis is not running"
    - priority: 2
      description: "Firewall blocking port 6379"
    - priority: 3
      description: "Authentication required but not configured"

  remediations:
    - runtime: DockerCompose
      description: "Start Valkey container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d valkey"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"

    - runtime: Kubernetes
      description: "Check Valkey pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=valkey"
      placeholders:
        NAMESPACE: "stellaops"

    - runtime: Systemd
      description: "Start Valkey service"
      command: "sudo systemctl start valkey"
      requiresSudo: true

    - runtime: Any
      description: "Test Valkey connection"
      command: "valkey-cli -h {{HOST}} -p {{PORT}} PING"
      placeholders:
        HOST: "localhost"
        PORT: "6379"

  verificationCommand: "valkey-cli -h {{HOST}} -p {{PORT}} PING"
```

#### Pending Migrations

```yaml
check.database.migrations.applied:
  likelyCauses:
    - priority: 1
      description: "Pending release migrations require manual execution"
    - priority: 2
      description: "Startup migrations not yet applied"

  remediations:
    - runtime: Any
      description: "Run pending migrations (dry-run first)"
      command: "stella migrations-run --module all --dry-run"

    - runtime: Any
      description: "Apply all pending migrations"
      command: "stella migrations-run --module all"
      isDangerous: true

    - runtime: DockerCompose
      description: "Run migrations in container"
      command: "docker compose exec api stella migrations-run --module all"

    - runtime: Kubernetes
      description: "Run migrations job"
      command: "kubectl apply -f devops/k8s/jobs/migrations.yaml"

  verificationCommand: "stella migrations-run --module all --dry-run"
```

#### Vault Authentication Failure

```yaml
check.integration.vault.auth:
  likelyCauses:
    - priority: 1
      description: "Vault token expired or revoked"
    - priority: 2
      description: "AppRole credentials invalid"
    - priority: 3
      description: "Kubernetes service account not configured"
    - priority: 4
      description: "Vault server unreachable"

  remediations:
    - runtime: Any
      description: "Test Vault connectivity"
      command: "curl -s {{VAULT_ADDR}}/v1/sys/health"
      placeholders:
        VAULT_ADDR: "https://vault.example.com:8200"

    - runtime: Any
      description: "Verify token validity"
      command: "vault token lookup"

    - runtime: Kubernetes
      description: "Check Kubernetes auth configuration"
      command: "kubectl get serviceaccount -n {{NAMESPACE}} stellaops-vault-auth"
      placeholders:
        NAMESPACE: "stellaops"

  verificationCommand: "vault token lookup"
```

---

## 5. Placeholder Resolution

### 5.1 Placeholder Sources

Placeholders in commands are resolved from the following sources. When several sources supply the same placeholder, the one with the highest priority wins:

| Source | Priority | Example |
|--------|----------|---------|
| User input | 1 (highest) | `{{HOST}}` from form field |
| Environment | 2 | `{{VAULT_ADDR}}` from env |
| Detection | 3 | `{{NAMESPACE}}` from context |
| Default | 4 (lowest) | Fallback value |

### 5.2 Placeholder Syntax

```
{{PLACEHOLDER_NAME}}
{{PLACEHOLDER_NAME:-default_value}}
```

Examples:

- `{{HOST}}` - Required placeholder
- `{{PORT:-5432}}` - Optional with default
- `{{COMPOSE_FILE:-docker-compose.yml}}` - File path default

### 5.3 Secret Redaction

Commands containing secrets are never displayed with actual values:

| Placeholder | Display | Actual |
|-------------|---------|--------|
| `{{PASSWORD}}` | `{{PASSWORD}}` | Never resolved in display |
| `{{TOKEN}}` | `{{TOKEN}}` | Never resolved in display |
| `{{SECRET_KEY}}` | `{{SECRET_KEY}}` | Never resolved in display |

The user must copy the command and substitute the secrets manually.

---

## 6. Verification Flow

### 6.1 Post-Fix Verification

After the user applies a fix, the wizard performs the following steps:

1. **Wait** - Pause for user confirmation ("I've run this command")
2. **Verify** - Run the verification command
3. **Re-check** - Run the original Doctor check
4. **Report** - Show success or next steps

### 6.2 Verification Command Execution

```csharp
public interface IVerificationExecutor
{
    Task<VerificationResult> ExecuteAsync(
        string command,
        TimeSpan timeout,
        CancellationToken ct);
}

public sealed record VerificationResult
{
    public required bool Success { get; init; }
    public required int ExitCode { get; init; }
    public required string Output { get; init; }
    public required TimeSpan Duration { get; init; }
}
```

### 6.3 Re-Check Behavior

```
[FAIL] check.database.connectivity

Suggested fix applied. Verifying...

[RUN] pg_isready -h localhost -p 5432
localhost:5432 - accepting connections

Re-running check...

[PASS] check.database.connectivity
       PostgreSQL connection successful
```

---

## 7. Check Aggregation

### 7.1 Step Completion Criteria

A step is complete when:

- All **Critical** checks pass
- No **Fail** status on any check
- User has acknowledged all **Warning** checks

### 7.2 Aggregated Status

```csharp
public enum StepValidationStatus
{
    NotStarted,      // No checks run
    InProgress,      // Checks running
    Passed,          // All critical pass, no failures
    PassedWithWarns, // All critical pass, some warnings
    Failed,          // Any critical failure
    Skipped          // User explicitly skipped
}
```

### 7.3 Status Rollup for Thresholds

```
Operational Threshold:
  [x] check.database.connectivity          PASS
  [x] check.database.permissions           PASS
  [x] check.database.migrations.applied    PASS
  [x] check.services.valkey.connectivity   PASS
  [x] check.auth.admin.exists              PASS
  [x] check.crypto.profile.valid           PASS

Status: OPERATIONAL (6/6 required checks passed)

Production-Ready Threshold:
  [x] check.security.identity.configured   PASS
  [x] check.integration.vault.connected    PASS
  [x] check.integration.scm.connected      PASS
  [x] check.notify.channel.configured      PASS
  [ ] check.orchestrator.agent.healthy     SKIP
  [ ] check.feeds.sync.enabled             SKIP

Status: NOT PRODUCTION-READY (4/6 recommended, 2 skipped)
```

---

## 8. Doctor Engine Integration

### 8.1 Wizard-Specific Check Context

The wizard provides context to Doctor checks:

```csharp
public sealed record WizardCheckContext
{
    public required string StepId { get; init; }
    public required RuntimeEnvironment DetectedRuntime { get; init; }
    public required ImmutableDictionary<string, string> UserInputs { get; init; }
    public bool GenerateRemediations { get; init; } = true;
    public bool IncludePlaceholders { get; init; } = true;
}
```

### 8.2 Check Invocation

```csharp
public interface IWizardDoctorClient
{
    Task<ImmutableArray<CheckResult>> RunStepChecksAsync(
        string stepId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<CheckResult> RunSingleCheckAsync(
        string checkId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<VerificationResult> RunVerificationAsync(
        string command,
        WizardCheckContext context,
        CancellationToken ct);
}
```

### 8.3 Check Timeout

| Check Category | Default Timeout | Max Timeout |
|----------------|-----------------|-------------|
| Connectivity | 10 seconds | 30 seconds |
| Authentication | 15 seconds | 60 seconds |
| Migrations | 60 seconds | 300 seconds |
| Full validation | 30 seconds | 120 seconds |

---

## 9. Remediation Safety

### 9.1 Dangerous Commands

Commands marked `isDangerous: true` require user confirmation:

```
WARNING: This command will modify your database schema.

Command: stella migrations-run --module all

This action:
  - Applies 5 pending migrations
  - Cannot be automatically rolled back
  - May take several minutes

Type 'apply' to confirm: _
```

### 9.2 Sudo Requirements

Commands requiring `sudo` show a notice:

```
This command requires administrator privileges.

Command: sudo systemctl start postgresql

[Copy Command]

Note: You may be prompted for your password.
```

### 9.3 Secret Substitution Notice

```
This command contains placeholders for sensitive values.

Command: vault write auth/approle/login role_id={{ROLE_ID}} secret_id={{SECRET_ID}}

Before running:
  1. Replace {{ROLE_ID}} with your AppRole Role ID
  2. Replace {{SECRET_ID}} with your AppRole Secret ID

[Copy Command]
```

---

## 10. Check Plugin Requirements

### 10.1 New Checks for Setup Wizard

The following checks may need to be added to existing plugins:

| Plugin | New Check ID | Purpose |
|--------|--------------|---------|
| Core | `check.auth.admin.exists` | Verify admin user exists |
| Core | `check.auth.password.policy` | Verify password complexity |
| Core | `check.crypto.signing.test` | Test signing operation |
| Database | `check.database.migrations.checksums` | Verify migration integrity |
| Integration | `check.integration.vault.secrets.access` | Test secret retrieval |
| Integration | `check.orchestrator.environment.valid` | Validate environment config |
| Notify | `check.notify.delivery.test` | Test notification delivery |

### 10.2 Check Implementation Contract

Each check must implement:

```csharp
public interface ISetupWizardAwareCheck : IDoctorCheck
{
    // Standard check execution
    Task<CheckResult> ExecuteAsync(CheckContext context, CancellationToken ct);

    // Generate runtime-specific remediations
    ImmutableArray<RemediationCommand> GetRemediations(
        CheckResult result,
        RuntimeEnvironment runtime);

    // Verification command for this check
    string? GetVerificationCommand(RuntimeEnvironment runtime);
}
```

---

## 11. Audit Trail

### 11.1 Setup Event Logging

All wizard actions are logged to the Timeline service:

```csharp
public sealed record SetupWizardEvent
{
    public required string EventType { get; init; } // step.started, step.completed, check.failed, etc.
    public required string StepId { get; init; }
    public required string? CheckId { get; init; }
    public required CheckStatus? Status { get; init; }
    public required DateTimeOffset OccurredAt { get; init; }
    public required string? UserId { get; init; }
    public ImmutableDictionary<string, string> Metadata { get; init; }
}
```

### 11.2 Event Types

| Event Type | Description |
|------------|-------------|
| `setup.started` | Wizard initiated |
| `setup.completed` | Wizard finished successfully |
| `setup.aborted` | Wizard cancelled |
| `step.started` | Step configuration began |
| `step.completed` | Step passed all checks |
| `step.failed` | Step failed validation |
| `step.skipped` | User skipped optional step |
| `check.passed` | Individual check passed |
| `check.failed` | Individual check failed |
| `check.warned` | Individual check warned |
| `remediation.copied` | User copied fix command |
| `remediation.verified` | Fix verification succeeded |
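
### Example: Placeholder Resolution Sketch

The placeholder rules in sections 5.1 through 5.3 can be illustrated with a short sketch. It is not part of the contract: `ResolveForDisplay`, its parameters, and the sample values are hypothetical names chosen for illustration, and the sketch assumes placeholder names consist of upper-case letters, digits, and underscores. Sources are passed in priority order (user input first), an inline `:-default` is used only when no source supplies a value, and secret placeholders are left unresolved.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample sources in priority order (section 5.1): user input beats environment.
var userInput = new Dictionary<string, string> { ["HOST"] = "db.internal" };
var environment = new Dictionary<string, string> { ["HOST"] = "ignored.example" };
var sources = new IReadOnlyDictionary<string, string>[] { userInput, environment };

// Placeholders treated as secrets (section 5.3).
var secrets = new HashSet<string> { "PASSWORD", "TOKEN", "SECRET_KEY" };

Console.WriteLine(ResolveForDisplay("pg_isready -h {{HOST}} -p {{PORT:-5432}}", sources, secrets));
// pg_isready -h db.internal -p 5432
Console.WriteLine(ResolveForDisplay("vault login {{TOKEN}}", sources, secrets));
// vault login {{TOKEN}}

// Hypothetical helper: resolves {{NAME}} and {{NAME:-default}} tokens for display.
static string ResolveForDisplay(
    string command,
    IReadOnlyList<IReadOnlyDictionary<string, string>> sourcesByPriority,
    ISet<string> secretNames)
{
    return Regex.Replace(command, @"\{\{([A-Z0-9_]+)(?::-([^}]*))?\}\}", match =>
    {
        string name = match.Groups[1].Value;

        // Secrets are never resolved in display; any inline default is dropped too.
        if (secretNames.Contains(name))
            return "{{" + name + "}}";

        // The first source that knows the placeholder wins.
        foreach (var source in sourcesByPriority)
        {
            if (source.TryGetValue(name, out var value))
                return value;
        }

        // Fall back to the inline default; keep the token unchanged if there is none.
        return match.Groups[2].Success ? match.Groups[2].Value : match.Value;
    });
}
```

Leaving an unresolved required placeholder in the output, rather than throwing, is a choice made for this sketch so that the user can still copy the command and substitute the value by hand; a real implementation might instead report the missing value.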