Setup Wizard - Doctor Integration Contract
This document defines how the Setup Wizard integrates with the Doctor diagnostic system to validate each step and provide actionable remediation guidance.
1. Overview
The Setup Wizard relies on Doctor checks to:
- Validate each configuration step
- Detect existing configuration (for resume/reconfigure)
- Generate runtime-specific fix commands
- Verify that fixes were applied correctly
2. Step-to-Check Mapping
2.1 Required Steps
| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---|---|---|---|
| database | check.database.connectivity | Critical | Yes |
| database | check.database.permissions | Critical | Yes |
| database | check.database.version | Warning | No |
| valkey | check.services.valkey.connectivity | Critical | Yes |
| valkey | check.services.valkey.ping | Critical | Yes |
| migrations | check.database.migrations.applied | Critical | Yes |
| migrations | check.database.migrations.checksums | Critical | Yes |
| migrations | check.database.schema.version | Info | No |
| admin | check.auth.admin.exists | Critical | Yes |
| admin | check.auth.password.policy | Warning | No |
| crypto | check.crypto.profile.valid | Critical | Yes |
| crypto | check.crypto.signing.test | Warning | No |
2.2 Optional Steps
| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---|---|---|---|
| vault | check.integration.vault.connected | Warning | No |
| vault | check.integration.vault.auth | Warning | No |
| vault | check.integration.vault.secrets.access | Info | No |
| scm | check.integration.scm.github.auth | Info | No |
| scm | check.integration.scm.github.permissions | Info | No |
| scm | check.integration.scm.gitlab.auth | Info | No |
| registry | check.integration.registry.connected | Info | No |
| notifications | check.notify.channel.configured | Info | No |
| notifications | check.notify.slack.webhook | Info | No |
| notifications | check.notify.email.smtp | Info | No |
| identity | check.security.identity.configured | Info | No |
| identity | check.security.oidc.provider | Info | No |
| environments | check.orchestrator.environment.exists | Info | No |
| environments | check.orchestrator.environment.valid | Info | No |
| agents | check.orchestrator.agent.registered | Info | No |
| agents | check.orchestrator.agent.healthy | Info | No |
| feeds | check.feeds.sync.enabled | Info | No |
3. Check Output Model
3.1 CheckResult Schema
```csharp
using System;
using System.Collections.Immutable;

public sealed record CheckResult
{
    public required string CheckId { get; init; }
    public required CheckStatus Status { get; init; } // Pass, Warn, Fail
    public required string Message { get; init; }
    public required TimeSpan Duration { get; init; }
    public ImmutableDictionary<string, object> Evidence { get; init; }
    public ImmutableArray<LikelyCause> LikelyCauses { get; init; }
    public ImmutableArray<RemediationCommand> Remediations { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum CheckStatus { Pass, Warn, Fail }

public sealed record LikelyCause
{
    public required int Priority { get; init; } // 1 = most likely
    public required string Description { get; init; }
    public string? DocumentationUrl { get; init; }
}

public sealed record RemediationCommand
{
    public required RuntimeEnvironment Runtime { get; init; }
    public required string Command { get; init; }
    public required string Description { get; init; }
    public bool RequiresSudo { get; init; }
    public bool IsDangerous { get; init; } // Requires confirmation
    public ImmutableDictionary<string, string> Placeholders { get; init; }
}

public enum RuntimeEnvironment
{
    DockerCompose,
    Kubernetes,
    Systemd,
    WindowsService,
    Bare,
    Any
}
```
3.2 Evidence Dictionary
The Evidence dictionary contains check-specific data:
| Check Category | Evidence Keys |
|---|---|
| Database | host, port, database, version, user, sslMode |
| Valkey | host, port, version, usedMemory, maxMemory |
| Migrations | pendingCount, appliedCount, lastMigration, failedMigrations |
| Auth | adminCount, adminUsername, passwordLastChanged |
| Vault | provider, version, mountPoints, authMethod |
| SCM | provider, rateLimit, remainingCalls, organization |
4. Remediation Command Generation
4.1 Runtime Detection
The wizard detects the runtime environment via:
```csharp
public interface IRuntimeDetector
{
    RuntimeEnvironment Detect();
    bool IsDockerAvailable();
    bool IsKubernetesContext();
    bool IsSystemdManaged(string serviceName);
    string GetComposeProjectPath();
    string GetKubernetesNamespace();
}
```
Detection logic:
- Check for /.dockerenv file → Docker container
- Check for KUBERNETES_SERVICE_HOST → Kubernetes
- Check for docker compose command → Docker Compose
- Check for systemctl command → systemd
- Check for Windows services → Windows Service
- Default → Bare (manual)
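The detection order above can be sketched as a pure function over probe results. This is a minimal sketch rather than the wizard's actual detector: `RuntimeProbes`, `RuntimeDetectionSketch`, and `ProbeCheap` are hypothetical names, and mapping the /.dockerenv case to `DockerCompose` is an assumption made only because `RuntimeEnvironment` has no separate container value.

```csharp
using System;
using System.IO;

// Repeated from section 3.1 so the sketch compiles on its own.
public enum RuntimeEnvironment { DockerCompose, Kubernetes, Systemd, WindowsService, Bare, Any }

// Hypothetical holder for raw probe results; not part of the contract.
public sealed record RuntimeProbes(
    bool DockerEnvFileExists,
    bool KubernetesServiceHostSet,
    bool DockerComposeAvailable,
    bool SystemctlAvailable,
    bool WindowsServiceHost);

public static class RuntimeDetectionSketch
{
    // Evaluates the probes in the documented order; the first match wins.
    public static RuntimeEnvironment Detect(RuntimeProbes p)
    {
        // Assumption: "Docker container" is reported as DockerCompose,
        // because the enum has no dedicated container value.
        if (p.DockerEnvFileExists) return RuntimeEnvironment.DockerCompose;
        if (p.KubernetesServiceHostSet) return RuntimeEnvironment.Kubernetes;
        if (p.DockerComposeAvailable) return RuntimeEnvironment.DockerCompose;
        if (p.SystemctlAvailable) return RuntimeEnvironment.Systemd;
        if (p.WindowsServiceHost) return RuntimeEnvironment.WindowsService;
        return RuntimeEnvironment.Bare;
    }

    // Gathers only the two probes that need no external process.
    // The command and service probes are left false in this sketch.
    public static RuntimeProbes ProbeCheap() => new(
        DockerEnvFileExists: File.Exists("/.dockerenv"),
        KubernetesServiceHostSet: !string.IsNullOrEmpty(
            Environment.GetEnvironmentVariable("KUBERNETES_SERVICE_HOST")),
        DockerComposeAvailable: false,
        SystemctlAvailable: false,
        WindowsServiceHost: false);
}
```

Keeping `Detect` free of I/O makes the ordering easy to unit test; a real `IRuntimeDetector` would fill in the remaining probes before calling it.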
4.2 Command Templates
Database Connection Failure
```yaml
check.database.connectivity:
  likelyCauses:
    - priority: 1
      description: "PostgreSQL is not running"
    - priority: 2
      description: "Firewall blocking port 5432"
    - priority: 3
      description: "Incorrect host or port"
    - priority: 4
      description: "Network connectivity issue"
  remediations:
    - runtime: DockerCompose
      description: "Start PostgreSQL container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d postgres"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"
    - runtime: Kubernetes
      description: "Check PostgreSQL pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=postgres"
      placeholders:
        NAMESPACE: "stellaops"
    - runtime: Systemd
      description: "Start PostgreSQL service"
      command: "sudo systemctl start postgresql"
      requiresSudo: true
    - runtime: Any
      description: "Verify PostgreSQL is listening"
      command: "pg_isready -h {{HOST}} -p {{PORT}}"
      placeholders:
        HOST: "localhost"
        PORT: "5432"
  verificationCommand: "pg_isready -h {{HOST}} -p {{PORT}}"
```
Valkey Connection Failure
```yaml
check.services.valkey.connectivity:
  likelyCauses:
    - priority: 1
      description: "Valkey/Redis is not running"
    - priority: 2
      description: "Firewall blocking port 6379"
    - priority: 3
      description: "Authentication required but not configured"
  remediations:
    - runtime: DockerCompose
      description: "Start Valkey container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d valkey"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"
    - runtime: Kubernetes
      description: "Check Valkey pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=valkey"
      placeholders:
        NAMESPACE: "stellaops"
    - runtime: Systemd
      description: "Start Valkey service"
      command: "sudo systemctl start valkey"
      requiresSudo: true
    - runtime: Any
      description: "Test Valkey connection"
      command: "valkey-cli -h {{HOST}} -p {{PORT}} PING"
      placeholders:
        HOST: "localhost"
        PORT: "6379"
  verificationCommand: "valkey-cli -h {{HOST}} -p {{PORT}} PING"
```
Pending Migrations
```yaml
check.database.migrations.applied:
  likelyCauses:
    - priority: 1
      description: "Pending release migrations require manual execution"
    - priority: 2
      description: "Startup migrations not yet applied"
  remediations:
    - runtime: Any
      description: "Run pending migrations (dry-run first)"
      command: "stella migrations-run --module all --dry-run"
    - runtime: Any
      description: "Apply all pending migrations"
      command: "stella migrations-run --module all"
      isDangerous: true
    - runtime: DockerCompose
      description: "Run migrations in container"
      command: "docker compose exec api stella migrations-run --module all"
    - runtime: Kubernetes
      description: "Run migrations job"
      command: "kubectl apply -f devops/k8s/jobs/migrations.yaml"
  verificationCommand: "stella migrations-run --module all --dry-run"
```
Vault Authentication Failure
```yaml
check.integration.vault.auth:
  likelyCauses:
    - priority: 1
      description: "Vault token expired or revoked"
    - priority: 2
      description: "AppRole credentials invalid"
    - priority: 3
      description: "Kubernetes service account not configured"
    - priority: 4
      description: "Vault server unreachable"
  remediations:
    - runtime: Any
      description: "Test Vault connectivity"
      command: "curl -s {{VAULT_ADDR}}/v1/sys/health"
      placeholders:
        VAULT_ADDR: "https://vault.example.com:8200"
    - runtime: Any
      description: "Verify token validity"
      command: "vault token lookup"
    - runtime: Kubernetes
      description: "Check Kubernetes auth configuration"
      command: "kubectl get serviceaccount -n {{NAMESPACE}} stellaops-vault-auth"
      placeholders:
        NAMESPACE: "stellaops"
  verificationCommand: "vault token lookup"
```
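Each template lists remediations for several runtimes, so the wizard has to pick the ones that apply. One plausible selection rule, sketched below, keeps entries whose runtime matches the detected one plus those marked `Any`. The rule itself is an assumption (the templates do not spell it out), and `RemediationEntry` and `RemediationFilterSketch` are hypothetical, trimmed-down names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Repeated from section 3.1 so the sketch compiles on its own.
public enum RuntimeEnvironment { DockerCompose, Kubernetes, Systemd, WindowsService, Bare, Any }

// Hypothetical, trimmed-down stand-in for RemediationCommand.
public sealed record RemediationEntry(RuntimeEnvironment Runtime, string Command);

public static class RemediationFilterSketch
{
    // Keeps remediations for the detected runtime and runtime-agnostic ones,
    // preserving the order in which the template lists them.
    public static IReadOnlyList<RemediationEntry> ForRuntime(
        IEnumerable<RemediationEntry> all,
        RuntimeEnvironment detected) =>
        all.Where(r => r.Runtime == detected || r.Runtime == RuntimeEnvironment.Any)
           .ToList();
}
```

With the database template and a detected runtime of `Systemd`, this rule would keep the `sudo systemctl start postgresql` entry and the `pg_isready` entry, and drop the Docker Compose and Kubernetes ones.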
5. Placeholder Resolution
5.1 Placeholder Sources
Placeholders in commands are resolved from:
| Source | Priority | Example |
|---|---|---|
| User input | 1 (highest) | {{HOST}} from form field |
| Environment | 2 | {{VAULT_ADDR}} from env |
| Detection | 3 | {{NAMESPACE}} from context |
| Default | 4 (lowest) | Fallback value |
5.2 Placeholder Syntax
```
{{PLACEHOLDER_NAME}}
{{PLACEHOLDER_NAME:-default_value}}
```
Examples:
- {{HOST}} - Required placeholder
- {{PORT:-5432}} - Optional with default
- {{COMPOSE_FILE:-docker-compose.yml}} - File path default
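The syntax and the source priorities from 5.1 can be combined in a small resolver. The following is a minimal sketch under stated assumptions: `PlaceholderSketch` and `Resolve` are hypothetical names, placeholder names are assumed to consist of letters, digits, and underscores, and a required placeholder with no value is left in the command unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderSketch
{
    // Matches {{NAME}} and {{NAME:-default}}.
    // Group 1 is the name; group 2 is the optional inline default.
    private static readonly Regex Pattern =
        new(@"\{\{([A-Za-z0-9_]+)(?::-([^}]*))?\}\}", RegexOptions.Compiled);

    // sources must be ordered by priority: user input, environment, detection.
    public static string Resolve(
        string command,
        params IReadOnlyDictionary<string, string>[] sources)
    {
        return Pattern.Replace(command, match =>
        {
            string name = match.Groups[1].Value;

            // The first source that knows the name wins.
            foreach (var source in sources)
            {
                if (source.TryGetValue(name, out var value)) return value;
            }

            // Lowest priority: the inline default. Required placeholders
            // without a value stay as written.
            return match.Groups[2].Success ? match.Groups[2].Value : match.Value;
        });
    }
}
```

For example, resolving `pg_isready -h {{HOST}} -p {{PORT:-5432}}` with a user-input source that maps HOST to `db.internal` yields `pg_isready -h db.internal -p 5432`, even if a lower-priority source also defines HOST.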
5.3 Secret Redaction
Commands containing secrets are never displayed with actual values:
| Placeholder | Display | Actual |
|---|---|---|
| {{PASSWORD}} | {{PASSWORD}} | Never resolved in display |
| {{TOKEN}} | {{TOKEN}} | Never resolved in display |
| {{SECRET_KEY}} | {{SECRET_KEY}} | Never resolved in display |
The user must copy and manually substitute secrets.
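A display-side resolver can enforce this rule by skipping secret placeholders even when a value is available. The sketch below is illustrative only: `SecretRedactionSketch` and `ResolveForDisplay` are hypothetical names, and the set of secret names is taken from the table above on the assumption that a real deployment may extend it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SecretRedactionSketch
{
    // Assumption: only the names from the redaction table are treated as secrets.
    private static readonly HashSet<string> SecretNames =
        new(StringComparer.Ordinal) { "PASSWORD", "TOKEN", "SECRET_KEY" };

    private static readonly Regex Pattern = new(@"\{\{([A-Za-z0-9_]+)\}\}");

    // Substitutes ordinary placeholders and leaves secret ones as written,
    // so the displayed command never contains a secret value.
    public static string ResolveForDisplay(
        string command,
        IReadOnlyDictionary<string, string> values)
    {
        return Pattern.Replace(command, match =>
        {
            string name = match.Groups[1].Value;
            if (SecretNames.Contains(name)) return match.Value;
            return values.TryGetValue(name, out var value) ? value : match.Value;
        });
    }
}
```

Given the text `HOST={{HOST}} PASSWORD={{PASSWORD}}` and values for both names, the result keeps `{{PASSWORD}}` intact and substitutes only the host.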
6. Verification Flow
6.1 Post-Fix Verification
After the user applies a fix, the wizard:
- Wait - Pause for user confirmation ("I've run this command")
- Verify - Run the verification command
- Re-check - Run the original Doctor check
- Report - Show success or next steps
6.2 Verification Command Execution
```csharp
public interface IVerificationExecutor
{
    Task<VerificationResult> ExecuteAsync(
        string command,
        TimeSpan timeout,
        CancellationToken ct);
}

public sealed record VerificationResult
{
    public required bool Success { get; init; }
    public required int ExitCode { get; init; }
    public required string Output { get; init; }
    public required TimeSpan Duration { get; init; }
}
```
6.3 Re-Check Behavior
```
[FAIL] check.database.connectivity

Suggested fix applied. Verifying...

[RUN] pg_isready -h localhost -p 5432
      localhost:5432 - accepting connections

Re-running check...

[PASS] check.database.connectivity
       PostgreSQL connection successful
```
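The verify and re-check steps of this flow can be sketched with two injected delegates. This is a sketch under assumptions, not the wizard's implementation: `PostFixFlowSketch` and both delegate parameters are hypothetical, and skipping the re-check when the verification command fails is a design choice of this example that the flow description does not mandate.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Repeated from section 3.1 so the sketch compiles on its own.
public enum CheckStatus { Pass, Warn, Fail }

public static class PostFixFlowSketch
{
    // Returns the status of the re-run check, or Fail when verification fails.
    public static async Task<CheckStatus> VerifyAndRecheckAsync(
        Func<CancellationToken, Task<bool>> runVerification,   // hypothetical wrapper over the verification command
        Func<CancellationToken, Task<CheckStatus>> rerunCheck, // hypothetical wrapper over the original Doctor check
        CancellationToken ct)
    {
        // Verify: stop early if the verification command did not succeed
        // (assumption of this sketch).
        if (!await runVerification(ct)) return CheckStatus.Fail;

        // Re-check: the original Doctor check decides the final status.
        return await rerunCheck(ct);
    }
}
```

Injecting the two operations keeps the ordering logic independent of how commands are executed, so it can be exercised without a database or shell.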
7. Check Aggregation
7.1 Step Completion Criteria
A step is complete when:
- All Critical checks pass
- No Fail status on any check
- User has acknowledged all Warning checks
7.2 Aggregated Status
```csharp
public enum StepValidationStatus
{
    NotStarted,      // No checks run
    InProgress,      // Checks running
    Passed,          // All critical pass, no failures
    PassedWithWarns, // All critical pass, some warnings
    Failed,          // Any critical failure
    Skipped          // User explicitly skipped
}
```
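The completion criteria and the aggregated status can be derived from individual check statuses roughly as follows. This is a minimal sketch: `StepAggregationSketch` is a hypothetical name, `InProgress` and `Skipped` are left to the caller, and treating any `Fail` result as a failed step assumes that non-critical checks report `Warn` rather than `Fail`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Repeated from sections 3.1 and 7.2 so the sketch compiles on its own.
public enum CheckStatus { Pass, Warn, Fail }
public enum StepValidationStatus { NotStarted, InProgress, Passed, PassedWithWarns, Failed, Skipped }

public static class StepAggregationSketch
{
    // Rolls the statuses of a step's finished checks up into one value.
    public static StepValidationStatus Aggregate(IReadOnlyCollection<CheckStatus> results)
    {
        if (results.Count == 0) return StepValidationStatus.NotStarted;
        if (results.Contains(CheckStatus.Fail)) return StepValidationStatus.Failed;
        if (results.Contains(CheckStatus.Warn)) return StepValidationStatus.PassedWithWarns;
        return StepValidationStatus.Passed;
    }

    // A step with warnings only counts as complete once the user
    // has acknowledged them.
    public static bool IsComplete(StepValidationStatus status, bool warningsAcknowledged) =>
        status == StepValidationStatus.Passed ||
        (status == StepValidationStatus.PassedWithWarns && warningsAcknowledged);
}
```

For a step whose checks returned Pass and Warn, `Aggregate` gives `PassedWithWarns`, and `IsComplete` stays false until the warnings are acknowledged.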
7.3 Status Rollup for Thresholds
```
Operational Threshold:
  [x] check.database.connectivity          PASS
  [x] check.database.permissions           PASS
  [x] check.database.migrations.applied    PASS
  [x] check.services.valkey.connectivity   PASS
  [x] check.auth.admin.exists              PASS
  [x] check.crypto.profile.valid           PASS

  Status: OPERATIONAL (6/6 required checks passed)

Production-Ready Threshold:
  [x] check.security.identity.configured   PASS
  [x] check.integration.vault.connected    PASS
  [x] check.integration.scm.connected      PASS
  [x] check.notify.channel.configured      PASS
  [ ] check.orchestrator.agent.healthy     SKIP
  [ ] check.feeds.sync.enabled             SKIP

  Status: NOT PRODUCTION-READY (4/6 recommended, 2 skipped)
```
8. Doctor Engine Integration
8.1 Wizard-Specific Check Context
The wizard provides context to Doctor checks:
```csharp
public sealed record WizardCheckContext
{
    public required string StepId { get; init; }
    public required RuntimeEnvironment DetectedRuntime { get; init; }
    public required ImmutableDictionary<string, string> UserInputs { get; init; }
    public bool GenerateRemediations { get; init; } = true;
    public bool IncludePlaceholders { get; init; } = true;
}
```
8.2 Check Invocation
```csharp
public interface IWizardDoctorClient
{
    Task<ImmutableArray<CheckResult>> RunStepChecksAsync(
        string stepId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<CheckResult> RunSingleCheckAsync(
        string checkId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<VerificationResult> RunVerificationAsync(
        string command,
        WizardCheckContext context,
        CancellationToken ct);
}
```
8.3 Check Timeout
| Check Category | Default Timeout | Max Timeout |
|---|---|---|
| Connectivity | 10 seconds | 30 seconds |
| Authentication | 15 seconds | 60 seconds |
| Migrations | 60 seconds | 300 seconds |
| Full validation | 30 seconds | 120 seconds |
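One way to apply these limits is to use the default when the caller requests nothing and to cap any requested value at the maximum. The table does not say how the maximum is enforced, so the clamping behavior below is an assumption, and `CheckTimeoutSketch` and `Resolve` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CheckTimeoutSketch
{
    // (default, max) per category, copied from the table above.
    private static readonly Dictionary<string, (TimeSpan Default, TimeSpan Max)> Limits = new()
    {
        ["Connectivity"] = (TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(30)),
        ["Authentication"] = (TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60)),
        ["Migrations"] = (TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(300)),
        ["Full validation"] = (TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(120)),
    };

    // Returns the default when nothing is requested; otherwise caps the
    // requested timeout at the category maximum (assumed behavior).
    // Throws KeyNotFoundException for a category not in the table.
    public static TimeSpan Resolve(string category, TimeSpan? requested = null)
    {
        var (defaultTimeout, maxTimeout) = Limits[category];
        if (requested is null) return defaultTimeout;
        return requested.Value > maxTimeout ? maxTimeout : requested.Value;
    }
}
```

Under this rule a request for 600 seconds on a Migrations check is reduced to 300 seconds, while a Connectivity check with no request runs with the 10 second default.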
9. Remediation Safety
9.1 Dangerous Commands
Commands marked isDangerous: true require user confirmation:
```
WARNING: This command will modify your database schema.

Command:
  stella migrations-run --module all

This action:
  - Applies 5 pending migrations
  - Cannot be automatically rolled back
  - May take several minutes

Type 'apply' to confirm: _
```
9.2 Sudo Requirements
Commands requiring sudo show a notice:
```
This command requires administrator privileges.

Command:
  sudo systemctl start postgresql

[Copy Command]

Note: You may be prompted for your password.
```
9.3 Secret Substitution Notice
```
This command contains placeholders for sensitive values.

Command:
  vault write auth/approle/login role_id={{ROLE_ID}} secret_id={{SECRET_ID}}

Before running:
  1. Replace {{ROLE_ID}} with your AppRole Role ID
  2. Replace {{SECRET_ID}} with your AppRole Secret ID

[Copy Command]
```
10. Check Plugin Requirements
10.1 New Checks for Setup Wizard
The following checks may need to be added to existing plugins:
| Plugin | New Check ID | Purpose |
|---|---|---|
| Core | check.auth.admin.exists | Verify admin user exists |
| Core | check.auth.password.policy | Verify password complexity |
| Core | check.crypto.signing.test | Test signing operation |
| Database | check.database.migrations.checksums | Verify migration integrity |
| Integration | check.integration.vault.secrets.access | Test secret retrieval |
| Integration | check.orchestrator.environment.valid | Validate environment config |
| Notify | check.notify.delivery.test | Test notification delivery |
10.2 Check Implementation Contract
Each check must implement:
```csharp
public interface ISetupWizardAwareCheck : IDoctorCheck
{
    // Standard check execution
    Task<CheckResult> ExecuteAsync(CheckContext context, CancellationToken ct);

    // Generate runtime-specific remediations
    ImmutableArray<RemediationCommand> GetRemediations(
        CheckResult result,
        RuntimeEnvironment runtime);

    // Verification command for this check
    string? GetVerificationCommand(RuntimeEnvironment runtime);
}
```
11. Audit Trail
11.1 Setup Event Logging
All wizard actions are logged to the Timeline service:
```csharp
public sealed record SetupWizardEvent
{
    public required string EventType { get; init; } // step.started, step.completed, check.failed, etc.
    public required string StepId { get; init; }
    public required string? CheckId { get; init; }
    public required CheckStatus? Status { get; init; }
    public required DateTimeOffset OccurredAt { get; init; }
    public required string? UserId { get; init; }
    public ImmutableDictionary<string, string> Metadata { get; init; }
}
```
11.2 Event Types
| Event Type | Description |
|---|---|
| setup.started | Wizard initiated |
| setup.completed | Wizard finished successfully |
| setup.aborted | Wizard cancelled |
| step.started | Step configuration began |
| step.completed | Step passed all checks |
| step.failed | Step failed validation |
| step.skipped | User skipped optional step |
| check.passed | Individual check passed |
| check.failed | Individual check failed |
| check.warned | Individual check warned |
| remediation.copied | User copied fix command |
| remediation.verified | Fix verification succeeded |
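The three check-level event types line up with the values of `CheckStatus`, so the event type for a finished check can be derived from its status. The helper below is a small sketch of that mapping; `SetupEventTypeSketch` and `ForCheck` are hypothetical names, not part of the Timeline service.

```csharp
using System;

// Repeated from section 3.1 so the sketch compiles on its own.
public enum CheckStatus { Pass, Warn, Fail }

public static class SetupEventTypeSketch
{
    // Maps a check outcome to the matching event type from the table above.
    public static string ForCheck(CheckStatus status) => status switch
    {
        CheckStatus.Pass => "check.passed",
        CheckStatus.Warn => "check.warned",
        CheckStatus.Fail => "check.failed",
        _ => throw new ArgumentOutOfRangeException(nameof(status)),
    };
}
```

Centralizing the mapping keeps the event type strings in one place, which reduces the risk of typos in the values written to `SetupWizardEvent.EventType`.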