git.stella-ops.org/docs/setup/setup-wizard-doctor-contract.md
2026-01-13 18:53:39 +02:00

Setup Wizard - Doctor Integration Contract

This document defines how the Setup Wizard integrates with the Doctor diagnostic system to validate each step and provide actionable remediation guidance.

1. Overview

The Setup Wizard relies on Doctor checks to:

  1. Validate each configuration step
  2. Detect existing configuration (for resume/reconfigure)
  3. Generate runtime-specific fix commands
  4. Verify that fixes were applied correctly

2. Step-to-Check Mapping

2.1 Required Steps

| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---------|-----------------|----------|--------------------|
| database | check.database.connectivity | Critical | Yes |
| database | check.database.permissions | Critical | Yes |
| database | check.database.version | Warning | No |
| valkey | check.services.valkey.connectivity | Critical | Yes |
| valkey | check.services.valkey.ping | Critical | Yes |
| migrations | check.database.migrations.applied | Critical | Yes |
| migrations | check.database.migrations.checksums | Critical | Yes |
| migrations | check.database.schema.version | Info | No |
| admin | check.auth.admin.exists | Critical | Yes |
| admin | check.auth.password.policy | Warning | No |
| crypto | check.crypto.profile.valid | Critical | Yes |
| crypto | check.crypto.signing.test | Warning | No |

2.2 Optional Steps

| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---------|-----------------|----------|--------------------|
| vault | check.integration.vault.connected | Warning | No |
| vault | check.integration.vault.auth | Warning | No |
| vault | check.integration.vault.secrets.access | Info | No |
| scm | check.integration.scm.github.auth | Info | No |
| scm | check.integration.scm.github.permissions | Info | No |
| scm | check.integration.scm.gitlab.auth | Info | No |
| registry | check.integration.registry.connected | Info | No |
| notifications | check.notify.channel.configured | Info | No |
| notifications | check.notify.slack.webhook | Info | No |
| notifications | check.notify.email.smtp | Info | No |
| identity | check.security.identity.configured | Info | No |
| identity | check.security.oidc.provider | Info | No |
| environments | check.orchestrator.environment.exists | Info | No |
| environments | check.orchestrator.environment.valid | Info | No |
| agents | check.orchestrator.agent.registered | Info | No |
| agents | check.orchestrator.agent.healthy | Info | No |
| feeds | check.feeds.sync.enabled | Info | No |

3. Check Output Model

3.1 CheckResult Schema

public sealed record CheckResult
{
    public required string CheckId { get; init; }
    public required CheckStatus Status { get; init; }  // Pass, Warn, Fail
    public required string Message { get; init; }
    public required TimeSpan Duration { get; init; }
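    // Optional collections default to empty: an uninitialized ImmutableArray<T> throws when enumerated.
    public ImmutableDictionary<string, object> Evidence { get; init; } = ImmutableDictionary<string, object>.Empty;
    public ImmutableArray<LikelyCause> LikelyCauses { get; init; } = ImmutableArray<LikelyCause>.Empty;
    public ImmutableArray<RemediationCommand> Remediations { get; init; } = ImmutableArray<RemediationCommand>.Empty;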
    public string? VerificationCommand { get; init; }
}

public enum CheckStatus { Pass, Warn, Fail }

public sealed record LikelyCause
{
    public required int Priority { get; init; }  // 1 = most likely
    public required string Description { get; init; }
    public string? DocumentationUrl { get; init; }
}

public sealed record RemediationCommand
{
    public required RuntimeEnvironment Runtime { get; init; }
    public required string Command { get; init; }
    public required string Description { get; init; }
    public bool RequiresSudo { get; init; }
    public bool IsDangerous { get; init; }  // Requires confirmation
    public ImmutableDictionary<string, string> Placeholders { get; init; } = ImmutableDictionary<string, string>.Empty;
}

public enum RuntimeEnvironment
{
    DockerCompose,
    Kubernetes,
    Systemd,
    WindowsService,
    Bare,
    Any
}

3.2 Evidence Dictionary

The Evidence dictionary contains check-specific data:

| Check Category | Evidence Keys |
|----------------|---------------|
| Database | host, port, database, version, user, sslMode |
| Valkey | host, port, version, usedMemory, maxMemory |
| Migrations | pendingCount, appliedCount, lastMigration, failedMigrations |
| Auth | adminCount, adminUsername, passwordLastChanged |
| Vault | provider, version, mountPoints, authMethod |
| SCM | provider, rateLimit, remainingCalls, organization |

4. Remediation Command Generation

4.1 Runtime Detection

The wizard detects the runtime environment via:

public interface IRuntimeDetector
{
    RuntimeEnvironment Detect();
    bool IsDockerAvailable();
    bool IsKubernetesContext();
    bool IsSystemdManaged(string serviceName);
    string GetComposeProjectPath();
    string GetKubernetesNamespace();
}

Detection logic:

  1. Check for /.dockerenv file → Docker container
  2. Check for KUBERNETES_SERVICE_HOST → Kubernetes
  3. Check for docker compose command → Docker Compose
  4. Check for systemctl command → systemd
  5. Check for Windows services → Windows Service
  6. Default → Bare (manual)
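
The detection order above can be sketched as a pure decision function over a set of probe results. This is a minimal sketch, not the wizard's implementation: `RuntimeProbes`, `RuntimeDetectionSketch`, and `IsOnPath` are hypothetical names, the presence of `docker` on `PATH` stands in for "docker compose command", and because `RuntimeEnvironment` has no dedicated container member, a `/.dockerenv` hit is mapped to `DockerCompose` as an assumption.

```csharp
using System;
using System.IO;

public enum RuntimeEnvironment { DockerCompose, Kubernetes, Systemd, WindowsService, Bare, Any }

// Hypothetical snapshot of the probes, gathered up front so the decision stays testable.
public sealed record RuntimeProbes(
    bool HasDockerEnvFile,
    bool HasKubernetesServiceHost,
    bool HasDockerCompose,
    bool HasSystemctl,
    bool HasWindowsServices);

public static class RuntimeDetectionSketch
{
    // Collects the probe results from the local machine.
    public static RuntimeProbes Gather() => new(
        HasDockerEnvFile: File.Exists("/.dockerenv"),
        HasKubernetesServiceHost: !string.IsNullOrEmpty(
            Environment.GetEnvironmentVariable("KUBERNETES_SERVICE_HOST")),
        HasDockerCompose: IsOnPath("docker"),       // approximation of "docker compose command"
        HasSystemctl: IsOnPath("systemctl"),
        HasWindowsServices: OperatingSystem.IsWindows());

    // Applies the documented order; the first matching probe wins.
    public static RuntimeEnvironment Decide(RuntimeProbes p)
    {
        // Assumption: no "Docker container" enum member exists, so map it to DockerCompose.
        if (p.HasDockerEnvFile) return RuntimeEnvironment.DockerCompose;
        if (p.HasKubernetesServiceHost) return RuntimeEnvironment.Kubernetes;
        if (p.HasDockerCompose) return RuntimeEnvironment.DockerCompose;
        if (p.HasSystemctl) return RuntimeEnvironment.Systemd;
        if (p.HasWindowsServices) return RuntimeEnvironment.WindowsService;
        return RuntimeEnvironment.Bare;
    }

    // Returns true when an executable with this name exists in any PATH directory.
    private static bool IsOnPath(string executable)
    {
        var path = Environment.GetEnvironmentVariable("PATH") ?? string.Empty;
        foreach (var dir in path.Split(Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries))
        {
            if (File.Exists(Path.Combine(dir, executable)) ||
                File.Exists(Path.Combine(dir, executable + ".exe")))
            {
                return true;
            }
        }
        return false;
    }
}
```

Calling `RuntimeDetectionSketch.Decide(RuntimeDetectionSketch.Gather())` yields the detected runtime. Keeping `Decide` free of I/O lets the ordering be tested without a container or cluster.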

4.2 Command Templates

Database Connection Failure

check.database.connectivity:
  likelyCauses:
    - priority: 1
      description: "PostgreSQL is not running"
    - priority: 2
      description: "Firewall blocking port 5432"
    - priority: 3
      description: "Incorrect host or port"
    - priority: 4
      description: "Network connectivity issue"

  remediations:
    - runtime: DockerCompose
      description: "Start PostgreSQL container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d postgres"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"

    - runtime: Kubernetes
      description: "Check PostgreSQL pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=postgres"
      placeholders:
        NAMESPACE: "stellaops"

    - runtime: Systemd
      description: "Start PostgreSQL service"
      command: "sudo systemctl start postgresql"
      requiresSudo: true

    - runtime: Any
      description: "Verify PostgreSQL is listening"
      command: "pg_isready -h {{HOST}} -p {{PORT}}"
      placeholders:
        HOST: "localhost"
        PORT: "5432"

  verificationCommand: "pg_isready -h {{HOST}} -p {{PORT}}"

Valkey Connection Failure

check.services.valkey.connectivity:
  likelyCauses:
    - priority: 1
      description: "Valkey/Redis is not running"
    - priority: 2
      description: "Firewall blocking port 6379"
    - priority: 3
      description: "Authentication required but not configured"

  remediations:
    - runtime: DockerCompose
      description: "Start Valkey container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d valkey"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"

    - runtime: Kubernetes
      description: "Check Valkey pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=valkey"
      placeholders:
        NAMESPACE: "stellaops"

    - runtime: Systemd
      description: "Start Valkey service"
      command: "sudo systemctl start valkey"
      requiresSudo: true

    - runtime: Any
      description: "Test Valkey connection"
      command: "redis-cli -h {{HOST}} -p {{PORT}} PING"
      placeholders:
        HOST: "localhost"
        PORT: "6379"

  verificationCommand: "redis-cli -h {{HOST}} -p {{PORT}} PING"

Pending Migrations

check.database.migrations.applied:
  likelyCauses:
    - priority: 1
      description: "Pending release migrations require manual execution"
    - priority: 2
      description: "Startup migrations not yet applied"

  remediations:
    - runtime: Any
      description: "Run pending migrations (dry-run first)"
      command: "stella migrations-run --module all --dry-run"

    - runtime: Any
      description: "Apply all pending migrations"
      command: "stella migrations-run --module all"
      isDangerous: true

    - runtime: DockerCompose
      description: "Run migrations in container"
      command: "docker compose exec api stella migrations-run --module all"
      isDangerous: true

    - runtime: Kubernetes
      description: "Run migrations job"
      command: "kubectl apply -f devops/k8s/jobs/migrations.yaml"
      isDangerous: true

  verificationCommand: "stella migrations-run --module all --dry-run"

Vault Authentication Failure

check.integration.vault.auth:
  likelyCauses:
    - priority: 1
      description: "Vault token expired or revoked"
    - priority: 2
      description: "AppRole credentials invalid"
    - priority: 3
      description: "Kubernetes service account not configured"
    - priority: 4
      description: "Vault server unreachable"

  remediations:
    - runtime: Any
      description: "Test Vault connectivity"
      command: "curl -s {{VAULT_ADDR}}/v1/sys/health"
      placeholders:
        VAULT_ADDR: "https://vault.example.com:8200"

    - runtime: Any
      description: "Verify token validity"
      command: "vault token lookup"

    - runtime: Kubernetes
      description: "Check Kubernetes auth configuration"
      command: "kubectl get serviceaccount -n {{NAMESPACE}} stellaops-vault-auth"
      placeholders:
        NAMESPACE: "stellaops"

  verificationCommand: "vault token lookup"
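
Each template lists remediations for several runtimes, so the wizard has to narrow them to the detected one. The sketch below shows one plausible selection rule: keep entries for the detected runtime plus those marked `Any`, with runtime-specific entries first. `RemediationEntry` is a simplified, hypothetical stand-in for `RemediationCommand`, and the ordering rule is an assumption rather than documented behavior.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum RuntimeEnvironment { DockerCompose, Kubernetes, Systemd, WindowsService, Bare, Any }

// Hypothetical, trimmed-down stand-in for RemediationCommand.
public sealed record RemediationEntry(RuntimeEnvironment Runtime, string Command, string Description);

public static class RemediationSelector
{
    // Keeps entries that match the detected runtime or are runtime-agnostic.
    // OrderBy is a stable sort, so template order is preserved within each group.
    public static IReadOnlyList<RemediationEntry> ForRuntime(
        IEnumerable<RemediationEntry> all,
        RuntimeEnvironment detected) =>
        all.Where(r => r.Runtime == detected || r.Runtime == RuntimeEnvironment.Any)
           .OrderBy(r => r.Runtime == RuntimeEnvironment.Any ? 1 : 0)
           .ToList();
}
```

With the check.database.connectivity template above and a detected runtime of `Systemd`, this selection would return the `sudo systemctl start postgresql` entry followed by the `pg_isready` entry.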

5. Placeholder Resolution

5.1 Placeholder Sources

Placeholders in commands are resolved from:

| Source | Priority | Example |
|--------|----------|---------|
| User input | 1 (highest) | {{HOST}} from form field |
| Environment | 2 | {{VAULT_ADDR}} from env |
| Detection | 3 | {{NAMESPACE}} from context |
| Default | 4 (lowest) | Fallback value |

5.2 Placeholder Syntax

{{PLACEHOLDER_NAME}}
{{PLACEHOLDER_NAME:-default_value}}

Examples:

  • {{HOST}} - Required placeholder
  • {{PORT:-5432}} - Optional with default
  • {{COMPOSE_FILE:-docker-compose.yml}} - File path default
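
The source priority and the two placeholder forms can be combined in a small resolver. This is a minimal sketch under stated assumptions: `PlaceholderResolver` is a hypothetical name, placeholder names are assumed to consist of upper-case letters, digits, and underscores, and a required placeholder that no source resolves is left in the command unchanged.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderResolver
{
    // Matches {{NAME}} and {{NAME:-default}}.
    private static readonly Regex Token =
        new(@"\{\{(?<name>[A-Z0-9_]+)(?::-(?<default>[^}]*))?\}\}", RegexOptions.Compiled);

    // Sources are passed highest priority first:
    // user input, environment, detection, then template defaults.
    public static string Resolve(
        string command,
        params IReadOnlyDictionary<string, string>[] sources)
    {
        return Token.Replace(command, match =>
        {
            var name = match.Groups["name"].Value;
            foreach (var source in sources)
            {
                if (source.TryGetValue(name, out var value))
                {
                    return value;   // first source that knows the name wins
                }
            }

            // Fall back to the inline default, or keep the token for the user to fill in.
            var inlineDefault = match.Groups["default"];
            return inlineDefault.Success ? inlineDefault.Value : match.Value;
        });
    }
}
```

For example, resolving `pg_isready -h {{HOST}} -p {{PORT:-5432}}` with a user-input dictionary that maps `HOST` to `db.internal` gives `pg_isready -h db.internal -p 5432`, even if a lower-priority source also defines `HOST`.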

5.3 Secret Redaction

Commands containing secrets are never displayed with actual values:

| Placeholder | Display | Actual |
|-------------|---------|--------|
| {{PASSWORD}} | {{PASSWORD}} | Never resolved in display |
| {{TOKEN}} | {{TOKEN}} | Never resolved in display |
| {{SECRET_KEY}} | {{SECRET_KEY}} | Never resolved in display |

The user must copy the command and substitute the secret values manually.
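
The redaction rule can be sketched as a display renderer that resolves ordinary placeholders but passes secret ones through untouched, even when a value is available. `DisplayRenderer` is a hypothetical name, and the hard-coded set of secret names mirrors the table above as an assumption; a real list would presumably come from configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DisplayRenderer
{
    // Assumption: these names are treated as secrets.
    private static readonly HashSet<string> SecretNames =
        new(StringComparer.Ordinal) { "PASSWORD", "TOKEN", "SECRET_KEY" };

    // Matches {{NAME}} and {{NAME:-default}}.
    private static readonly Regex Token =
        new(@"\{\{(?<name>[A-Z0-9_]+)(?::-(?<default>[^}]*))?\}\}");

    public static string RenderForDisplay(
        string command,
        IReadOnlyDictionary<string, string> values) =>
        Token.Replace(command, match =>
        {
            var name = match.Groups["name"].Value;
            if (SecretNames.Contains(name))
            {
                return match.Value;   // secrets stay as placeholders in the display
            }
            if (values.TryGetValue(name, out var value))
            {
                return value;
            }
            var inlineDefault = match.Groups["default"];
            return inlineDefault.Success ? inlineDefault.Value : match.Value;
        });
}
```

As an illustration (the command is not one of the templates in this document), rendering `redis-cli -h {{HOST}} -a {{PASSWORD}} PING` with values for both `HOST` and `PASSWORD` still displays `{{PASSWORD}}` literally, while `{{HOST}}` is replaced.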


6. Verification Flow

6.1 Post-Fix Verification

After the user applies a fix, the wizard runs the following sequence:

  1. Wait - Pause for user confirmation ("I've run this command")
  2. Verify - Run the verification command
  3. Re-check - Run the original Doctor check
  4. Report - Show success or next steps

6.2 Verification Command Execution

public interface IVerificationExecutor
{
    Task<VerificationResult> ExecuteAsync(
        string command,
        TimeSpan timeout,
        CancellationToken ct);
}

public sealed record VerificationResult
{
    public required bool Success { get; init; }
    public required int ExitCode { get; init; }
    public required string Output { get; init; }
    public required TimeSpan Duration { get; init; }
}
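
One way to implement `IVerificationExecutor` is to hand the command to the platform shell and enforce the timeout with a linked cancellation token. The sketch below is an assumption about how such an executor could look, not the actual implementation: `ShellVerificationExecutor` is a hypothetical name, and the choice of `/bin/sh -c` or `cmd.exe /c` is illustrative. Because the command string is interpreted by a shell, it should only ever come from trusted templates.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public interface IVerificationExecutor
{
    Task<VerificationResult> ExecuteAsync(string command, TimeSpan timeout, CancellationToken ct);
}

public sealed record VerificationResult
{
    public required bool Success { get; init; }
    public required int ExitCode { get; init; }
    public required string Output { get; init; }
    public required TimeSpan Duration { get; init; }
}

// Hypothetical executor: runs the command through the platform shell.
public sealed class ShellVerificationExecutor : IVerificationExecutor
{
    public async Task<VerificationResult> ExecuteAsync(
        string command, TimeSpan timeout, CancellationToken ct)
    {
        var windows = OperatingSystem.IsWindows();
        var psi = new ProcessStartInfo(windows ? "cmd.exe" : "/bin/sh")
        {
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
        };
        psi.ArgumentList.Add(windows ? "/c" : "-c");
        psi.ArgumentList.Add(command);

        // Cancels on either the caller's token or the timeout.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        timeoutCts.CancelAfter(timeout);

        var stopwatch = Stopwatch.StartNew();
        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException("Failed to start the shell process.");

        // Read both streams concurrently so a full pipe cannot block the child.
        var stdout = process.StandardOutput.ReadToEndAsync();
        var stderr = process.StandardError.ReadToEndAsync();
        try
        {
            await process.WaitForExitAsync(timeoutCts.Token);
        }
        catch (OperationCanceledException)
        {
            try { process.Kill(entireProcessTree: true); }
            catch (InvalidOperationException) { /* process already exited */ }

            ct.ThrowIfCancellationRequested();   // caller cancelled: propagate
            return new VerificationResult
            {
                Success = false, ExitCode = -1, Output = "Timed out", Duration = stopwatch.Elapsed,
            };
        }

        var output = ((await stdout) + (await stderr)).Trim();
        return new VerificationResult
        {
            Success = process.ExitCode == 0,
            ExitCode = process.ExitCode,
            Output = output,
            Duration = stopwatch.Elapsed,
        };
    }
}
```

In this sketch a timeout is reported as a failed result with exit code -1, while cancellation by the caller surfaces as an `OperationCanceledException`; both choices are assumptions.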

6.3 Re-Check Behavior

[FAIL] check.database.connectivity

Suggested fix applied. Verifying...

[RUN] pg_isready -h localhost -p 5432
      localhost:5432 - accepting connections

Re-running check...

[PASS] check.database.connectivity
       PostgreSQL connection successful

7. Check Aggregation

7.1 Step Completion Criteria

A step is complete when:

  • All Critical checks pass
  • No Fail status on any check
  • User has acknowledged all Warning checks

7.2 Aggregated Status

public enum StepValidationStatus
{
    NotStarted,      // No checks run
    InProgress,      // Checks running
    Passed,          // All critical pass, no failures
    PassedWithWarns, // All critical pass, some warnings
    Failed,          // Any critical failure
    Skipped          // User explicitly skipped
}
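
The rollup from individual check results to a step status might look like the sketch below. `StepRollup` is a hypothetical name. Two assumptions are made: any `Fail` result marks the step `Failed`, following the "No Fail status on any check" criterion in 7.1, and `InProgress` is never returned because the function only sees completed results.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum CheckStatus { Pass, Warn, Fail }

public enum StepValidationStatus
{
    NotStarted, InProgress, Passed, PassedWithWarns, Failed, Skipped
}

public static class StepRollup
{
    // Reduces the statuses of a step's completed checks to a single step status.
    public static StepValidationStatus Aggregate(
        IReadOnlyCollection<CheckStatus> results,
        bool skippedByUser = false)
    {
        if (skippedByUser) return StepValidationStatus.Skipped;
        if (results.Count == 0) return StepValidationStatus.NotStarted;

        // Assumption: a failure on any check fails the step.
        if (results.Contains(CheckStatus.Fail)) return StepValidationStatus.Failed;
        if (results.Contains(CheckStatus.Warn)) return StepValidationStatus.PassedWithWarns;
        return StepValidationStatus.Passed;
    }
}
```

For instance, `StepRollup.Aggregate(new[] { CheckStatus.Pass, CheckStatus.Warn })` yields `PassedWithWarns`, which is the state in which the wizard would still need the user to acknowledge the warning before the step counts as complete.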

7.3 Status Rollup for Thresholds

Operational Threshold:
  [x] check.database.connectivity         PASS
  [x] check.database.permissions          PASS
  [x] check.database.migrations.applied   PASS
  [x] check.services.valkey.connectivity  PASS
  [x] check.auth.admin.exists             PASS
  [x] check.crypto.profile.valid          PASS

  Status: OPERATIONAL (6/6 required checks passed)

Production-Ready Threshold:
  [x] check.security.identity.configured  PASS
  [x] check.integration.vault.connected   PASS
  [x] check.integration.scm.connected     PASS
  [x] check.notify.channel.configured     PASS
  [ ] check.orchestrator.agent.healthy    SKIP
  [ ] check.feeds.sync.enabled            SKIP

  Status: NOT PRODUCTION-READY (4/6 recommended, 2 skipped)

8. Doctor Engine Integration

8.1 Wizard-Specific Check Context

The wizard provides context to Doctor checks:

public sealed record WizardCheckContext
{
    public required string StepId { get; init; }
    public required RuntimeEnvironment DetectedRuntime { get; init; }
    public required ImmutableDictionary<string, string> UserInputs { get; init; }
    public bool GenerateRemediations { get; init; } = true;
    public bool IncludePlaceholders { get; init; } = true;
}

8.2 Check Invocation

public interface IWizardDoctorClient
{
    Task<ImmutableArray<CheckResult>> RunStepChecksAsync(
        string stepId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<CheckResult> RunSingleCheckAsync(
        string checkId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<VerificationResult> RunVerificationAsync(
        string command,
        WizardCheckContext context,
        CancellationToken ct);
}

8.3 Check Timeout

| Check Category | Default Timeout | Max Timeout |
|----------------|-----------------|-------------|
| Connectivity | 10 seconds | 30 seconds |
| Authentication | 15 seconds | 60 seconds |
| Migrations | 60 seconds | 300 seconds |
| Full validation | 30 seconds | 120 seconds |
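
A caller-supplied timeout can be reconciled with these limits as sketched below. `CheckTimeouts` is a hypothetical helper, and clamping an over-limit request to the maximum (rather than rejecting it) is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class CheckTimeouts
{
    // Default and maximum timeout per check category, taken from the table above.
    private static readonly Dictionary<string, (TimeSpan Default, TimeSpan Max)> Limits = new()
    {
        ["Connectivity"]    = (TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(30)),
        ["Authentication"]  = (TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60)),
        ["Migrations"]      = (TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(300)),
        ["Full validation"] = (TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(120)),
    };

    // Returns the default when nothing valid is requested, otherwise the request capped at the max.
    // An unknown category throws KeyNotFoundException.
    public static TimeSpan Effective(string category, TimeSpan? requested = null)
    {
        var (defaultTimeout, maxTimeout) = Limits[category];
        if (requested is null || requested.Value <= TimeSpan.Zero)
        {
            return defaultTimeout;
        }
        return requested.Value > maxTimeout ? maxTimeout : requested.Value;
    }
}
```

Under this rule a 600-second request for a Migrations check is capped at 300 seconds, and a Connectivity check with no explicit request runs with the 10-second default.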

9. Remediation Safety

9.1 Dangerous Commands

Commands marked isDangerous: true require user confirmation:

WARNING: This command will modify your database schema.

Command:
  stella migrations-run --module all

This action:
  - Applies 5 pending migrations
  - Cannot be automatically rolled back
  - May take several minutes

Type 'apply' to confirm: _
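
The confirmation gate for such a prompt can be as small as the sketch below. `DangerousCommandGate` is a hypothetical name, and the exact matching rule (surrounding whitespace trimmed, comparison case-sensitive) is an assumption.

```csharp
using System;

public static class DangerousCommandGate
{
    // The word the user must type, as shown in the prompt above.
    public const string ConfirmationWord = "apply";

    // Non-dangerous commands pass straight through; dangerous ones need the exact word.
    public static bool MayProceed(bool isDangerous, string? typedInput) =>
        !isDangerous ||
        string.Equals(typedInput?.Trim(), ConfirmationWord, StringComparison.Ordinal);
}
```

With this rule, `MayProceed(true, "apply")` allows the command, while `MayProceed(true, "Apply")` or an empty input does not.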

9.2 Sudo Requirements

Commands requiring sudo show a notice:

This command requires administrator privileges.

Command:
  sudo systemctl start postgresql

[Copy Command]

Note: You may be prompted for your password.

9.3 Secret Substitution Notice

This command contains placeholders for sensitive values.

Command:
  vault write auth/approle/login role_id={{ROLE_ID}} secret_id={{SECRET_ID}}

Before running:
  1. Replace {{ROLE_ID}} with your AppRole Role ID
  2. Replace {{SECRET_ID}} with your AppRole Secret ID

[Copy Command]

10. Check Plugin Requirements

10.1 New Checks for Setup Wizard

The following checks may need to be added to existing plugins:

| Plugin | New Check ID | Purpose |
|--------|--------------|---------|
| Core | check.auth.admin.exists | Verify admin user exists |
| Core | check.auth.password.policy | Verify password complexity |
| Core | check.crypto.signing.test | Test signing operation |
| Database | check.database.migrations.checksums | Verify migration integrity |
| Integration | check.integration.vault.secrets.access | Test secret retrieval |
| Integration | check.orchestrator.environment.valid | Validate environment config |
| Notify | check.notify.delivery.test | Test notification delivery |

10.2 Check Implementation Contract

Each check must implement:

public interface ISetupWizardAwareCheck : IDoctorCheck
{
    // Standard check execution
    Task<CheckResult> ExecuteAsync(CheckContext context, CancellationToken ct);

    // Generate runtime-specific remediations
    ImmutableArray<RemediationCommand> GetRemediations(
        CheckResult result,
        RuntimeEnvironment runtime);

    // Verification command for this check
    string? GetVerificationCommand(RuntimeEnvironment runtime);
}

11. Audit Trail

11.1 Setup Event Logging

All wizard actions are logged to the Timeline service:

public sealed record SetupWizardEvent
{
    public required string EventType { get; init; }  // step.started, step.completed, check.failed, etc.
    public required string StepId { get; init; }
    public required string? CheckId { get; init; }
    public required CheckStatus? Status { get; init; }
    public required DateTimeOffset OccurredAt { get; init; }
    public required string? UserId { get; init; }
    public ImmutableDictionary<string, string> Metadata { get; init; } = ImmutableDictionary<string, string>.Empty;
}

11.2 Event Types

| Event Type | Description |
|------------|-------------|
| setup.started | Wizard initiated |
| setup.completed | Wizard finished successfully |
| setup.aborted | Wizard cancelled |
| step.started | Step configuration began |
| step.completed | Step passed all checks |
| step.failed | Step failed validation |
| step.skipped | User skipped optional step |
| check.passed | Individual check passed |
| check.failed | Individual check failed |
| check.warned | Individual check warned |
| remediation.copied | User copied fix command |
| remediation.verified | Fix verification succeeded |
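
The three check-level event types line up one-to-one with `CheckStatus`, so the `EventType` for a finished check can be derived rather than chosen by hand. A minimal sketch, with `SetupEventTypes` as a hypothetical helper name:

```csharp
using System;

public enum CheckStatus { Pass, Warn, Fail }

public static class SetupEventTypes
{
    // Maps a check outcome to the matching event type from the table above.
    public static string ForCheck(CheckStatus status) => status switch
    {
        CheckStatus.Pass => "check.passed",
        CheckStatus.Warn => "check.warned",
        CheckStatus.Fail => "check.failed",
        _ => throw new ArgumentOutOfRangeException(nameof(status), status, "Unknown check status."),
    };
}
```

Deriving the string in one place keeps the logged event types consistent with the table if a status is added later.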