audit, advisories and doctors/setup work

This commit is contained in:
master
2026-01-13 18:53:39 +02:00
parent 9ca7cb183e
commit d7be6ba34b
811 changed files with 54242 additions and 4056 deletions

@@ -0,0 +1,608 @@
# Setup Wizard - Doctor Integration Contract
This document defines how the Setup Wizard integrates with the Doctor diagnostic system to validate each step and provide actionable remediation guidance.
## 1. Overview
The Setup Wizard relies on Doctor checks to:
1. **Validate** each configuration step
2. **Detect** existing configuration (for resume/reconfigure)
3. **Generate** runtime-specific fix commands
4. **Verify** that fixes were applied correctly
---
## 2. Step-to-Check Mapping
### 2.1 Required Steps
| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---------|-----------------|----------|-------------------|
| `database` | `check.database.connectivity` | Critical | Yes |
| `database` | `check.database.permissions` | Critical | Yes |
| `database` | `check.database.version` | Warning | No |
| `valkey` | `check.services.valkey.connectivity` | Critical | Yes |
| `valkey` | `check.services.valkey.ping` | Critical | Yes |
| `migrations` | `check.database.migrations.applied` | Critical | Yes |
| `migrations` | `check.database.migrations.checksums` | Critical | Yes |
| `migrations` | `check.database.schema.version` | Info | No |
| `admin` | `check.auth.admin.exists` | Critical | Yes |
| `admin` | `check.auth.password.policy` | Warning | No |
| `crypto` | `check.crypto.profile.valid` | Critical | Yes |
| `crypto` | `check.crypto.signing.test` | Warning | No |
### 2.2 Optional Steps
| Step ID | Doctor Check ID | Severity | Blocks Progression |
|---------|-----------------|----------|-------------------|
| `vault` | `check.integration.vault.connected` | Warning | No |
| `vault` | `check.integration.vault.auth` | Warning | No |
| `vault` | `check.integration.vault.secrets.access` | Info | No |
| `scm` | `check.integration.scm.github.auth` | Info | No |
| `scm` | `check.integration.scm.github.permissions` | Info | No |
| `scm` | `check.integration.scm.gitlab.auth` | Info | No |
| `registry` | `check.integration.registry.connected` | Info | No |
| `notifications` | `check.notify.channel.configured` | Info | No |
| `notifications` | `check.notify.slack.webhook` | Info | No |
| `notifications` | `check.notify.email.smtp` | Info | No |
| `identity` | `check.security.identity.configured` | Info | No |
| `identity` | `check.security.oidc.provider` | Info | No |
| `environments` | `check.orchestrator.environment.exists` | Info | No |
| `environments` | `check.orchestrator.environment.valid` | Info | No |
| `agents` | `check.orchestrator.agent.registered` | Info | No |
| `agents` | `check.orchestrator.agent.healthy` | Info | No |
| `feeds` | `check.feeds.sync.enabled` | Info | No |
---
## 3. Check Output Model
### 3.1 CheckResult Schema
```csharp
public sealed record CheckResult
{
    public required string CheckId { get; init; }
    public required CheckStatus Status { get; init; } // Pass, Warn, Fail
    public required string Message { get; init; }
    public required TimeSpan Duration { get; init; }
    public ImmutableDictionary<string, object> Evidence { get; init; }
    public ImmutableArray<LikelyCause> LikelyCauses { get; init; }
    public ImmutableArray<RemediationCommand> Remediations { get; init; }
    public string? VerificationCommand { get; init; }
}

public enum CheckStatus { Pass, Warn, Fail }

public sealed record LikelyCause
{
    public required int Priority { get; init; } // 1 = most likely
    public required string Description { get; init; }
    public string? DocumentationUrl { get; init; }
}

public sealed record RemediationCommand
{
    public required RuntimeEnvironment Runtime { get; init; }
    public required string Command { get; init; }
    public required string Description { get; init; }
    public bool RequiresSudo { get; init; }
    public bool IsDangerous { get; init; } // Requires confirmation
    public ImmutableDictionary<string, string> Placeholders { get; init; }
}

public enum RuntimeEnvironment
{
    DockerCompose,
    Kubernetes,
    Systemd,
    WindowsService,
    Bare,
    Any
}
```
### 3.2 Evidence Dictionary
The `Evidence` dictionary contains check-specific data:
| Check Category | Evidence Keys |
|----------------|---------------|
| **Database** | `host`, `port`, `database`, `version`, `user`, `sslMode` |
| **Valkey** | `host`, `port`, `version`, `usedMemory`, `maxMemory` |
| **Migrations** | `pendingCount`, `appliedCount`, `lastMigration`, `failedMigrations` |
| **Auth** | `adminCount`, `adminUsername`, `passwordLastChanged` |
| **Vault** | `provider`, `version`, `mountPoints`, `authMethod` |
| **SCM** | `provider`, `rateLimit`, `remainingCalls`, `organization` |
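
As an illustration, a database check might populate `Evidence` along the following lines. This is a minimal sketch, not the Doctor implementation: the `EvidenceExample` class and its parameters are hypothetical, and only the key names come from the table above.

```csharp
#nullable enable
using System.Collections.Immutable;

public static class EvidenceExample
{
    // Hypothetical helper: builds the evidence a database check might attach.
    public static ImmutableDictionary<string, object> ForDatabase(
        string host, int port, string database, string user, string sslMode, string? version)
    {
        var builder = ImmutableDictionary.CreateBuilder<string, object>();
        builder["host"] = host;
        builder["port"] = port;
        builder["database"] = database;
        builder["user"] = user;
        builder["sslMode"] = sslMode;

        // Assumption: the server version is only known once a connection succeeds,
        // so the key is omitted rather than stored as null.
        if (version is not null)
        {
            builder["version"] = version;
        }

        return builder.ToImmutable();
    }
}
```
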
---
## 4. Remediation Command Generation
### 4.1 Runtime Detection
The wizard detects the runtime environment via:
```csharp
public interface IRuntimeDetector
{
    RuntimeEnvironment Detect();
    bool IsDockerAvailable();
    bool IsKubernetesContext();
    bool IsSystemdManaged(string serviceName);
    string GetComposeProjectPath();
    string GetKubernetesNamespace();
}
```
Detection logic, evaluated in order (the first match wins):
1. Check for `/.dockerenv` file → Docker container
2. Check for `KUBERNETES_SERVICE_HOST` → Kubernetes
3. Check for `docker compose` command → Docker Compose
4. Check for `systemctl` command → systemd
5. Check for Windows services → Windows Service
6. Default → Bare (manual)
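
The ordering above can be sketched as a pure function over pre-gathered probe results, which keeps the decision logic testable without touching the host. Everything here is illustrative: `RuntimeProbes`, `RuntimeDetection`, and `ProbeCheap` are hypothetical names, and because `RuntimeEnvironment` has no "Docker container" member, the sketch assumes step 1 maps to `DockerCompose`.

```csharp
public enum RuntimeEnvironment { DockerCompose, Kubernetes, Systemd, WindowsService, Bare, Any }

// Hypothetical snapshot of the probes, gathered elsewhere.
public sealed record RuntimeProbes(
    bool DockerEnvFileExists,       // /.dockerenv is present
    bool KubernetesServiceHostSet,  // KUBERNETES_SERVICE_HOST is set
    bool DockerComposeAvailable,    // the `docker compose` command runs
    bool SystemctlAvailable,        // the `systemctl` command is available
    bool WindowsServicesAvailable); // Windows service control is available

public static class RuntimeDetection
{
    // Applies the six detection steps in order; the first match wins.
    public static RuntimeEnvironment Detect(RuntimeProbes p)
    {
        // Assumption: "Docker container" is reported as DockerCompose.
        if (p.DockerEnvFileExists) return RuntimeEnvironment.DockerCompose;
        if (p.KubernetesServiceHostSet) return RuntimeEnvironment.Kubernetes;
        if (p.DockerComposeAvailable) return RuntimeEnvironment.DockerCompose;
        if (p.SystemctlAvailable) return RuntimeEnvironment.Systemd;
        if (p.WindowsServicesAvailable) return RuntimeEnvironment.WindowsService;
        return RuntimeEnvironment.Bare;
    }

    // Gathers only the probes that need no process execution; the command
    // probes are left false here and would be filled in by the real detector.
    public static RuntimeProbes ProbeCheap() => new(
        DockerEnvFileExists: System.IO.File.Exists("/.dockerenv"),
        KubernetesServiceHostSet: !string.IsNullOrEmpty(
            System.Environment.GetEnvironmentVariable("KUBERNETES_SERVICE_HOST")),
        DockerComposeAvailable: false,
        SystemctlAvailable: false,
        WindowsServicesAvailable: System.OperatingSystem.IsWindows());
}
```

Note that `Any` is never returned by detection; it only appears on remediation commands that apply to every runtime.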
### 4.2 Command Templates
#### Database Connection Failure
```yaml
check.database.connectivity:
  likelyCauses:
    - priority: 1
      description: "PostgreSQL is not running"
    - priority: 2
      description: "Firewall blocking port 5432"
    - priority: 3
      description: "Incorrect host or port"
    - priority: 4
      description: "Network connectivity issue"
  remediations:
    - runtime: DockerCompose
      description: "Start PostgreSQL container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d postgres"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"
    - runtime: Kubernetes
      description: "Check PostgreSQL pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=postgres"
      placeholders:
        NAMESPACE: "stellaops"
    - runtime: Systemd
      description: "Start PostgreSQL service"
      command: "sudo systemctl start postgresql"
      requiresSudo: true
    - runtime: Any
      description: "Verify PostgreSQL is listening"
      command: "pg_isready -h {{HOST}} -p {{PORT}}"
      placeholders:
        HOST: "localhost"
        PORT: "5432"
  verificationCommand: "pg_isready -h {{HOST}} -p {{PORT}}"
```
#### Valkey Connection Failure
```yaml
check.services.valkey.connectivity:
  likelyCauses:
    - priority: 1
      description: "Valkey/Redis is not running"
    - priority: 2
      description: "Firewall blocking port 6379"
    - priority: 3
      description: "Authentication required but not configured"
  remediations:
    - runtime: DockerCompose
      description: "Start Valkey container"
      command: "docker compose -f {{COMPOSE_FILE}} up -d valkey"
      placeholders:
        COMPOSE_FILE: "devops/compose/docker-compose.yml"
    - runtime: Kubernetes
      description: "Check Valkey pod status"
      command: "kubectl get pods -n {{NAMESPACE}} -l app=valkey"
      placeholders:
        NAMESPACE: "stellaops"
    - runtime: Systemd
      description: "Start Valkey service"
      command: "sudo systemctl start valkey"
      requiresSudo: true
    - runtime: Any
      description: "Test Valkey connection"
      command: "redis-cli -h {{HOST}} -p {{PORT}} PING"
      placeholders:
        HOST: "localhost"
        PORT: "6379"
  verificationCommand: "redis-cli -h {{HOST}} -p {{PORT}} PING"
```
#### Pending Migrations
```yaml
check.database.migrations.applied:
  likelyCauses:
    - priority: 1
      description: "Pending release migrations require manual execution"
    - priority: 2
      description: "Startup migrations not yet applied"
  remediations:
    - runtime: Any
      description: "Run pending migrations (dry-run first)"
      command: "stella migrations-run --module all --dry-run"
    - runtime: Any
      description: "Apply all pending migrations"
      command: "stella migrations-run --module all"
      isDangerous: true
    - runtime: DockerCompose
      description: "Run migrations in container"
      command: "docker compose exec api stella migrations-run --module all"
    - runtime: Kubernetes
      description: "Run migrations job"
      command: "kubectl apply -f devops/k8s/jobs/migrations.yaml"
  verificationCommand: "stella migrations-run --module all --dry-run"
```
#### Vault Authentication Failure
```yaml
check.integration.vault.auth:
  likelyCauses:
    - priority: 1
      description: "Vault token expired or revoked"
    - priority: 2
      description: "AppRole credentials invalid"
    - priority: 3
      description: "Kubernetes service account not configured"
    - priority: 4
      description: "Vault server unreachable"
  remediations:
    - runtime: Any
      description: "Test Vault connectivity"
      command: "curl -s {{VAULT_ADDR}}/v1/sys/health"
      placeholders:
        VAULT_ADDR: "https://vault.example.com:8200"
    - runtime: Any
      description: "Verify token validity"
      command: "vault token lookup"
    - runtime: Kubernetes
      description: "Check Kubernetes auth configuration"
      command: "kubectl get serviceaccount -n {{NAMESPACE}} stellaops-vault-auth"
      placeholders:
        NAMESPACE: "stellaops"
  verificationCommand: "vault token lookup"
```
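
Given templates like the ones above, the wizard has to pick the remediations that apply to the detected runtime. One plausible selection rule is sketched below. The `Remediation` record is a trimmed-down stand-in for `RemediationCommand`, `RemediationSelector` is a hypothetical name, and showing runtime-specific commands before the generic `Any` commands is an assumption rather than documented behavior.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum RuntimeEnvironment { DockerCompose, Kubernetes, Systemd, WindowsService, Bare, Any }

// Stand-in for RemediationCommand with only the fields the filter needs.
public sealed record Remediation(RuntimeEnvironment Runtime, string Description, string Command);

public static class RemediationSelector
{
    // Keeps commands for the detected runtime plus those marked Any.
    public static IReadOnlyList<Remediation> ForRuntime(
        IEnumerable<Remediation> all, RuntimeEnvironment detected)
    {
        return all
            .Where(r => r.Runtime == detected || r.Runtime == RuntimeEnvironment.Any)
            // Assumption: runtime-specific commands are listed first.
            // OrderBy is a stable sort, so template order is kept within each group.
            .OrderBy(r => r.Runtime == RuntimeEnvironment.Any ? 1 : 0)
            .ToList();
    }
}
```
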
---
## 5. Placeholder Resolution
### 5.1 Placeholder Sources
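Placeholders in commands are resolved from the following sources; when several sources provide a value, the one with the highest priority wins: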
| Source | Priority | Example |
|--------|----------|---------|
| User input | 1 (highest) | `{{HOST}}` from form field |
| Environment | 2 | `{{VAULT_ADDR}}` from env |
| Detection | 3 | `{{NAMESPACE}}` from context |
| Default | 4 (lowest) | Fallback value |
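
The priority order in the table can be sketched as a lookup that walks the four sources from highest to lowest priority. The `PlaceholderSources` class and its dictionary-based parameters are hypothetical, and treating an empty string as "no value" is an assumption.

```csharp
#nullable enable
using System.Collections.Generic;

public static class PlaceholderSources
{
    // Checks user input, environment, detection, then defaults, in that order.
    // Returns null when no source provides a value.
    public static string? Resolve(
        string name,
        IReadOnlyDictionary<string, string> userInput,
        IReadOnlyDictionary<string, string> environment,
        IReadOnlyDictionary<string, string> detected,
        IReadOnlyDictionary<string, string> defaults)
    {
        foreach (var source in new[] { userInput, environment, detected, defaults })
        {
            // Assumption: an empty value falls through to the next source.
            if (source.TryGetValue(name, out var value) && !string.IsNullOrEmpty(value))
            {
                return value;
            }
        }

        return null;
    }
}
```

For example, if the form field for `HOST` is filled in, that value is used even when the environment or the template default also defines `HOST`.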
### 5.2 Placeholder Syntax
```
{{PLACEHOLDER_NAME}}
{{PLACEHOLDER_NAME:-default_value}}
```
Examples:
- `{{HOST}}` - Required placeholder
- `{{PORT:-5432}}` - Optional with default
- `{{COMPOSE_FILE:-docker-compose.yml}}` - File path default
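
A small expander for this syntax might look like the sketch below. It is illustrative only: `PlaceholderTemplate` is a hypothetical name, the sketch assumes placeholder names consist of upper-case letters, digits, and underscores, and throwing on an unresolved required placeholder is one possible policy among several.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderTemplate
{
    // Matches {{NAME}} and {{NAME:-default}}.
    private static readonly Regex Pattern =
        new(@"\{\{(?<name>[A-Z][A-Z0-9_]*)(?::-(?<default>[^}]*))?\}\}");

    // Replaces each placeholder with its resolved value, or with its inline default.
    public static string Expand(string command, IReadOnlyDictionary<string, string> values)
    {
        return Pattern.Replace(command, match =>
        {
            var name = match.Groups["name"].Value;
            if (values.TryGetValue(name, out var value))
            {
                return value;
            }

            var fallback = match.Groups["default"];
            if (fallback.Success)
            {
                return fallback.Value;
            }

            // Assumption: a required placeholder without a value is an error.
            throw new KeyNotFoundException($"No value for required placeholder {name}");
        });
    }
}
```
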
### 5.3 Secret Redaction
Commands containing secrets are never displayed with actual values:
| Placeholder | Display | Actual |
|-------------|---------|--------|
| `{{PASSWORD}}` | `{{PASSWORD}}` | Never resolved in display |
| `{{TOKEN}}` | `{{TOKEN}}` | Never resolved in display |
| `{{SECRET_KEY}}` | `{{SECRET_KEY}}` | Never resolved in display |
The user must copy the command and substitute the secret values manually.

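
A display renderer that honors this rule could be sketched as follows. The `DisplayRenderer` class is hypothetical, and its deny-list contains only the three placeholder names from the table above; a real implementation would need a broader way to mark placeholders as secret.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DisplayRenderer
{
    // Assumption: secrets are identified by name; the list mirrors the table above.
    private static readonly HashSet<string> SecretNames = new() { "PASSWORD", "TOKEN", "SECRET_KEY" };

    private static readonly Regex Pattern = new(@"\{\{(?<name>[A-Z][A-Z0-9_]*)\}\}");

    // Substitutes non-secret placeholders and leaves secret ones verbatim,
    // even when a value for them is available.
    public static string RenderForDisplay(string command, IReadOnlyDictionary<string, string> values)
    {
        return Pattern.Replace(command, match =>
        {
            var name = match.Groups["name"].Value;
            if (SecretNames.Contains(name))
            {
                return match.Value;
            }

            return values.TryGetValue(name, out var value) ? value : match.Value;
        });
    }
}
```
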
---
## 6. Verification Flow
### 6.1 Post-Fix Verification
After the user applies a fix, the wizard performs the following steps:
1. **Wait** - Pause for user confirmation ("I've run this command")
2. **Verify** - Run the verification command
3. **Re-check** - Run the original Doctor check
4. **Report** - Show success or next steps
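
The four steps can be sketched as a small orchestration function. The three delegates are hypothetical seams standing in for the wizard UI, the verification executor, and the Doctor engine. Skipping the re-check when the verification command fails is an assumption; the contract above does not say whether the re-check still runs in that case.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum CheckStatus { Pass, Warn, Fail }

public static class PostFixFlow
{
    // Runs wait, verify, re-check, and returns the message to report.
    public static async Task<string> RunAsync(
        Func<CancellationToken, Task> waitForUserConfirmation,
        Func<CancellationToken, Task<bool>> runVerificationCommand,
        Func<CancellationToken, Task<CheckStatus>> rerunDoctorCheck,
        CancellationToken ct)
    {
        // 1. Wait: block until the user confirms the command was run.
        await waitForUserConfirmation(ct);

        // 2. Verify: run the verification command.
        if (!await runVerificationCommand(ct))
        {
            // Assumption: a failed verification short-circuits the re-check.
            return "Verification command failed; review the remaining remediations.";
        }

        // 3. Re-check: run the original Doctor check again.
        var status = await rerunDoctorCheck(ct);

        // 4. Report: success or next steps.
        return status == CheckStatus.Fail
            ? "Check still failing; try the next suggested fix."
            : "Fix verified.";
    }
}
```
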
### 6.2 Verification Command Execution
```csharp
public interface IVerificationExecutor
{
    Task<VerificationResult> ExecuteAsync(
        string command,
        TimeSpan timeout,
        CancellationToken ct);
}

public sealed record VerificationResult
{
    public required bool Success { get; init; }
    public required int ExitCode { get; init; }
    public required string Output { get; init; }
    public required TimeSpan Duration { get; init; }
}
```
### 6.3 Re-Check Behavior
```
[FAIL] check.database.connectivity
Suggested fix applied. Verifying...
[RUN] pg_isready -h localhost -p 5432
localhost:5432 - accepting connections
Re-running check...
[PASS] check.database.connectivity
PostgreSQL connection successful
```
---
## 7. Check Aggregation
### 7.1 Step Completion Criteria
A step is complete when:
- All **Critical** checks pass
- No **Fail** status on any check
- User has acknowledged all **Warning** checks
### 7.2 Aggregated Status
```csharp
public enum StepValidationStatus
{
    NotStarted,      // No checks run
    InProgress,      // Checks running
    Passed,          // All critical pass, no failures
    PassedWithWarns, // All critical pass, some warnings
    Failed,          // Any critical failure
    Skipped          // User explicitly skipped
}
```
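
One way to derive the aggregated status from individual results is sketched below. The `Severity` enum, the `CheckOutcome` record, and `StepAggregator` are hypothetical. The contract does not state how a `Fail` on a non-blocking (Warning or Info) check rolls up, so the sketch assumes it downgrades the step to `PassedWithWarns` rather than failing it. `InProgress` and `Skipped` are driven by wizard state and are not computed here.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum CheckStatus { Pass, Warn, Fail }
public enum Severity { Critical, Warning, Info }
public enum StepValidationStatus { NotStarted, InProgress, Passed, PassedWithWarns, Failed, Skipped }

// Hypothetical pairing of a check's configured severity with its latest result.
public sealed record CheckOutcome(string CheckId, Severity Severity, CheckStatus Status);

public static class StepAggregator
{
    public static StepValidationStatus Aggregate(IReadOnlyCollection<CheckOutcome> outcomes)
    {
        if (outcomes.Count == 0)
        {
            return StepValidationStatus.NotStarted;
        }

        // Any failing Critical check fails the step.
        if (outcomes.Any(o => o.Severity == Severity.Critical && o.Status == CheckStatus.Fail))
        {
            return StepValidationStatus.Failed;
        }

        // Assumption: any other non-Pass result only downgrades the step.
        if (outcomes.Any(o => o.Status != CheckStatus.Pass))
        {
            return StepValidationStatus.PassedWithWarns;
        }

        return StepValidationStatus.Passed;
    }
}
```
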
### 7.3 Status Rollup for Thresholds
```
Operational Threshold:
  [x] check.database.connectivity         PASS
  [x] check.database.permissions          PASS
  [x] check.database.migrations.applied   PASS
  [x] check.services.valkey.connectivity  PASS
  [x] check.auth.admin.exists             PASS
  [x] check.crypto.profile.valid          PASS
  Status: OPERATIONAL (6/6 required checks passed)

Production-Ready Threshold:
  [x] check.security.identity.configured  PASS
  [x] check.integration.vault.connected   PASS
  [x] check.integration.scm.connected     PASS
  [x] check.notify.channel.configured     PASS
  [ ] check.orchestrator.agent.healthy    SKIP
  [ ] check.feeds.sync.enabled            SKIP
  Status: NOT PRODUCTION-READY (4/6 recommended, 2 skipped)
```
---
## 8. Doctor Engine Integration
### 8.1 Wizard-Specific Check Context
The wizard provides context to Doctor checks:
```csharp
public sealed record WizardCheckContext
{
    public required string StepId { get; init; }
    public required RuntimeEnvironment DetectedRuntime { get; init; }
    public required ImmutableDictionary<string, string> UserInputs { get; init; }
    public bool GenerateRemediations { get; init; } = true;
    public bool IncludePlaceholders { get; init; } = true;
}
```
### 8.2 Check Invocation
```csharp
public interface IWizardDoctorClient
{
    Task<ImmutableArray<CheckResult>> RunStepChecksAsync(
        string stepId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<CheckResult> RunSingleCheckAsync(
        string checkId,
        WizardCheckContext context,
        CancellationToken ct);

    Task<VerificationResult> RunVerificationAsync(
        string command,
        WizardCheckContext context,
        CancellationToken ct);
}
```
### 8.3 Check Timeout
| Check Category | Default Timeout | Max Timeout |
|----------------|-----------------|-------------|
| Connectivity | 10 seconds | 30 seconds |
| Authentication | 15 seconds | 60 seconds |
| Migrations | 60 seconds | 300 seconds |
| Full validation | 30 seconds | 120 seconds |
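
A caller might turn this table into an effective timeout as sketched below: use the default when nothing is requested and cap any override at the maximum. The `CheckTimeouts` class and the category keys are hypothetical identifiers; only the durations come from the table. The resulting value could then be applied with `CancellationTokenSource.CancelAfter` on a token source linked to the caller's token.

```csharp
using System;
using System.Collections.Generic;

public static class CheckTimeouts
{
    // (default, max) pairs copied from the table above.
    private static readonly Dictionary<string, (TimeSpan Default, TimeSpan Max)> Limits = new()
    {
        ["connectivity"] = (TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(30)),
        ["authentication"] = (TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60)),
        ["migrations"] = (TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(300)),
        ["full-validation"] = (TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(120)),
    };

    // Returns the default when no positive override is given; caps overrides at the max.
    public static TimeSpan Effective(string category, TimeSpan? requested = null)
    {
        var (fallback, max) = Limits[category];
        if (requested is null || requested.Value <= TimeSpan.Zero)
        {
            return fallback;
        }

        return requested.Value > max ? max : requested.Value;
    }
}
```
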
---
## 9. Remediation Safety
### 9.1 Dangerous Commands
Commands marked `isDangerous: true` require user confirmation:
```
WARNING: This command will modify your database schema.

Command:
  stella migrations-run --module all

This action:
  - Applies 5 pending migrations
  - Cannot be automatically rolled back
  - May take several minutes

Type 'apply' to confirm: _
```
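
The confirmation gate can be sketched as a single predicate. `DangerousCommandGate` is a hypothetical name, and matching the confirmation word case-sensitively after trimming whitespace is an assumption; the prompt above only shows that the user types `apply`.

```csharp
#nullable enable
using System;

public static class DangerousCommandGate
{
    // Returns true when the command may proceed.
    public static bool IsConfirmed(bool isDangerous, string? typedInput, string confirmationWord = "apply")
    {
        // Commands not marked dangerous need no confirmation.
        if (!isDangerous)
        {
            return true;
        }

        // Assumption: exact, case-sensitive match after trimming whitespace.
        return string.Equals(typedInput?.Trim(), confirmationWord, StringComparison.Ordinal);
    }
}
```
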
### 9.2 Sudo Requirements
Commands requiring `sudo` show a notice:
```
This command requires administrator privileges.

Command:
  sudo systemctl start postgresql

[Copy Command]

Note: You may be prompted for your password.
```
### 9.3 Secret Substitution Notice
```
This command contains placeholders for sensitive values.

Command:
  vault write auth/approle/login role_id={{ROLE_ID}} secret_id={{SECRET_ID}}

Before running:
  1. Replace {{ROLE_ID}} with your AppRole Role ID
  2. Replace {{SECRET_ID}} with your AppRole Secret ID

[Copy Command]
```
---
## 10. Check Plugin Requirements
### 10.1 New Checks for Setup Wizard
The following checks may need to be added to existing plugins:
| Plugin | New Check ID | Purpose |
|--------|--------------|---------|
| Core | `check.auth.admin.exists` | Verify admin user exists |
| Core | `check.auth.password.policy` | Verify password complexity |
| Core | `check.crypto.signing.test` | Test signing operation |
| Database | `check.database.migrations.checksums` | Verify migration integrity |
| Integration | `check.integration.vault.secrets.access` | Test secret retrieval |
| Integration | `check.orchestrator.environment.valid` | Validate environment config |
| Notify | `check.notify.delivery.test` | Test notification delivery |
### 10.2 Check Implementation Contract
Each check must implement:
```csharp
public interface ISetupWizardAwareCheck : IDoctorCheck
{
    // Standard check execution
    Task<CheckResult> ExecuteAsync(CheckContext context, CancellationToken ct);

    // Generate runtime-specific remediations
    ImmutableArray<RemediationCommand> GetRemediations(
        CheckResult result,
        RuntimeEnvironment runtime);

    // Verification command for this check
    string? GetVerificationCommand(RuntimeEnvironment runtime);
}
```
---
## 11. Audit Trail
### 11.1 Setup Event Logging
All wizard actions are logged to the Timeline service:
```csharp
public sealed record SetupWizardEvent
{
    public required string EventType { get; init; } // step.started, step.completed, check.failed, etc.
    public required string StepId { get; init; }
    public required string? CheckId { get; init; }
    public required CheckStatus? Status { get; init; }
    public required DateTimeOffset OccurredAt { get; init; }
    public required string? UserId { get; init; }
    public ImmutableDictionary<string, string> Metadata { get; init; }
}
```
### 11.2 Event Types
| Event Type | Description |
|------------|-------------|
| `setup.started` | Wizard initiated |
| `setup.completed` | Wizard finished successfully |
| `setup.aborted` | Wizard cancelled |
| `step.started` | Step configuration began |
| `step.completed` | Step passed all checks |
| `step.failed` | Step failed validation |
| `step.skipped` | User skipped optional step |
| `check.passed` | Individual check passed |
| `check.failed` | Individual check failed |
| `check.warned` | Individual check warned |
| `remediation.copied` | User copied fix command |
| `remediation.verified` | Fix verification succeeded |
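
As a small illustration of how the `check.*` event types relate to `CheckStatus`, the mapping could be expressed as below. `SetupEventTypes` is a hypothetical helper; the event type strings are the ones listed in the table above.

```csharp
public enum CheckStatus { Pass, Warn, Fail }

public static class SetupEventTypes
{
    // Maps an individual check result to its event type.
    public static string ForCheck(CheckStatus status) => status switch
    {
        CheckStatus.Pass => "check.passed",
        CheckStatus.Warn => "check.warned",
        CheckStatus.Fail => "check.failed",
        _ => throw new System.ArgumentOutOfRangeException(nameof(status)),
    };
}
```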