Add integration tests for migration categories and execution
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
- Implemented MigrationCategoryTests to validate migration categorization for startup, release, seed, and data migrations.
- Added tests for edge cases, including null, empty, and whitespace migration names.
- Created StartupMigrationHostTests to verify the behavior of the migration host with real PostgreSQL instances using Testcontainers.
- Included tests for migration execution, schema creation, and handling of pending release migrations.
- Added SQL migration files for testing: creating a test table, adding a column, a release migration, and seeding data.
498
docs/db/MIGRATION_STRATEGY.md
Normal file
@@ -0,0 +1,498 @@
# PostgreSQL Migration Strategy

**Version:** 1.0
**Last Updated:** 2025-12-03
**Status:** Active

## Overview

This document defines the migration strategy for StellaOps PostgreSQL databases. It covers initial setup, per-release migrations, multi-instance coordination, and air-gapped operation.

## Principles

1. **Forward-Only**: No down migrations. Fixes are applied as new forward migrations.
2. **Idempotent**: All migrations must be safe to re-run (use `IF NOT EXISTS`, `ON CONFLICT DO NOTHING`).
3. **Deterministic**: Same input produces identical schema state across environments.
4. **Air-Gap Compatible**: All migrations embedded in assemblies, no external dependencies.
5. **Zero-Downtime**: Non-breaking migrations run at startup; breaking changes require coordination.

## Migration Categories

### Category A: Startup Migrations (Automatic)

Run automatically when the application starts. Must complete within 60 seconds.

**Allowed Operations:**
- `CREATE SCHEMA IF NOT EXISTS`
- `CREATE TABLE IF NOT EXISTS`
- `CREATE INDEX IF NOT EXISTS`
- `CREATE INDEX CONCURRENTLY` (non-blocking; note that it cannot run inside a transaction block)
- `ALTER TABLE ADD COLUMN` (nullable or with default)
- `CREATE TYPE` for enums, guarded by an existence check (PostgreSQL has no `CREATE TYPE ... IF NOT EXISTS`)
- Adding new enum values (`ALTER TYPE ... ADD VALUE IF NOT EXISTS`)
- Inserting seed data with `ON CONFLICT DO NOTHING`

**Forbidden Operations:**
- `DROP TABLE/COLUMN/INDEX`
- `ALTER TABLE DROP COLUMN`
- `ALTER TABLE ALTER COLUMN TYPE`
- `TRUNCATE`
- Large data migrations (> 10,000 rows affected)
- Any operation requiring an `ACCESS EXCLUSIVE` lock for extended periods

### Category B: Release Migrations (Manual/CLI)

Require explicit execution via the CLI before deployment. Used for breaking changes.

**Typical Operations:**
- Dropping deprecated columns/tables
- Column type changes
- Large data backfills
- Index rebuilds
- Table renames
- Constraint modifications

### Category C: Data Migrations (Batched)

Long-running data transformations that run as background jobs.

**Characteristics:**
- Batched processing (1,000-10,000 rows per batch)
- Resumable after interruption
- Progress tracking
- Can run alongside the application

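The Category C characteristics (batching, resumability, progress tracking) can be illustrated with a small shell sketch. This is not how the code-based `DM*` migrations are implemented; `run_batched`, the checkpoint file, and the assumption of dense integer ids starting at 1 are all hypothetical and exist only to show the control flow.

```bash
# run_batched MAX_ID BATCH_SIZE CHECKPOINT_FILE PROCESS_COMMAND...
# Hypothetical sketch: calls PROCESS_COMMAND FROM TO for each id range and
# records the last finished id after every batch, so an interrupted run
# resumes from the checkpoint instead of starting over.
run_batched() {
    _max=$1; _size=$2; _ckpt=$3; shift 3
    _last=0
    # Resume from the checkpoint if a previous run left one behind.
    [ -s "$_ckpt" ] && _last=$(cat "$_ckpt")
    while [ "$_last" -lt "$_max" ]; do
        _from=$((_last + 1))
        _to=$((_last + _size))
        [ "$_to" -gt "$_max" ] && _to=$_max
        # Stop without advancing the checkpoint if the batch fails.
        "$@" "$_from" "$_to" || return 1
        echo "$_to" > "$_ckpt"
        _last=$_to
    done
}
```

A call such as `run_batched 2500 1000 progress.txt process_range` would invoke the (hypothetical) `process_range` command with the ranges 1-1000, 1001-2000, and 2001-2500. Because the checkpoint is written only after a batch succeeds, a failed batch is retried on the next run, which is why each batch should itself be idempotent.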
## Migration File Structure

```
src/<Module>/__Libraries/StellaOps.<Module>.Storage.Postgres/
├── Migrations/
│   ├── 001_initial_schema.sql          # Category A
│   ├── 002_add_audit_columns.sql       # Category A
│   ├── 003_add_search_index.sql        # Category A
│   └── 100_drop_legacy_columns.sql     # Category B (100+ = manual)
├── Seeds/
│   ├── 001_default_roles.sql           # Seed data
│   └── 002_builtin_policies.sql        # Seed data
└── DataMigrations/
    └── DM001_BackfillTenantIds.cs      # Category C (code-based)
```

### Naming Convention

| Prefix | Category | Description |
|--------|----------|-------------|
| `001-099` | A (Startup) | Automatic, non-breaking |
| `100-199` | B (Release) | Manual, breaking changes |
| `200-299` | B (Release) | Major version migrations |
| `S001-S999` | Seed | Reference data |
| `DM001-DM999` | C (Data) | Batched data migrations |

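The prefix rules in the table can be expressed as a small classifier. The function below is a hypothetical sketch, not the project's `MigrationCategory` helper: the name `migration_category` and the `invalid`/`unknown` outputs are assumptions. It looks only at the file name, so seed files that are identified by living in `Seeds/` rather than by an `S` prefix would need a directory check on top of this.

```bash
# migration_category NAME
# Hypothetical sketch: prints startup, release, seed, or data according to the
# prefix ranges in the naming table. Exits non-zero for names it cannot classify.
migration_category() (
    name=$1
    # Empty and whitespace-only names are rejected outright.
    if [ -z "$(printf '%s' "$name" | tr -d '[:space:]')" ]; then
        echo "invalid"; exit 1
    fi
    case "$name" in
        000*)                        echo "unknown"; exit 1 ;;  # ranges start at 001
        DM[0-9][0-9][0-9][!0-9]*)    echo "data" ;;             # DM001-DM999
        S[0-9][0-9][0-9][!0-9]*)     echo "seed" ;;             # S001-S999
        0[0-9][0-9][!0-9]*)          echo "startup" ;;          # 001-099
        [12][0-9][0-9][!0-9]*)       echo "release" ;;          # 100-299
        *)                           echo "unknown"; exit 1 ;;
    esac
)
```

For example, `migration_category 100_drop_legacy_columns.sql` prints `release`, while `migration_category DM001_BackfillTenantIds.cs` prints `data`. The `DM` pattern is tested before the numeric ones so that the more specific prefix wins.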
## Execution Flow

### Application Startup

```
┌─────────────────────────────────────────────────────────────┐
│                     Application Startup                     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  1. Acquire Advisory Lock (pg_try_advisory_lock)            │
│     Key: hash of schema name                                │
│     If lock fails: wait up to 120s, then fail startup       │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  2. Create schema_migrations table if not exists            │
│     Columns: migration_name, applied_at, checksum, category │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  3. Load embedded migrations (001-099 only)                 │
│     - Sort by name                                          │
│     - Compute checksums                                     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  4. Compare with applied migrations                         │
│     - Detect checksum mismatches (FATAL ERROR)              │
│     - Identify pending migrations                           │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  5. Check for pending Category B migrations                 │
│     - If any 100+ migrations are pending: FAIL STARTUP      │
│     - Log: "Run 'stellaops system migrations-run' first"    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  6. Execute pending Category A migrations                   │
│     - Each in transaction                                   │
│     - Record in schema_migrations                           │
│     - Log timing                                            │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  7. Execute seed data (if not already applied)              │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  8. Release Advisory Lock                                   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  9. Continue Application Startup                            │
└─────────────────────────────────────────────────────────────┘
```

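Steps 3 and 4 of the startup flow hinge on comparing a freshly computed checksum with the one recorded in `schema_migrations`. The sketch below illustrates that comparison in shell. It assumes SHA-256 over the raw file bytes and GNU `sha256sum`; the document does not state which hash the runner uses, and the function names are hypothetical.

```bash
# migration_checksum FILE
# Assumption: SHA-256 over the raw file bytes (GNU coreutils sha256sum).
migration_checksum() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# verify_checksum FILE RECORDED_CHECKSUM
# Returns 0 when the file still matches the recorded checksum; on a mismatch
# it prints an error to stderr and returns 1 (the flow treats this as fatal).
verify_checksum() {
    _actual=$(migration_checksum "$1")
    if [ "$_actual" != "$2" ]; then
        echo "Migration checksum mismatch for '$(basename "$1")'" >&2
        echo "  Expected: $2" >&2
        echo "  Found:    $_actual" >&2
        return 1
    fi
}
```

Hashing the raw bytes means that even a whitespace or line-ending change to an applied migration is reported as a mismatch, which is consistent with the rule that applied migrations are never edited.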
### Release Migration (CLI)

```bash
# Before deployment - run breaking migrations
stellaops system migrations-run --module Authority --category release

# Verify migration state
stellaops system migrations-status --module Authority

# Dry run (show what would be executed)
stellaops system migrations-run --module Authority --dry-run
```

## Multi-Instance Coordination

### Advisory Locks

Each module uses a unique advisory lock key derived from its schema name:

```sql
-- Lock key calculation
SELECT pg_try_advisory_lock(hashtext('auth'));      -- Authority
SELECT pg_try_advisory_lock(hashtext('scheduler')); -- Scheduler
SELECT pg_try_advisory_lock(hashtext('vuln'));      -- Concelier
SELECT pg_try_advisory_lock(hashtext('policy'));    -- Policy
SELECT pg_try_advisory_lock(hashtext('notify'));    -- Notify
```

### Race Condition Handling

```
Instance A                              Instance B
    │                                       │
    ├─ Acquire lock (success) ──►           │
    │                                       ├─ Acquire lock (BLOCKED)
    ├─ Run migrations                       │  Wait up to 120s
    │                                       │
    ├─ Release lock ────────────►           │
    │                                       ├─ Acquire lock (success)
    │                                       ├─ Check migrations (none pending)
    │                                       ├─ Release lock
    │                                       │
    ▼                                       ▼
 Running                                 Running
```

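`pg_try_advisory_lock` returns immediately with true or false rather than blocking, so "wait up to 120s" implies that the waiting instance retries until a deadline. The sketch below shows one way such a loop could look. The function name, the fixed polling interval, and the pluggable try-command are assumptions, not the runner's actual implementation.

```bash
# acquire_with_timeout TIMEOUT_SECONDS INTERVAL_SECONDS TRY_COMMAND...
# Hypothetical sketch: re-runs TRY_COMMAND (which must succeed only when the
# lock was obtained) until it succeeds or TIMEOUT_SECONDS have been waited.
acquire_with_timeout() {
    _timeout=$1; _interval=$2; shift 2
    _waited=0
    while ! "$@"; do
        if [ "$_waited" -ge "$_timeout" ]; then
            echo "Could not acquire migration lock within ${_timeout} seconds" >&2
            return 1
        fi
        sleep "$_interval"
        _waited=$((_waited + _interval))
    done
}
```

A real try-command has to issue `pg_try_advisory_lock` on the same database session that later runs the migrations, because a session-level advisory lock is released as soon as that session ends. A one-shot `psql -c` call would therefore acquire the lock and drop it again immediately.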
## Schema Migrations Table

Each schema maintains its own migration history:

```sql
CREATE TABLE IF NOT EXISTS {schema}.schema_migrations (
    migration_name TEXT PRIMARY KEY,
    category TEXT NOT NULL DEFAULT 'startup',
    checksum TEXT NOT NULL,
    applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    applied_by TEXT,
    duration_ms INT,

    CONSTRAINT valid_category CHECK (category IN ('startup', 'release', 'seed', 'data'))
);

CREATE INDEX IF NOT EXISTS idx_schema_migrations_applied_at
    ON {schema}.schema_migrations(applied_at DESC);
```

## Module-Specific Schemas

| Module | Schema | Lock Key | Tables |
|--------|--------|----------|--------|
| Authority | `auth` | `hashtext('auth')` | tenants, users, roles, tokens, sessions |
| Scheduler | `scheduler` | `hashtext('scheduler')` | jobs, triggers, workers, locks |
| Concelier | `vuln` | `hashtext('vuln')` | advisories, affected, aliases, sources |
| Policy | `policy` | `hashtext('policy')` | packs, versions, rules, evaluations |
| Notify | `notify` | `hashtext('notify')` | templates, channels, deliveries |
| Excititor | `vex` | `hashtext('vex')` | statements, documents, products |

## Release Workflow

### Pre-Deployment

```bash
# 1. Review pending migrations
stellaops system migrations-status --module all

# 2. Backup database (if required)
pg_dump -Fc stellaops > backup_$(date +%Y%m%d).dump

# 3. Run release migrations in maintenance window
stellaops system migrations-run --category release --module all

# 4. Verify schema state
stellaops system migrations-verify --module all
```

### Deployment

1. Deploy new application version
2. Application startup runs Category A migrations automatically
3. Health checks pass after migrations complete

### Post-Deployment

```bash
# Check migration status
stellaops system migrations-status --module all

# Run any data migrations (background)
stellaops system migrations-run --category data --module all
```

## Rollback Strategy

Because migrations are forward-only, rollback is achieved through:

1. **Fix-Forward**: Deploy a new migration that reverses the problematic change
2. **Blue/Green Deployment**: Switch back to the previous version (requires backward-compatible migrations)
3. **Point-in-Time Recovery**: Restore from backup (last resort)

### Backward Compatibility Window

For zero-downtime deployments, each migration must remain backward compatible with the previous (N-1) application version:

```
Version N:   Adds new nullable column 'status_v2'
Version N+1: Application uses 'status_v2', keeps 'status' populated
Version N+2: Migration removes 'status' column (Category B)
```

## Air-Gapped Operation

All migrations are embedded as assembly resources:

```xml
<!-- In .csproj file -->
<ItemGroup>
  <EmbeddedResource Include="Migrations\*.sql" LogicalName="%(Filename)%(Extension)" />
  <EmbeddedResource Include="Seeds\*.sql" LogicalName="%(Filename)%(Extension)" />
</ItemGroup>
```

No network access is required during migration execution.

## Monitoring & Observability

### Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `stellaops_migration_duration_seconds` | Histogram | Time to run migration |
| `stellaops_migration_pending_count` | Gauge | Number of pending migrations |
| `stellaops_migration_applied_total` | Counter | Total migrations applied |
| `stellaops_migration_failed_total` | Counter | Total migration failures |

### Logging

```
[INF] Migration: Acquiring lock for schema 'auth'
[INF] Migration: Lock acquired, checking pending migrations
[INF] Migration: 2 pending migrations found
[INF] Migration: Applying 003_add_audit_columns.sql (checksum: a1b2c3...)
[INF] Migration: 003_add_audit_columns.sql completed in 245ms
[INF] Migration: Applying 004_add_search_index.sql (checksum: d4e5f6...)
[INF] Migration: 004_add_search_index.sql completed in 1823ms
[INF] Migration: All migrations applied, releasing lock
```

### Alerts

- Migration lock held > 5 minutes
- Migration failure
- Checksum mismatch detected
- Pending Category B migrations blocking startup

## Development Workflow

### Creating a New Migration

```bash
# 1. Create migration file
MIGRATION=src/Authority/__Libraries/StellaOps.Authority.Storage.Postgres/Migrations/005_add_mfa_columns.sql
touch "$MIGRATION"

# 2. Write idempotent SQL (into the file created above, not the current directory)
cat > "$MIGRATION" << 'EOF'
-- Migration: 005_add_mfa_columns
-- Category: startup
-- Description: Add MFA support columns to users table

ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_enabled BOOLEAN NOT NULL DEFAULT FALSE;
ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_secret TEXT;
ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_backup_codes TEXT[];

CREATE INDEX IF NOT EXISTS idx_users_mfa_enabled ON auth.users(mfa_enabled) WHERE mfa_enabled = TRUE;
EOF

# 3. Test locally
dotnet run --project src/Authority/StellaOps.Authority.WebService

# 4. Verify migration applied
stellaops system migrations-status --module Authority
```

### Testing Migrations

```bash
# Run integration tests with migrations
dotnet test --filter "Category=Migration"

# Test idempotency (run twice)
stellaops system migrations-run --module Authority
stellaops system migrations-run --module Authority  # Should be no-op
```

## Troubleshooting

### Lock Timeout

```
ERROR: Could not acquire migration lock within 120 seconds
```

**Cause**: Another instance is still running migrations, or an instance failed while its database session, which holds the lock, is still open.

**Resolution**:
```sql
-- Check active locks
SELECT * FROM pg_locks WHERE locktype = 'advisory';

-- Force release (use with caution): terminate the holding session, using a pid
-- from the query above. pg_advisory_unlock_all() only affects the current session.
SELECT pg_terminate_backend(12345);  -- replace 12345 with the holder's pid
```

### Checksum Mismatch

```
ERROR: Migration checksum mismatch for '003_add_audit_columns.sql'
Expected: a1b2c3d4e5f6...
Found: x9y8z7w6v5u4...
```

**Cause**: The migration file was modified after being applied.

**Resolution**:
1. Never modify applied migrations.
2. If the change was intentional, update the checksum manually in `schema_migrations`.
3. Otherwise, create a new migration with the fix instead.

### Pending Release Migrations

```
ERROR: Cannot start application - pending release migrations require manual execution
Pending: 100_drop_legacy_columns.sql
Run: stellaops system migrations-run --module Authority --category release
```

**Resolution**: Run the CLI migration command before deployment.

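The pending-release check behind this error amounts to a set difference between the available migration names and the applied ones, followed by a test for any name in the `100`-`299` range. The sketch below shows that logic over two plain-text lists with one name per line. The function names and the file-based inputs are hypothetical; the real runner reads embedded resources and `schema_migrations`.

```bash
# pending_migrations AVAILABLE_FILE APPLIED_FILE
# Hypothetical sketch: prints names listed in AVAILABLE_FILE but not in
# APPLIED_FILE, sorted by name (byte order, so zero-padded prefixes sort
# numerically).
pending_migrations() {
    _avail=$(mktemp); _applied=$(mktemp)
    LC_ALL=C sort "$1" > "$_avail"
    LC_ALL=C sort "$2" > "$_applied"
    LC_ALL=C comm -23 "$_avail" "$_applied"
    rm -f "$_avail" "$_applied"
}

# has_pending_release AVAILABLE_FILE APPLIED_FILE
# Succeeds when at least one pending name carries a 100-299 prefix, which is
# the condition under which startup is refused.
has_pending_release() {
    pending_migrations "$1" "$2" | grep -Eq '^[12][0-9]{2}_'
}
```

With `001_initial_schema.sql` applied and `002_add_audit_columns.sql` plus `100_drop_legacy_columns.sql` available, `pending_migrations` lists the latter two and `has_pending_release` succeeds, so startup would stop until the release migration has been run through the CLI.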
## Integration Guide

### Adding Startup Migrations to a Module

```csharp
// In Program.cs or Startup.cs
using StellaOps.Infrastructure.Postgres.Migrations;

// Option 1: Using PostgresOptions
services.AddStartupMigrations(
    schemaName: "auth",
    moduleName: "Authority",
    migrationsAssembly: typeof(AuthorityDataSource).Assembly,
    configureOptions: options =>
    {
        options.LockTimeoutSeconds = 120;
        options.FailOnPendingReleaseMigrations = true;
    });

// Option 2: Using custom options type
services.AddStartupMigrations<AuthorityOptions>(
    schemaName: "auth",
    moduleName: "Authority",
    migrationsAssembly: typeof(AuthorityDataSource).Assembly,
    connectionStringSelector: opts => opts.Storage.ConnectionString);

// Add migration status service for health checks
services.AddMigrationStatus<PostgresOptions>(
    schemaName: "auth",
    moduleName: "Authority",
    migrationsAssembly: typeof(AuthorityDataSource).Assembly,
    connectionStringSelector: opts => opts.ConnectionString);
```

### Embedding Migrations in Assembly

```xml
<!-- In .csproj file -->
<ItemGroup>
  <EmbeddedResource Include="Migrations\*.sql" LogicalName="%(Filename)%(Extension)" />
  <EmbeddedResource Include="Seeds\*.sql" LogicalName="%(Filename)%(Extension)" />
</ItemGroup>
```

### Health Check Integration

```csharp
// Add migration status to health checks.
// AddAsyncCheck is the delegate overload that accepts an async lambda;
// AddCheck expects a synchronous delegate.
services.AddHealthChecks()
    .AddAsyncCheck("migrations", async (cancellationToken) =>
    {
        var status = await migrationStatusService.GetStatusAsync(cancellationToken);

        if (status.HasBlockingIssues)
        {
            return HealthCheckResult.Unhealthy(
                $"Pending release migrations: {status.PendingReleaseCount}, " +
                $"Checksum errors: {status.ChecksumErrors.Count}");
        }

        if (status.PendingStartupCount > 0)
        {
            return HealthCheckResult.Degraded(
                $"Pending startup migrations: {status.PendingStartupCount}");
        }

        return HealthCheckResult.Healthy($"Applied: {status.AppliedCount}");
    });
```

## Implementation Files

| File | Description |
|------|-------------|
| `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationRunner.cs` | Core migration execution logic |
| `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationCategory.cs` | Migration category enum and helpers |
| `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/StartupMigrationHost.cs` | IHostedService for automatic migrations |
| `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationServiceExtensions.cs` | DI registration extensions |

## Reference

- [PostgreSQL Advisory Locks](https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS)
- [Zero-Downtime Migrations](https://docs.stellaops.org/operations/migrations)
- [StellaOps CLI Reference](../09_API_CLI_REFERENCE.md)