PostgreSQL Migration Strategy
Version: 1.0 Last Updated: 2025-12-03 Status: Active
Overview
This document defines the migration strategy for StellaOps PostgreSQL databases. It covers initial setup, per-release migrations, multi-instance coordination, and air-gapped operation.
Principles
- Forward-Only: No down migrations. Fixes are applied as new forward migrations.
- Idempotent: All migrations must be safe to re-run (use IF NOT EXISTS, ON CONFLICT DO NOTHING).
- Deterministic: Same input produces identical schema state across environments.
- Air-Gap Compatible: All migrations embedded in assemblies, no external dependencies.
- Zero-Downtime: Non-breaking migrations run at startup; breaking changes require coordination.
Migration Categories
Category A: Startup Migrations (Automatic)
Run automatically when the application starts. Each must complete within 60 seconds.
Allowed Operations:
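- CREATE SCHEMA IF NOT EXISTS
- CREATE TABLE IF NOT EXISTS
- CREATE INDEX IF NOT EXISTS
- CREATE INDEX CONCURRENTLY (non-blocking; note that PostgreSQL does not allow it inside a transaction block)
- ALTER TABLE ADD COLUMN (nullable or with default)
- CREATE TYPE for enums (PostgreSQL has no CREATE TYPE ... IF NOT EXISTS, so guard creation with an existence check, for example in a DO block)
- Adding new enum values (ALTER TYPE ... ADD VALUE IF NOT EXISTS)
- Insert seed data with ON CONFLICT DO NOTHING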
Forbidden Operations:
- DROP TABLE/COLUMN/INDEX
- ALTER TABLE DROP COLUMN
- ALTER TABLE ALTER COLUMN TYPE
- TRUNCATE
- Large data migrations (> 10,000 rows affected)
- Any operation requiring an ACCESS EXCLUSIVE lock for extended periods
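The forbidden list lends itself to a simple pre-merge check. The following is a rough sketch, written in Python for illustration only; it is not part of the StellaOps tooling. The function names and regular expressions are assumptions, and the check is a heuristic that ignores string literals and cannot detect row counts or lock duration.

```python
import re

# Patterns for operations the strategy forbids in startup (Category A) migrations.
# Illustrative approximation only; not the project's actual validator.
FORBIDDEN_PATTERNS = [
    (r"\bDROP\s+(TABLE|INDEX|COLUMN)\b", "DROP TABLE/COLUMN/INDEX"),
    (r"\bALTER\s+COLUMN\s+\w+\s+(SET\s+DATA\s+)?TYPE\b", "ALTER COLUMN TYPE"),
    (r"\bTRUNCATE\b", "TRUNCATE"),
]


def strip_sql_comments(sql: str) -> str:
    """Remove /* */ block comments and -- line comments (string literals are not handled)."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def find_forbidden_operations(sql: str) -> list[str]:
    """Return the labels of forbidden operations found in the SQL text."""
    cleaned = strip_sql_comments(sql)
    return [
        label
        for pattern, label in FORBIDDEN_PATTERNS
        if re.search(pattern, cleaned, flags=re.IGNORECASE)
    ]
```

A check like this could run in CI against files with a 001-099 prefix, leaving the final judgment to code review.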
Category B: Release Migrations (Manual/CLI)
Require explicit execution via CLI before deployment. Used for breaking changes.
Typical Operations:
- Dropping deprecated columns/tables
- Column type changes
- Large data backfills
- Index rebuilds
- Table renames
- Constraint modifications
Category C: Data Migrations (Batched)
Long-running data transformations that run as background jobs.
Characteristics:
- Batched processing (1000-10000 rows per batch)
- Resumable after interruption
- Progress tracking
- Can run alongside application
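The characteristics above (batching, resumability, progress tracking) can be sketched as follows. This is a minimal in-memory illustration in Python, not the DataMigrations implementation: `Checkpoint`, `run_batched`, and the dictionary standing in for a table are all hypothetical, and a real job would persist the checkpoint in the database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Checkpoint:
    """Progress record; a real job would persist this (hypothetical shape)."""
    last_id: int = 0
    rows_done: int = 0


def run_batched(rows: Dict[int, dict], transform: Callable[[dict], None],
                checkpoint: Checkpoint, batch_size: int = 1000,
                max_batches: Optional[int] = None) -> bool:
    """Process rows in id order, one batch at a time, resuming after checkpoint.last_id.

    Returns True when no unprocessed rows remain, False if stopped early.
    """
    batches = 0
    while max_batches is None or batches < max_batches:
        # Keyset pagination: take the next batch of ids above the checkpoint.
        batch_ids = sorted(i for i in rows if i > checkpoint.last_id)[:batch_size]
        if not batch_ids:
            break
        for row_id in batch_ids:
            transform(rows[row_id])
        # Advance the checkpoint only after the whole batch has succeeded.
        checkpoint.last_id = batch_ids[-1]
        checkpoint.rows_done += len(batch_ids)
        batches += 1
    return not any(i > checkpoint.last_id for i in rows)
```

Because the checkpoint moves only after a complete batch, an interrupted run repeats at most one batch on restart, which is why the transform itself should be idempotent.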
Migration File Structure
src/<Module>/__Libraries/StellaOps.<Module>.Storage.Postgres/
├── Migrations/
│ ├── 001_initial_schema.sql # Category A
│ ├── 002_add_audit_columns.sql # Category A
│ ├── 003_add_search_index.sql # Category A
│ └── 100_drop_legacy_columns.sql # Category B (100+ = manual)
├── Seeds/
│ ├── 001_default_roles.sql # Seed data
│ └── 002_builtin_policies.sql # Seed data
└── DataMigrations/
└── DM001_BackfillTenantIds.cs # Category C (code-based)
Naming Convention
| Prefix | Category | Description |
|---|---|---|
| 001-099 | A (Startup) | Automatic, non-breaking |
| 100-199 | B (Release) | Manual, breaking changes |
| 200-299 | B (Release) | Major version migrations |
| S001-S999 | Seed | Reference data |
| DM001-DM999 | C (Data) | Batched data migrations |
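The prefix rules in the table can be expressed as a small lookup. The sketch below is in Python for illustration and does not reproduce the project's MigrationCategory helpers. The `Category` enum and `categorize` function are hypothetical names, and raising `ValueError` for empty or unrecognized names is an assumption about the desired behavior. Any numeric prefix of 100 or above is treated as a release migration, matching the "100+ = manual" note in the file structure.

```python
import re
from enum import Enum
from typing import Optional


class Category(Enum):
    STARTUP = "startup"
    RELEASE = "release"
    SEED = "seed"
    DATA = "data"


# Optional letter prefix (DM or S) followed by the numeric part.
_PREFIX = re.compile(r"^(DM|S)?(\d+)", re.IGNORECASE)


def categorize(migration_name: Optional[str]) -> Category:
    """Map a migration file name to a category using its prefix."""
    if migration_name is None or not migration_name.strip():
        raise ValueError("migration name must not be empty")
    match = _PREFIX.match(migration_name.strip())
    if match is None:
        raise ValueError(f"unrecognized migration prefix: {migration_name!r}")
    letters, number = match.group(1), int(match.group(2))
    if letters is not None:
        return Category.DATA if letters.upper() == "DM" else Category.SEED
    # Numeric prefixes: below 100 run at startup, 100 and above are manual.
    return Category.STARTUP if number < 100 else Category.RELEASE
```

The enum values mirror the strings allowed by the `valid_category` check constraint on `schema_migrations`.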
Execution Flow
Application Startup
┌─────────────────────────────────────────────────────────────┐
│ Application Startup │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 1. Acquire Advisory Lock (pg_try_advisory_lock) │
│ Key: hash of schema name │
│ If lock fails: wait up to 120s, then fail startup │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 2. Create schema_migrations table if not exists │
│ Columns: migration_name, applied_at, checksum, category │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 3. Load embedded migrations (001-099 only) │
│ - Sort by name │
│ - Compute checksums │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 4. Compare with applied migrations │
│ - Detect checksum mismatches (FATAL ERROR) │
│ - Identify pending migrations │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 5. Check for pending Category B migrations │
│ - If any 100+ migrations are pending: FAIL STARTUP │
│ - Log: "Run 'stellaops migrate' before deployment" │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 6. Execute pending Category A migrations │
│ - Each in transaction │
│ - Record in schema_migrations │
│ - Log timing │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 7. Execute seed data (if not already applied) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 8. Release Advisory Lock │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 9. Continue Application Startup │
└─────────────────────────────────────────────────────────────┘
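Steps 3 to 6 of the flow above amount to a pure planning function, sketched here in Python for illustration. This is not the MigrationRunner code: the function and exception names are hypothetical, SHA-256 is an assumed checksum algorithm (the document does not name one), and the sketch assumes every name starts with a three-digit number.

```python
import hashlib
from typing import Dict, List


class ChecksumMismatch(Exception):
    """An applied migration's content no longer matches its recorded checksum."""


class PendingReleaseMigrations(Exception):
    """Category B migrations are pending and must be run via the CLI first."""


def checksum(sql: str) -> str:
    # Assumption: SHA-256 over the UTF-8 text; the real algorithm is not stated.
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def plan_startup(embedded: Dict[str, str], applied: Dict[str, str]) -> List[str]:
    """Return pending startup migration names in execution order.

    embedded: migration name -> SQL text (numbered migrations in the assembly)
    applied:  migration name -> checksum recorded in schema_migrations
    """
    # Step 4: a changed, already-applied migration is a fatal error.
    for name, recorded in applied.items():
        if name in embedded and checksum(embedded[name]) != recorded:
            raise ChecksumMismatch(name)
    pending = sorted(name for name in embedded if name not in applied)
    # Step 5: pending 100+ (Category B) migrations block startup.
    release = [name for name in pending if int(name[:3]) >= 100]
    if release:
        raise PendingReleaseMigrations(", ".join(release))
    # Step 6: the remainder are Category A migrations, sorted by name.
    return pending
```

Keeping the plan separate from execution makes the failure cases easy to unit-test without a database.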
Release Migration (CLI)
# Before deployment - run breaking migrations
stellaops system migrations-run --module Authority --category release
# Verify migration state
stellaops system migrations-status --module Authority
# Dry run (show what would be executed)
stellaops system migrations-run --module Authority --dry-run
Multi-Instance Coordination
Advisory Locks
Each module uses a unique advisory lock key derived from its schema name:
-- Lock key calculation
SELECT pg_try_advisory_lock(hashtext('auth')); -- Authority
SELECT pg_try_advisory_lock(hashtext('scheduler')); -- Scheduler
SELECT pg_try_advisory_lock(hashtext('vuln')); -- Concelier
SELECT pg_try_advisory_lock(hashtext('policy')); -- Policy
SELECT pg_try_advisory_lock(hashtext('notify')); -- Notify
Race Condition Handling
Instance A Instance B
│ │
├─ Acquire lock (success) ──► │
│ ├─ Acquire lock (BLOCKED)
├─ Run migrations │ Wait up to 120s
│ │
├─ Release lock ────────────► │
│ ├─ Acquire lock (success)
│ ├─ Check migrations (none pending)
│ ├─ Release lock
│ │
▼ ▼
Running Running
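Since `pg_try_advisory_lock` returns immediately, the "wait up to 120s" behavior in the diagram implies a polling loop around it. The Python sketch below shows one way to structure that loop. It is an illustration, not the actual host code: `acquire_with_timeout` is a hypothetical name, the one-second poll interval is an assumption, and `try_lock` stands in for a call that executes `SELECT pg_try_advisory_lock(hashtext('auth'))` on the migration connection.

```python
import time
from typing import Callable


def acquire_with_timeout(try_lock: Callable[[], bool],
                         timeout_s: float = 120.0,
                         poll_interval_s: float = 1.0,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a non-blocking lock attempt until it succeeds or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if try_lock():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # caller should fail startup
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

The `clock` and `sleep` parameters exist only so the loop can be tested with a fake clock; production code would use the defaults.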
Schema Migrations Table
Each schema maintains its own migration history:
CREATE TABLE IF NOT EXISTS {schema}.schema_migrations (
migration_name TEXT PRIMARY KEY,
category TEXT NOT NULL DEFAULT 'startup',
checksum TEXT NOT NULL,
applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
applied_by TEXT,
duration_ms INT,
CONSTRAINT valid_category CHECK (category IN ('startup', 'release', 'seed', 'data'))
);
CREATE INDEX IF NOT EXISTS idx_schema_migrations_applied_at
ON {schema}.schema_migrations(applied_at DESC);
Module-Specific Schemas
| Module | Schema | Lock Key | Tables |
|---|---|---|---|
| Authority | auth | hashtext('auth') | tenants, users, roles, tokens, sessions |
| Scheduler | scheduler | hashtext('scheduler') | jobs, triggers, workers, locks |
| Concelier | vuln | hashtext('vuln') | advisories, affected, aliases, sources |
| Policy | policy | hashtext('policy') | packs, versions, rules, evaluations |
| Notify | notify | hashtext('notify') | templates, channels, deliveries |
| Excititor | vex | hashtext('vex') | statements, documents, products |
Release Workflow
Pre-Deployment
# 1. Review pending migrations
stellaops system migrations-status --module all
# 2. Backup database (if required)
pg_dump -Fc stellaops > backup_$(date +%Y%m%d).dump
# 3. Run release migrations in maintenance window
stellaops system migrations-run --category release --module all
# 4. Verify schema state
stellaops system migrations-verify --module all
Deployment
- Deploy new application version
- Application startup runs Category A migrations automatically
- Health checks pass after migrations complete
Post-Deployment
# Check migration status
stellaops system migrations-status --module all
# Run any data migrations (background)
stellaops system migrations-run --category data --module all
Rollback Strategy
Since we use forward-only migrations, rollback is achieved through:
- Fix-Forward: Deploy a new migration that reverses the problematic change
- Blue/Green Deployment: Switch back to previous version (requires backward-compatible migrations)
- Point-in-Time Recovery: Restore from backup (last resort)
Backward Compatibility Window
For zero-downtime deployments, migrations must remain backward compatible with the previous (N-1) application version:
Version N: Adds new nullable column 'status_v2'
Version N+1: Application uses 'status_v2', keeps 'status' populated
Version N+2: Migration removes 'status' column (Category B)
Air-Gapped Operation
All migrations are embedded as assembly resources:
<!-- In .csproj file -->
<ItemGroup>
<EmbeddedResource Include="Migrations\*.sql" LogicalName="%(Filename)%(Extension)" />
<EmbeddedResource Include="Seeds\*.sql" LogicalName="%(Filename)%(Extension)" />
</ItemGroup>
No network access required during migration execution.
Monitoring & Observability
Metrics
| Metric | Type | Description |
|---|---|---|
| stellaops_migration_duration_seconds | Histogram | Time to run migration |
| stellaops_migration_pending_count | Gauge | Number of pending migrations |
| stellaops_migration_applied_total | Counter | Total migrations applied |
| stellaops_migration_failed_total | Counter | Total migration failures |
Logging
[INF] Migration: Acquiring lock for schema 'auth'
[INF] Migration: Lock acquired, checking pending migrations
[INF] Migration: 2 pending migrations found
[INF] Migration: Applying 003_add_audit_columns.sql (checksum: a1b2c3...)
[INF] Migration: 003_add_audit_columns.sql completed in 245ms
[INF] Migration: Applying 004_add_search_index.sql (checksum: d4e5f6...)
[INF] Migration: 004_add_search_index.sql completed in 1823ms
[INF] Migration: All migrations applied, releasing lock
Alerts
- Migration lock held > 5 minutes
- Migration failure
- Checksum mismatch detected
- Pending Category B migrations blocking startup
Development Workflow
Creating a New Migration
# 1. Create migration file
touch src/Authority/__Libraries/StellaOps.Authority.Storage.Postgres/Migrations/005_add_mfa_columns.sql
# 2. Write idempotent SQL
cat > src/Authority/__Libraries/StellaOps.Authority.Storage.Postgres/Migrations/005_add_mfa_columns.sql << 'EOF'
-- Migration: 005_add_mfa_columns
-- Category: startup
-- Description: Add MFA support columns to users table
ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_enabled BOOLEAN NOT NULL DEFAULT FALSE;
ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_secret TEXT;
ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_backup_codes TEXT[];
CREATE INDEX IF NOT EXISTS idx_users_mfa_enabled ON auth.users(mfa_enabled) WHERE mfa_enabled = TRUE;
EOF
# 3. Test locally
dotnet run --project src/Authority/StellaOps.Authority.WebService
# 4. Verify migration applied
stellaops system migrations-status --module Authority
Testing Migrations
# Run integration tests with migrations
dotnet test --filter "Category=Migration"
# Test idempotency (run twice)
stellaops system migrations-run --module Authority
stellaops system migrations-run --module Authority # Should be no-op
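The "run twice" check can also be automated by comparing the database state after the first and second run. The Python sketch below illustrates the idea with the standard library's in-memory SQLite, used purely as a stand-in: the real tests target PostgreSQL, SQLite supports only some of the same idempotent constructs, and the `roles` table, `MIGRATION` script, and function names are hypothetical.

```python
import sqlite3

# Hypothetical idempotent script using constructs SQLite shares with PostgreSQL.
MIGRATION = """
CREATE TABLE IF NOT EXISTS roles (name TEXT PRIMARY KEY, description TEXT);
CREATE INDEX IF NOT EXISTS idx_roles_description ON roles(description);
INSERT INTO roles (name, description) VALUES ('admin', 'Administrator')
    ON CONFLICT DO NOTHING;
"""


def snapshot(conn: sqlite3.Connection) -> list:
    """Capture schema objects and row data so two runs can be compared."""
    objects = conn.execute(
        "SELECT type, name, sql FROM sqlite_master ORDER BY type, name"
    ).fetchall()
    rows = conn.execute(
        "SELECT name, description FROM roles ORDER BY name"
    ).fetchall()
    return objects + rows


def is_idempotent(script: str) -> bool:
    """Run the script twice and report whether the second run changed anything."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(script)
        first = snapshot(conn)
        conn.executescript(script)  # second run should be a no-op
        return snapshot(conn) == first
    finally:
        conn.close()
```

A script that inserts without a conflict clause into a table lacking a unique key would fail this check, because the second run adds a duplicate row.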
Troubleshooting
Lock Timeout
ERROR: Could not acquire migration lock within 120 seconds
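Cause: Another instance is still running migrations, or a hung instance is holding the lock. Session-level advisory locks are released automatically when the holding connection closes, so a fully terminated instance does not keep the lock.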
Resolution:
-- Check active locks
SELECT * FROM pg_locks WHERE locktype = 'advisory';
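-- Force release (use with caution): pg_advisory_unlock_all() only releases locks
-- held by the current session, so terminate the session that holds the lock,
-- using the pid reported by pg_locks above
SELECT pg_terminate_backend(<pid>);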
Checksum Mismatch
ERROR: Migration checksum mismatch for '003_add_audit_columns.sql'
Expected: a1b2c3d4e5f6...
Found: x9y8z7w6v5u4...
Cause: Migration file was modified after being applied.
Resolution:
- Never modify applied migrations
- If intentional, update checksum manually in schema_migrations
- Create new migration with fix instead
Pending Release Migrations
ERROR: Cannot start application - pending release migrations require manual execution
Pending: 100_drop_legacy_columns.sql
Run: stellaops system migrations-run --module Authority --category release
Resolution: Run CLI migration command before deployment.
Integration Guide
Adding Startup Migrations to a Module
// In Program.cs or Startup.cs
using StellaOps.Infrastructure.Postgres.Migrations;
// Option 1: Using PostgresOptions
services.AddStartupMigrations(
schemaName: "auth",
moduleName: "Authority",
migrationsAssembly: typeof(AuthorityDataSource).Assembly,
configureOptions: options =>
{
options.LockTimeoutSeconds = 120;
options.FailOnPendingReleaseMigrations = true;
});
// Option 2: Using custom options type
services.AddStartupMigrations<AuthorityOptions>(
schemaName: "auth",
moduleName: "Authority",
migrationsAssembly: typeof(AuthorityDataSource).Assembly,
connectionStringSelector: opts => opts.Storage.ConnectionString);
// Add migration status service for health checks
services.AddMigrationStatus<PostgresOptions>(
schemaName: "auth",
moduleName: "Authority",
migrationsAssembly: typeof(AuthorityDataSource).Assembly,
connectionStringSelector: opts => opts.ConnectionString);
Embedding Migrations in Assembly
<!-- In .csproj file -->
<ItemGroup>
<EmbeddedResource Include="Migrations\*.sql" LogicalName="%(Filename)%(Extension)" />
<EmbeddedResource Include="Seeds\*.sql" LogicalName="%(Filename)%(Extension)" />
</ItemGroup>
Health Check Integration
// Add migration status to health checks
services.AddHealthChecks()
.AddAsyncCheck("migrations", async (cancellationToken) =>
{
var status = await migrationStatusService.GetStatusAsync(cancellationToken);
if (status.HasBlockingIssues)
{
return HealthCheckResult.Unhealthy(
$"Pending release migrations: {status.PendingReleaseCount}, " +
$"Checksum errors: {status.ChecksumErrors.Count}");
}
if (status.PendingStartupCount > 0)
{
return HealthCheckResult.Degraded(
$"Pending startup migrations: {status.PendingStartupCount}");
}
return HealthCheckResult.Healthy($"Applied: {status.AppliedCount}");
});
Implementation Files
| File | Description |
|---|---|
| src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationRunner.cs | Core migration execution logic |
| src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationCategory.cs | Migration category enum and helpers |
| src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/StartupMigrationHost.cs | IHostedService for automatic migrations |
| src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationServiceExtensions.cs | DI registration extensions |