# PostgreSQL Migration Strategy

**Version:** 1.0
**Last Updated:** 2025-12-03
**Status:** Active

## Overview

This document defines the migration strategy for StellaOps PostgreSQL databases. It covers initial setup, per-release migrations, multi-instance coordination, and air-gapped operation.

## Principles

1. **Forward-Only**: No down migrations. Fixes are applied as new forward migrations.
2. **Idempotent**: Every migration must be safe to re-run (use `IF NOT EXISTS`, `ON CONFLICT DO NOTHING`).
3. **Deterministic**: The same set of migrations produces an identical schema state in every environment.
4. **Air-Gap Compatible**: All migrations are embedded in assemblies, with no external dependencies.
5. **Zero-Downtime**: Non-breaking migrations run at startup; breaking changes require coordination.

## Migration Categories

### Category A: Startup Migrations (Automatic)

These run automatically when the application starts and must complete within 60 seconds.

**Allowed Operations:**

- `CREATE SCHEMA IF NOT EXISTS`
- `CREATE TABLE IF NOT EXISTS`
- `CREATE INDEX IF NOT EXISTS`
- `CREATE INDEX CONCURRENTLY IF NOT EXISTS` (does not block writes, but cannot run inside a transaction block)
- `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` (nullable, or with a non-volatile default, which PostgreSQL 11+ adds without rewriting the table)
- `CREATE TYPE` for enums, guarded by an existence check such as a `DO` block that catches `duplicate_object` (PostgreSQL has no `CREATE TYPE ... IF NOT EXISTS`)
- Adding new enum values (`ALTER TYPE ... ADD VALUE IF NOT EXISTS`; allowed inside a transaction block only on PostgreSQL 12+, and the new value cannot be used until that transaction commits)
- Inserting seed data with `ON CONFLICT DO NOTHING`

**Forbidden Operations:**

- `DROP TABLE`, `DROP INDEX`, and `ALTER TABLE ... DROP COLUMN`
- `ALTER TABLE ... ALTER COLUMN ... TYPE`
- `TRUNCATE`
- Large data migrations (more than 10,000 rows affected)
- Any operation that holds an `ACCESS EXCLUSIVE` lock for an extended period

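
A cheap automated guard can catch most of the forbidden statements before a startup migration is merged. The document does not describe such a check, so treat the following as a hypothetical sketch. It uses Python for brevity and scans a Category A file with regular expressions. It is only a heuristic. It strips `--` comments but does not parse SQL, and it misses forms that omit optional keywords, such as `ALTER TABLE t DROP c`.

```python
import re

# Hypothetical guard for Category A files. The patterns come from the
# "Forbidden Operations" list; this is a regex heuristic, not a SQL parser.
FORBIDDEN_PATTERNS = {
    "DROP TABLE": re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    "DROP INDEX": re.compile(r"\bDROP\s+INDEX\b", re.IGNORECASE),
    "DROP COLUMN": re.compile(r"\bDROP\s+COLUMN\b", re.IGNORECASE),
    "ALTER COLUMN TYPE": re.compile(
        r"\bALTER\s+COLUMN\s+\w+\s+(?:SET\s+DATA\s+)?TYPE\b", re.IGNORECASE
    ),
    "TRUNCATE": re.compile(r"\bTRUNCATE\b", re.IGNORECASE),
}


def find_forbidden(sql: str) -> list[str]:
    """Return forbidden operations found in a startup migration, in FORBIDDEN_PATTERNS order."""
    # Strip "--" line comments so header comments such as "-- Description: ..." do not match.
    code = re.sub(r"--[^\n]*", "", sql)
    return [name for name, pattern in FORBIDDEN_PATTERNS.items() if pattern.search(code)]
```

Running it in CI against `Migrations/0*.sql` would fail fast. The alternative is discovering the problem when the startup host applies the file.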
### Category B: Release Migrations (Manual/CLI)

These must be run explicitly through the CLI before deployment. They are used for breaking changes.

**Typical Operations:**

- Dropping deprecated columns/tables
- Column type changes
- Large data backfills
- Index rebuilds
- Table renames
- Constraint modifications

### Category C: Data Migrations (Batched)

Long-running data transformations that run as background jobs.

**Characteristics:**

- Batched processing (1,000-10,000 rows per batch)
- Resumable after interruption
- Progress tracking
- Can run alongside the application

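
The characteristics above (batching, resumability, progress tracking) can be sketched as a keyset-paginated backfill that stores a checkpoint after every batch. This is a hypothetical illustration, not the code of `DM001_BackfillTenantIds`. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere. The `users.tenant_id` column, the `'default'` value and the `data_migration_progress` table are all assumptions.

```python
import sqlite3

BATCH_SIZE = 1000  # within the 1,000-10,000 range given above


def backfill_tenant_ids(conn: sqlite3.Connection,
                        migration: str = "DM001_BackfillTenantIds",
                        batch_size: int = BATCH_SIZE) -> int:
    """Run or resume a hypothetical tenant_id backfill; return rows updated in this run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_migration_progress ("
        "migration_name TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    row = conn.execute(
        "SELECT last_id FROM data_migration_progress WHERE migration_name = ?",
        (migration,),
    ).fetchone()
    last_id = row[0] if row else 0  # resume point left by a previous, interrupted run
    updated = 0
    while True:
        # Keyset pagination: fetch the next batch of ids after the checkpoint.
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            return updated
        with conn:  # one transaction: the batch update and its checkpoint commit together
            cursor = conn.execute(
                "UPDATE users SET tenant_id = 'default' "
                "WHERE id BETWEEN ? AND ? AND tenant_id IS NULL",
                (ids[0], ids[-1]),
            )
            updated += cursor.rowcount
            last_id = ids[-1]
            # SQLite upsert; PostgreSQL would use INSERT ... ON CONFLICT ... DO UPDATE.
            conn.execute(
                "INSERT OR REPLACE INTO data_migration_progress (migration_name, last_id) "
                "VALUES (?, ?)",
                (migration, last_id),
            )
```

The key design choice is committing the checkpoint in the same transaction as each batch. After a crash, the stored `last_id` always matches the last committed batch, so a restart neither skips rows nor repeats committed work.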
## Migration File Structure

```
src/<Module>/__Libraries/StellaOps.<Module>.Storage.Postgres/
├── Migrations/
│   ├── 001_initial_schema.sql       # Category A
│   ├── 002_add_audit_columns.sql    # Category A
│   ├── 003_add_search_index.sql     # Category A
│   └── 100_drop_legacy_columns.sql  # Category B (100+ = manual)
├── Seeds/
│   ├── 001_default_roles.sql        # Seed data
│   └── 002_builtin_policies.sql     # Seed data
└── DataMigrations/
    └── DM001_BackfillTenantIds.cs   # Category C (code-based)
```

### Naming Convention

| Prefix | Category | Description |
|--------|----------|-------------|
| `001-099` | A (Startup) | Automatic, non-breaking |
| `100-199` | B (Release) | Manual, breaking changes |
| `200-299` | B (Release) | Major version migrations |
| `S001-S999` | Seed | Reference data |
| `DM001-DM999` | C (Data) | Batched data migrations |

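
A minimal sketch of prefix-based classification that follows the table above. It is written in Python for illustration and is not the `MigrationCategory.cs` implementation. The enum values mirror the `category` values allowed by `schema_migrations`. Rejecting null, empty, whitespace-only and unrecognized names with an exception is an assumption. The function looks only at names, so seed files stored without an `S` prefix (as in the `Seeds/` folder above) would also need their folder taken into account.

```python
import re
from enum import Enum
from typing import Optional


class MigrationCategory(Enum):
    # Values mirror the CHECK constraint on schema_migrations.category.
    STARTUP = "startup"
    RELEASE = "release"
    SEED = "seed"
    DATA = "data"


def categorize(migration_name: Optional[str]) -> MigrationCategory:
    """Classify a migration by its name prefix, following the naming table."""
    if migration_name is None or not migration_name.strip():
        raise ValueError("migration name must not be null, empty, or whitespace")
    name = migration_name.strip()
    if re.match(r"DM\d{3}", name):       # DM001-DM999: batched data migration
        return MigrationCategory.DATA
    if re.match(r"S\d{3}", name):        # S001-S999: seed data
        return MigrationCategory.SEED
    match = re.match(r"(\d{3})_", name)  # 001-299: schema migrations
    if match:
        number = int(match.group(1))
        if 1 <= number <= 99:
            return MigrationCategory.STARTUP
        if 100 <= number <= 299:
            return MigrationCategory.RELEASE
    raise ValueError(f"unrecognized migration name: {migration_name!r}")
```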
## Execution Flow

### Application Startup

```
┌─────────────────────────────────────────────────────────────┐
│                     Application Startup                     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 1. Acquire Advisory Lock (pg_try_advisory_lock)             │
│    Key: hash of schema name                                 │
│    If lock is held: retry up to 120s, then fail startup     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. Create schema_migrations table if not exists             │
│    Columns: migration_name, applied_at, checksum, category  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. Load all embedded migrations                             │
│    - Sort by name                                           │
│    - Compute checksums                                      │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. Compare with applied migrations                          │
│    - Detect checksum mismatches (FATAL ERROR)               │
│    - Identify pending migrations                            │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. Check for pending Category B migrations                  │
│    - If any 100+ migrations are pending: FAIL STARTUP       │
│    - Log: "Run 'stellaops system migrations-run' first"     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 6. Execute pending Category A migrations                    │
│    - Each in its own transaction                            │
│    - Record in schema_migrations                            │
│    - Log timing                                             │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 7. Execute seed data (if not already applied)               │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 8. Release Advisory Lock                                    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 9. Continue Application Startup                             │
└─────────────────────────────────────────────────────────────┘
```

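
Steps 3 to 5 reduce to a pure planning function over two inputs: the embedded migrations and the rows already in `schema_migrations`. The Python sketch below is illustrative only. SHA-256 over the UTF-8 file content is an assumed checksum algorithm, and every name is assumed to carry a three-digit numeric prefix. Because the prefixes are zero-padded, sorting by name also sorts by number.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MigrationPlan:
    pending_startup: list[str] = field(default_factory=list)  # 001-099: run now
    pending_release: list[str] = field(default_factory=list)  # 100+: block startup
    checksum_errors: list[str] = field(default_factory=list)  # applied, then modified


def checksum(sql: str) -> str:
    # Assumption: hex-encoded SHA-256 of the UTF-8 file content.
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def plan_startup(embedded: dict[str, str], applied: dict[str, str]) -> MigrationPlan:
    """Steps 3-5: sort by name, compare checksums, split pending work by category.

    `embedded` maps file name -> SQL; `applied` maps migration_name -> stored checksum.
    """
    plan = MigrationPlan()
    for name in sorted(embedded):
        digest = checksum(embedded[name])
        if name in applied:
            if applied[name] != digest:
                plan.checksum_errors.append(name)
            continue
        if int(name[:3]) < 100:
            plan.pending_startup.append(name)
        else:
            plan.pending_release.append(name)
    return plan


def ensure_startup_allowed(plan: MigrationPlan) -> None:
    """Raise if step 4 (checksum mismatch) or step 5 (pending release) fails startup."""
    if plan.checksum_errors:
        raise RuntimeError("Migration checksum mismatch: " + ", ".join(plan.checksum_errors))
    if plan.pending_release:
        raise RuntimeError(
            "Cannot start application - pending release migrations require manual "
            "execution: " + ", ".join(plan.pending_release)
        )
```

Keeping the plan free of I/O means the blocking rules can be unit-tested without a database. Only step 6 needs a connection.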
### Release Migration (CLI)

```bash
# Before deployment - run breaking migrations
stellaops system migrations-run --module Authority --category release

# Verify migration state
stellaops system migrations-status --module Authority

# Dry run (show what would be executed)
stellaops system migrations-run --module Authority --dry-run
```

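## Multi-Instance Coordination

### Advisory Locks

Each module uses a unique advisory lock key derived from its schema name:

```sql
-- Lock key calculation (one key per module schema)
SELECT pg_try_advisory_lock(hashtext('auth'));       -- Authority
SELECT pg_try_advisory_lock(hashtext('scheduler'));  -- Scheduler
SELECT pg_try_advisory_lock(hashtext('vuln'));       -- Concelier
SELECT pg_try_advisory_lock(hashtext('policy'));     -- Policy
SELECT pg_try_advisory_lock(hashtext('notify'));     -- Notify
SELECT pg_try_advisory_lock(hashtext('vex'));        -- Excititor
```

`pg_try_advisory_lock` returns `true` or `false` immediately instead of waiting. A session-level advisory lock is held until `pg_advisory_unlock` is called on the same session or that session ends. The connection that acquired the lock must therefore stay open for the whole migration run and be the one that releases it.
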
### Race Condition Handling

```
Instance A                    Instance B
    │                             │
    ├─ Acquire lock (success) ──► │
    │                             ├─ Acquire lock (BLOCKED)
    ├─ Run migrations             │  Wait up to 120s
    │                             │
    ├─ Release lock ────────────► │
    │                             ├─ Acquire lock (success)
    │                             ├─ Check migrations (none pending)
    │                             ├─ Release lock
    │                             │
    ▼                             ▼
 Running                       Running
```

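
Because `pg_try_advisory_lock` does not wait, the 120-second wait shown above has to be implemented as a retry loop. Below is a minimal Python sketch that assumes a fixed one-second poll interval, which the document does not specify. `try_lock` is a hypothetical callable that stands in for executing `SELECT pg_try_advisory_lock(hashtext(...))` on the dedicated lock connection. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def acquire_with_timeout(
    try_lock: Callable[[], bool],
    timeout_s: float = 120.0,
    poll_interval_s: float = 1.0,  # assumption: not specified by the document
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a non-blocking lock attempt until it succeeds or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if try_lock():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # caller fails startup: "Could not acquire migration lock"
        sleep(min(poll_interval_s, remaining))
```

A monotonic clock is used so that wall-clock adjustments during startup cannot stretch or shorten the timeout.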
## Schema Migrations Table

Each schema maintains its own migration history:

```sql
CREATE TABLE IF NOT EXISTS {schema}.schema_migrations (
    migration_name TEXT PRIMARY KEY,
    category       TEXT NOT NULL DEFAULT 'startup',
    checksum       TEXT NOT NULL,
    applied_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    applied_by     TEXT,
    duration_ms    INT,

    CONSTRAINT valid_category CHECK (category IN ('startup', 'release', 'seed', 'data'))
);

CREATE INDEX IF NOT EXISTS idx_schema_migrations_applied_at
    ON {schema}.schema_migrations(applied_at DESC);
```

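
Step 6 of the startup flow runs each migration in its own transaction, records it in this table and times it. The following is a hypothetical Python sketch of that step, using `sqlite3` as a stand-in. Like PostgreSQL, SQLite supports transactional DDL, so a failing migration leaves behind neither its schema changes nor a history row. The `{schema}` prefix and `TIMESTAMPTZ` type are simplified away, and the default `applied_by` value is made up. In PostgreSQL, statements such as `CREATE INDEX CONCURRENTLY` cannot use this pattern because they refuse to run inside a transaction block.

```python
import sqlite3
import time

HISTORY_DDL = """
CREATE TABLE IF NOT EXISTS schema_migrations (
    migration_name TEXT PRIMARY KEY,
    category       TEXT NOT NULL DEFAULT 'startup',
    checksum       TEXT NOT NULL,
    applied_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    applied_by     TEXT,
    duration_ms    INT,
    CONSTRAINT valid_category CHECK (category IN ('startup', 'release', 'seed', 'data'))
)
"""


def ensure_history_table(conn: sqlite3.Connection) -> None:
    conn.execute(HISTORY_DDL)


def apply_migration(conn: sqlite3.Connection, name: str, statements: list[str],
                    checksum: str, applied_by: str = "startup-host") -> int:
    """Apply one migration and its history row atomically; return the duration in ms.

    `conn` must be opened with isolation_level=None so the explicit
    BEGIN/COMMIT/ROLLBACK below are the only transaction control in play.
    """
    started = time.perf_counter()
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
        duration_ms = int((time.perf_counter() - started) * 1000)
        conn.execute(
            "INSERT INTO schema_migrations "
            "(migration_name, category, checksum, applied_by, duration_ms) "
            "VALUES (?, 'startup', ?, ?, ?)",
            (name, checksum, applied_by, duration_ms),
        )
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:  # SQLite may already have rolled back on some errors
            conn.execute("ROLLBACK")
        raise
    return duration_ms
```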
## Module-Specific Schemas

| Module | Schema | Lock Key | Tables |
|--------|--------|----------|--------|
| Authority | `auth` | `hashtext('auth')` | tenants, users, roles, tokens, sessions |
| Scheduler | `scheduler` | `hashtext('scheduler')` | jobs, triggers, workers, locks |
| Concelier | `vuln` | `hashtext('vuln')` | advisories, affected, aliases, sources |
| Policy | `policy` | `hashtext('policy')` | packs, versions, rules, evaluations |
| Notify | `notify` | `hashtext('notify')` | templates, channels, deliveries |
| Excititor | `vex` | `hashtext('vex')` | statements, documents, products |

## Release Workflow

### Pre-Deployment

```bash
# 1. Review pending migrations
stellaops system migrations-status --module all

# 2. Backup database (if required)
pg_dump -Fc stellaops > backup_$(date +%Y%m%d).dump

# 3. Run release migrations in maintenance window
stellaops system migrations-run --category release --module all

# 4. Verify schema state
stellaops system migrations-verify --module all
```

### Deployment

1. Deploy new application version
2. Application startup runs Category A migrations automatically
3. Health checks pass after migrations complete

### Post-Deployment

```bash
# Check migration status
stellaops system migrations-status --module all

# Run any data migrations (background)
stellaops system migrations-run --category data --module all
```

## Rollback Strategy

Because migrations are forward-only, rollback is achieved through:

1. **Fix-Forward**: Deploy a new migration that reverses the problematic change
2. **Blue/Green Deployment**: Switch back to the previous version (requires backward-compatible migrations)
3. **Point-in-Time Recovery**: Restore from backup (last resort)

### Backward Compatibility Window

For zero-downtime deployments, every migration must remain compatible with the previous application version (N-1). Old and new instances run side by side during a rollout, and a blue/green switch-back returns traffic to the old version:

```
Version N:   Adds new nullable column 'status_v2'
Version N+1: Application uses 'status_v2', keeps 'status' populated
Version N+2: Migration removes 'status' column (Category B)
```

## Air-Gapped Operation

All migrations are embedded as assembly resources:

```xml
<!-- In .csproj file -->
<ItemGroup>
  <EmbeddedResource Include="Migrations\*.sql" LogicalName="%(Filename)%(Extension)" />
  <EmbeddedResource Include="Seeds\*.sql" LogicalName="%(Filename)%(Extension)" />
</ItemGroup>
```

No network access is required during migration execution.

## Monitoring & Observability

### Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `stellaops_migration_duration_seconds` | Histogram | Time to run a migration |
| `stellaops_migration_pending_count` | Gauge | Number of pending migrations |
| `stellaops_migration_applied_total` | Counter | Total migrations applied |
| `stellaops_migration_failed_total` | Counter | Total migration failures |

### Logging

```
[INF] Migration: Acquiring lock for schema 'auth'
[INF] Migration: Lock acquired, checking pending migrations
[INF] Migration: 2 pending migrations found
[INF] Migration: Applying 003_add_audit_columns.sql (checksum: a1b2c3...)
[INF] Migration: 003_add_audit_columns.sql completed in 245ms
[INF] Migration: Applying 004_add_search_index.sql (checksum: d4e5f6...)
[INF] Migration: 004_add_search_index.sql completed in 1823ms
[INF] Migration: All migrations applied, releasing lock
```

### Alerts

- Migration lock held > 5 minutes
- Migration failure
- Checksum mismatch detected
- Pending Category B migrations blocking startup

## Development Workflow

### Creating a New Migration

```bash
# 1. Choose the migration file path
MIGRATION=src/Authority/__Libraries/StellaOps.Authority.Storage.Postgres/Migrations/005_add_mfa_columns.sql

# 2. Write idempotent SQL to that path
cat > "$MIGRATION" << 'EOF'
-- Migration: 005_add_mfa_columns
-- Category: startup
-- Description: Add MFA support columns to users table

ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_enabled BOOLEAN NOT NULL DEFAULT FALSE;
ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_secret TEXT;
ALTER TABLE auth.users ADD COLUMN IF NOT EXISTS mfa_backup_codes TEXT[];

CREATE INDEX IF NOT EXISTS idx_users_mfa_enabled ON auth.users(mfa_enabled) WHERE mfa_enabled = TRUE;
EOF

# 3. Test locally (startup migrations run when the service starts)
dotnet run --project src/Authority/StellaOps.Authority.WebService

# 4. Verify migration applied
stellaops system migrations-status --module Authority
```

### Testing Migrations

```bash
# Run integration tests with migrations
dotnet test --filter "Category=Migration"

# Re-run check: already-applied migrations are skipped, so the second run applies nothing
stellaops system migrations-run --module Authority
stellaops system migrations-run --module Authority  # Should be no-op
```

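
Re-running the CLI confirms that the runner skips recorded migrations. It never executes the same SQL twice, though, so it does not test the idempotency principle itself. A direct check applies a script twice to a fresh database and compares the schema and the row-change count. The Python sketch below is hypothetical and uses `sqlite3` only so that it is self-contained. SQLite lacks PostgreSQL syntax such as `ADD COLUMN IF NOT EXISTS`, so a real test should target PostgreSQL, for example through Testcontainers.

```python
import sqlite3


def schema_snapshot(conn: sqlite3.Connection) -> list[tuple]:
    """All schema objects (tables, indexes, ...) in a deterministic order."""
    return conn.execute(
        "SELECT type, name, sql FROM sqlite_master ORDER BY type, name"
    ).fetchall()


def is_idempotent(script: str) -> bool:
    """Apply `script` twice to a fresh database; the second run must change nothing."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(script)
        first = (schema_snapshot(conn), conn.total_changes)
        conn.executescript(script)  # non-idempotent DDL or seed inserts typically fail here
        second = (schema_snapshot(conn), conn.total_changes)
    except sqlite3.Error:
        return False
    finally:
        conn.close()
    return first == second
```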
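## Troubleshooting

### Lock Timeout

```
ERROR: Could not acquire migration lock within 120 seconds
```

**Cause**: Another instance is still running migrations, or a hung instance or orphaned connection still holds the lock. Advisory locks are session-scoped, so a lock is released automatically once its holding session ends.

**Resolution**:
```sql
-- Check active advisory locks and the sessions that hold them
SELECT l.pid, l.objid, l.granted, a.application_name, a.state, a.backend_start
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'advisory';

-- pg_advisory_unlock_all() only releases locks held by the *current* session,
-- so it cannot clear another instance's lock. Instead, end the holding session
-- (use with caution; 12345 is a placeholder for the pid found above):
SELECT pg_terminate_backend(12345);
```
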
### Checksum Mismatch

```
ERROR: Migration checksum mismatch for '003_add_audit_columns.sql'
Expected: a1b2c3d4e5f6...
Found:    x9y8z7w6v5u4...
```

**Cause**: The migration file was modified after it was applied.

**Resolution**:

1. Never modify a migration after it has been applied.
2. Ship the fix as a new migration instead.
3. Only if the edit was intentional and does not change the resulting schema, update the stored checksum manually in `schema_migrations`.

### Pending Release Migrations

```
ERROR: Cannot start application - pending release migrations require manual execution
Pending: 100_drop_legacy_columns.sql
Run: stellaops system migrations-run --module Authority --category release
```

**Resolution**: Run the CLI migration command shown in the error before deployment.

## Integration Guide

### Adding Startup Migrations to a Module

```csharp
// In Program.cs or Startup.cs
using StellaOps.Infrastructure.Postgres.Migrations;

// Option 1: Using PostgresOptions
services.AddStartupMigrations(
    schemaName: "auth",
    moduleName: "Authority",
    migrationsAssembly: typeof(AuthorityDataSource).Assembly,
    configureOptions: options =>
    {
        options.LockTimeoutSeconds = 120;
        options.FailOnPendingReleaseMigrations = true;
    });

// Option 2: Using custom options type
services.AddStartupMigrations<AuthorityOptions>(
    schemaName: "auth",
    moduleName: "Authority",
    migrationsAssembly: typeof(AuthorityDataSource).Assembly,
    connectionStringSelector: opts => opts.Storage.ConnectionString);

// Add migration status service for health checks
services.AddMigrationStatus<PostgresOptions>(
    schemaName: "auth",
    moduleName: "Authority",
    migrationsAssembly: typeof(AuthorityDataSource).Assembly,
    connectionStringSelector: opts => opts.ConnectionString);
```

### Embedding Migrations in Assembly

```xml
<!-- In .csproj file -->
<ItemGroup>
  <EmbeddedResource Include="Migrations\*.sql" LogicalName="%(Filename)%(Extension)" />
  <EmbeddedResource Include="Seeds\*.sql" LogicalName="%(Filename)%(Extension)" />
</ItemGroup>
```

### Health Check Integration

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Add migration status to health checks.
// `migrationStatusService` is the status service registered by AddMigrationStatus;
// resolve it from DI (for example, by injecting it into an IHealthCheck implementation).
services.AddHealthChecks()
    .AddAsyncCheck("migrations", async cancellationToken =>
    {
        var status = await migrationStatusService.GetStatusAsync(cancellationToken);

        if (status.HasBlockingIssues)
        {
            return HealthCheckResult.Unhealthy(
                $"Pending release migrations: {status.PendingReleaseCount}, " +
                $"Checksum errors: {status.ChecksumErrors.Count}");
        }

        if (status.PendingStartupCount > 0)
        {
            return HealthCheckResult.Degraded(
                $"Pending startup migrations: {status.PendingStartupCount}");
        }

        return HealthCheckResult.Healthy($"Applied: {status.AppliedCount}");
    });
```

## Implementation Files

| File | Description |
|------|-------------|
| `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationRunner.cs` | Core migration execution logic |
| `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationCategory.cs` | Migration category enum and helpers |
| `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/StartupMigrationHost.cs` | IHostedService for automatic migrations |
| `src/__Libraries/StellaOps.Infrastructure.Postgres/Migrations/MigrationServiceExtensions.cs` | DI registration extensions |

## Reference

- [PostgreSQL Advisory Locks](https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS)
- [Zero-Downtime Migrations](https://docs.stellaops.org/operations/migrations)
- [StellaOps CLI Reference](../09_API_CLI_REFERENCE.md)