7.4 KiB
Phase 2: Scheduler Module Conversion
Sprint: 3 Duration: 1 sprint Status: TODO Dependencies: Phase 0 (Foundations)
Objectives
- Create
StellaOps.Scheduler.Storage.Postgresproject - Implement Scheduler schema in PostgreSQL
- Implement 7+ repository interfaces
- Replace MongoDB job tracking with PostgreSQL
- Implement PostgreSQL advisory locks for distributed locking
Deliverables
| Deliverable | Acceptance Criteria |
|---|---|
| Scheduler schema | All tables created with indexes |
| Repository implementations | All 7+ interfaces implemented |
| Advisory locks | Distributed locking working |
| Integration tests | 100% coverage of CRUD operations |
| Verification report | Schedule execution verified |
Schema Reference
See SPECIFICATION.md Section 5.4 for complete Scheduler schema.
Tables:
scheduler.schedulesscheduler.triggersscheduler.runsscheduler.graph_jobsscheduler.policy_jobsscheduler.impact_snapshotsscheduler.workersscheduler.execution_logsscheduler.locksscheduler.run_summariesscheduler.audit
Task Breakdown
T2.1: Create Scheduler.Storage.Postgres Project
Status: TODO Assignee: TBD Estimate: 0.5 days
Subtasks:
- T2.1.1: Create project structure
- T2.1.2: Add NuGet references
- T2.1.3: Create
SchedulerDataSourceclass - T2.1.4: Create
ServiceCollectionExtensions.cs
T2.2: Implement Schema Migrations
Status: TODO Assignee: TBD Estimate: 1 day
Subtasks:
- T2.2.1: Create
V001_CreateSchedulerSchemamigration - T2.2.2: Include all tables and indexes
- T2.2.3: Add partial index for active schedules
- T2.2.4: Test migration idempotency
T2.3: Implement Schedule Repository
Status: TODO Assignee: TBD Estimate: 1 day
Interface:
public interface IScheduleRepository
{
Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken ct);
Task<IReadOnlyList<Schedule>> ListAsync(string tenantId, ScheduleQueryOptions? options, CancellationToken ct);
Task UpsertAsync(Schedule schedule, CancellationToken ct);
Task<bool> SoftDeleteAsync(string tenantId, string scheduleId, string deletedBy, DateTimeOffset deletedAt, CancellationToken ct);
Task<IReadOnlyList<Schedule>> GetDueSchedulesAsync(DateTimeOffset now, CancellationToken ct);
}
Subtasks:
- T2.3.1: Implement all interface methods
- T2.3.2: Handle soft delete correctly
- T2.3.3: Implement GetDueSchedules for trigger calculation
- T2.3.4: Write integration tests
T2.4: Implement Run Repository
Status: TODO Assignee: TBD Estimate: 1 day
Interface:
public interface IRunRepository
{
Task<Run?> GetAsync(string tenantId, Guid runId, CancellationToken ct);
Task<IReadOnlyList<Run>> ListAsync(string tenantId, RunQueryOptions? options, CancellationToken ct);
Task<Run> CreateAsync(Run run, CancellationToken ct);
Task<Run> UpdateAsync(Run run, CancellationToken ct);
Task<IReadOnlyList<Run>> GetPendingRunsAsync(string tenantId, CancellationToken ct);
Task<IReadOnlyList<Run>> GetRunsByScheduleAsync(string tenantId, Guid scheduleId, int limit, CancellationToken ct);
}
Subtasks:
- T2.4.1: Implement all interface methods
- T2.4.2: Handle state transitions
- T2.4.3: Implement efficient pagination
- T2.4.4: Write integration tests
T2.5: Implement Graph Job Repository
Status: TODO Assignee: TBD Estimate: 0.5 days
Subtasks:
- T2.5.1: Implement CRUD operations
- T2.5.2: Implement status queries
- T2.5.3: Write integration tests
T2.6: Implement Policy Job Repository
Status: TODO Assignee: TBD Estimate: 0.5 days
Subtasks:
- T2.6.1: Implement CRUD operations
- T2.6.2: Implement status queries
- T2.6.3: Write integration tests
T2.7: Implement Impact Snapshot Repository
Status: TODO Assignee: TBD Estimate: 0.5 days
Subtasks:
- T2.7.1: Implement CRUD operations
- T2.7.2: Implement queries by run
- T2.7.3: Write integration tests
T2.8: Implement Distributed Locking
Status: TODO Assignee: TBD Estimate: 1 day
Description: Implement distributed locking using PostgreSQL advisory locks.
Options:
- PostgreSQL advisory locks (
pg_advisory_lock) - Table-based locks with SELECT FOR UPDATE SKIP LOCKED
- Combination approach
Subtasks:
- T2.8.1: Choose locking strategy
- T2.8.2: Implement
IDistributedLockinterface - T2.8.3: Implement lock acquisition with timeout
- T2.8.4: Implement lock renewal
- T2.8.5: Implement lock release
- T2.8.6: Write concurrency tests
Implementation Example:
public sealed class PostgresDistributedLock : IDistributedLock
{
private readonly SchedulerDataSource _dataSource;
public async Task<IAsyncDisposable?> TryAcquireAsync(
string lockKey,
TimeSpan timeout,
CancellationToken ct)
{
var lockId = ComputeLockId(lockKey);
await using var connection = await _dataSource.OpenConnectionAsync("system", ct);
await using var cmd = connection.CreateCommand();
cmd.CommandText = "SELECT pg_try_advisory_lock(@lock_id)";
cmd.Parameters.AddWithValue("lock_id", lockId);
var acquired = await cmd.ExecuteScalarAsync(ct) is true;
if (!acquired) return null;
return new LockHandle(connection, lockId);
}
private static long ComputeLockId(string key)
=> unchecked((long)key.GetHashCode());
}
T2.9: Implement Worker Registration
Status: TODO Assignee: TBD Estimate: 0.5 days
Subtasks:
- T2.9.1: Implement worker registration
- T2.9.2: Implement heartbeat updates
- T2.9.3: Implement dead worker detection
- T2.9.4: Write integration tests
T2.10: Add Configuration Switch
Status: TODO Assignee: TBD Estimate: 0.5 days
Subtasks:
- T2.10.1: Update service registration
- T2.10.2: Test backend switching
- T2.10.3: Document configuration
T2.11: Run Verification Tests
Status: TODO Assignee: TBD Estimate: 1 day
Subtasks:
- T2.11.1: Test schedule CRUD
- T2.11.2: Test run creation and state transitions
- T2.11.3: Test trigger calculation
- T2.11.4: Test distributed locking under concurrency
- T2.11.5: Test job execution end-to-end
- T2.11.6: Generate verification report
T2.12: Switch to PostgreSQL-Only
Status: TODO Assignee: TBD Estimate: 0.5 days
Subtasks:
- T2.12.1: Update configuration
- T2.12.2: Deploy to staging
- T2.12.3: Run integration tests
- T2.12.4: Deploy to production
- T2.12.5: Monitor metrics
Exit Criteria
- All repository interfaces implemented
- Distributed locking working correctly
- All integration tests pass
- Schedule execution working end-to-end
- Scheduler running on PostgreSQL in production
Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lock contention | Medium | Medium | Test under load, tune timeouts |
| Trigger calculation errors | Low | High | Extensive testing with edge cases |
| State transition bugs | Medium | Medium | State machine tests |
Phase Version: 1.0.0 Last Updated: 2025-11-28