up
This commit is contained in:
305
docs/db/tasks/PHASE_2_SCHEDULER.md
Normal file
305
docs/db/tasks/PHASE_2_SCHEDULER.md
Normal file
@@ -0,0 +1,305 @@
|
||||
# Phase 2: Scheduler Module Conversion
|
||||
|
||||
**Sprint:** 3
|
||||
**Duration:** 1 sprint
|
||||
**Status:** TODO
|
||||
**Dependencies:** Phase 0 (Foundations)
|
||||
|
||||
---
|
||||
|
||||
## Objectives
|
||||
|
||||
1. Create `StellaOps.Scheduler.Storage.Postgres` project
|
||||
2. Implement Scheduler schema in PostgreSQL
|
||||
3. Implement 7+ repository interfaces
|
||||
4. Replace MongoDB job tracking with PostgreSQL
|
||||
5. Implement PostgreSQL advisory locks for distributed locking
|
||||
|
||||
---
|
||||
|
||||
## Deliverables
|
||||
|
||||
| Deliverable | Acceptance Criteria |
|
||||
|-------------|---------------------|
|
||||
| Scheduler schema | All tables created with indexes |
|
||||
| Repository implementations | All 7+ interfaces implemented |
|
||||
| Advisory locks | Distributed locking working |
|
||||
| Integration tests | 100% coverage of CRUD operations |
|
||||
| Verification report | Schedule execution verified |
|
||||
|
||||
---
|
||||
|
||||
## Schema Reference
|
||||
|
||||
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.4 for complete Scheduler schema.
|
||||
|
||||
**Tables:**
|
||||
- `scheduler.schedules`
|
||||
- `scheduler.triggers`
|
||||
- `scheduler.runs`
|
||||
- `scheduler.graph_jobs`
|
||||
- `scheduler.policy_jobs`
|
||||
- `scheduler.impact_snapshots`
|
||||
- `scheduler.workers`
|
||||
- `scheduler.execution_logs`
|
||||
- `scheduler.locks`
|
||||
- `scheduler.run_summaries`
|
||||
- `scheduler.audit`
|
||||
|
||||
---
|
||||
|
||||
## Task Breakdown
|
||||
|
||||
### T2.1: Create Scheduler.Storage.Postgres Project
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 0.5 days
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.1.1: Create project structure
|
||||
- [ ] T2.1.2: Add NuGet references
|
||||
- [ ] T2.1.3: Create `SchedulerDataSource` class
|
||||
- [ ] T2.1.4: Create `ServiceCollectionExtensions.cs`
|
||||
|
||||
---
|
||||
|
||||
### T2.2: Implement Schema Migrations
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 1 day
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.2.1: Create `V001_CreateSchedulerSchema` migration
|
||||
- [ ] T2.2.2: Include all tables and indexes
|
||||
- [ ] T2.2.3: Add partial index for active schedules
|
||||
- [ ] T2.2.4: Test migration idempotency
|
||||
|
||||
---
|
||||
|
||||
### T2.3: Implement Schedule Repository
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 1 day
|
||||
|
||||
**Interface:**
|
||||
```csharp
|
||||
public interface IScheduleRepository
|
||||
{
|
||||
Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken ct);
|
||||
Task<IReadOnlyList<Schedule>> ListAsync(string tenantId, ScheduleQueryOptions? options, CancellationToken ct);
|
||||
Task UpsertAsync(Schedule schedule, CancellationToken ct);
|
||||
Task<bool> SoftDeleteAsync(string tenantId, string scheduleId, string deletedBy, DateTimeOffset deletedAt, CancellationToken ct);
|
||||
Task<IReadOnlyList<Schedule>> GetDueSchedulesAsync(DateTimeOffset now, CancellationToken ct);
|
||||
}
|
||||
```
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.3.1: Implement all interface methods
|
||||
- [ ] T2.3.2: Handle soft delete correctly
|
||||
- [ ] T2.3.3: Implement GetDueSchedules for trigger calculation
|
||||
- [ ] T2.3.4: Write integration tests
|
||||
|
||||
---
|
||||
|
||||
### T2.4: Implement Run Repository
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 1 day
|
||||
|
||||
**Interface:**
|
||||
```csharp
|
||||
public interface IRunRepository
|
||||
{
|
||||
Task<Run?> GetAsync(string tenantId, Guid runId, CancellationToken ct);
|
||||
Task<IReadOnlyList<Run>> ListAsync(string tenantId, RunQueryOptions? options, CancellationToken ct);
|
||||
Task<Run> CreateAsync(Run run, CancellationToken ct);
|
||||
Task<Run> UpdateAsync(Run run, CancellationToken ct);
|
||||
Task<IReadOnlyList<Run>> GetPendingRunsAsync(string tenantId, CancellationToken ct);
|
||||
Task<IReadOnlyList<Run>> GetRunsByScheduleAsync(string tenantId, Guid scheduleId, int limit, CancellationToken ct);
|
||||
}
|
||||
```
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.4.1: Implement all interface methods
|
||||
- [ ] T2.4.2: Handle state transitions
|
||||
- [ ] T2.4.3: Implement efficient pagination
|
||||
- [ ] T2.4.4: Write integration tests
|
||||
|
||||
---
|
||||
|
||||
### T2.5: Implement Graph Job Repository
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 0.5 days
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.5.1: Implement CRUD operations
|
||||
- [ ] T2.5.2: Implement status queries
|
||||
- [ ] T2.5.3: Write integration tests
|
||||
|
||||
---
|
||||
|
||||
### T2.6: Implement Policy Job Repository
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 0.5 days
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.6.1: Implement CRUD operations
|
||||
- [ ] T2.6.2: Implement status queries
|
||||
- [ ] T2.6.3: Write integration tests
|
||||
|
||||
---
|
||||
|
||||
### T2.7: Implement Impact Snapshot Repository
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 0.5 days
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.7.1: Implement CRUD operations
|
||||
- [ ] T2.7.2: Implement queries by run
|
||||
- [ ] T2.7.3: Write integration tests
|
||||
|
||||
---
|
||||
|
||||
### T2.8: Implement Distributed Locking
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 1 day
|
||||
|
||||
**Description:**
|
||||
Implement distributed locking using PostgreSQL advisory locks.
|
||||
|
||||
**Options:**
|
||||
1. PostgreSQL advisory locks (`pg_advisory_lock`)
|
||||
2. Table-based locks with SELECT FOR UPDATE SKIP LOCKED
|
||||
3. Combination approach
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.8.1: Choose locking strategy
|
||||
- [ ] T2.8.2: Implement `IDistributedLock` interface
|
||||
- [ ] T2.8.3: Implement lock acquisition with timeout
|
||||
- [ ] T2.8.4: Implement lock renewal
|
||||
- [ ] T2.8.5: Implement lock release
|
||||
- [ ] T2.8.6: Write concurrency tests
|
||||
|
||||
**Implementation Example:**
|
||||
```csharp
|
||||
public sealed class PostgresDistributedLock : IDistributedLock
|
||||
{
|
||||
private readonly SchedulerDataSource _dataSource;
|
||||
|
||||
public async Task<IAsyncDisposable?> TryAcquireAsync(
|
||||
string lockKey,
|
||||
TimeSpan timeout,
|
||||
CancellationToken ct)
|
||||
{
|
||||
var lockId = ComputeLockId(lockKey);
|
||||
await using var connection = await _dataSource.OpenConnectionAsync("system", ct);
|
||||
|
||||
await using var cmd = connection.CreateCommand();
|
||||
cmd.CommandText = "SELECT pg_try_advisory_lock(@lock_id)";
|
||||
cmd.Parameters.AddWithValue("lock_id", lockId);
|
||||
|
||||
var acquired = await cmd.ExecuteScalarAsync(ct) is true;
|
||||
if (!acquired) return null;
|
||||
|
||||
return new LockHandle(connection, lockId);
|
||||
}
|
||||
|
||||
private static long ComputeLockId(string key)
|
||||
=> unchecked((long)key.GetHashCode());
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### T2.9: Implement Worker Registration
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 0.5 days
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.9.1: Implement worker registration
|
||||
- [ ] T2.9.2: Implement heartbeat updates
|
||||
- [ ] T2.9.3: Implement dead worker detection
|
||||
- [ ] T2.9.4: Write integration tests
|
||||
|
||||
---
|
||||
|
||||
### T2.10: Add Configuration Switch
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 0.5 days
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.10.1: Update service registration
|
||||
- [ ] T2.10.2: Test backend switching
|
||||
- [ ] T2.10.3: Document configuration
|
||||
|
||||
---
|
||||
|
||||
### T2.11: Run Verification Tests
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 1 day
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.11.1: Test schedule CRUD
|
||||
- [ ] T2.11.2: Test run creation and state transitions
|
||||
- [ ] T2.11.3: Test trigger calculation
|
||||
- [ ] T2.11.4: Test distributed locking under concurrency
|
||||
- [ ] T2.11.5: Test job execution end-to-end
|
||||
- [ ] T2.11.6: Generate verification report
|
||||
|
||||
---
|
||||
|
||||
### T2.12: Switch to PostgreSQL-Only
|
||||
|
||||
**Status:** TODO
|
||||
**Assignee:** TBD
|
||||
**Estimate:** 0.5 days
|
||||
|
||||
**Subtasks:**
|
||||
- [ ] T2.12.1: Update configuration
|
||||
- [ ] T2.12.2: Deploy to staging
|
||||
- [ ] T2.12.3: Run integration tests
|
||||
- [ ] T2.12.4: Deploy to production
|
||||
- [ ] T2.12.5: Monitor metrics
|
||||
|
||||
---
|
||||
|
||||
## Exit Criteria
|
||||
|
||||
- [ ] All repository interfaces implemented
|
||||
- [ ] Distributed locking working correctly
|
||||
- [ ] All integration tests pass
|
||||
- [ ] Schedule execution working end-to-end
|
||||
- [ ] Scheduler running on PostgreSQL in production
|
||||
|
||||
---
|
||||
|
||||
## Risks & Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|------------|--------|------------|
|
||||
| Lock contention | Medium | Medium | Test under load, tune timeouts |
|
||||
| Trigger calculation errors | Low | High | Extensive testing with edge cases |
|
||||
| State transition bugs | Medium | Medium | State machine tests |
|
||||
|
||||
---
|
||||
|
||||
*Phase Version: 1.0.0*
|
||||
*Last Updated: 2025-11-28*
|
||||
Reference in New Issue
Block a user