Phase 2: Scheduler Module Conversion
Sprint: 3 Duration: 1 sprint Status: DOING (fresh-start approved; Mongo backfill skipped) Dependencies: Phase 0 (Foundations) — DONE
Objectives
- Create `StellaOps.Scheduler.Storage.Postgres` project
- Implement Scheduler schema in PostgreSQL
- Implement 7+ repository interfaces
- Replace MongoDB job tracking with PostgreSQL
- Implement PostgreSQL advisory locks for distributed locking
- Backfill Mongo data or explicitly decide on fresh-start (PG-T2.9–T2.11)
Deliverables
| Deliverable | Acceptance Criteria |
|---|---|
| Scheduler schema | All tables created with indexes |
| Repository implementations | All 7+ interfaces implemented |
| Advisory locks | Distributed locking working |
| Integration tests | 100% coverage of CRUD operations |
| Verification report | Schedule execution verified |
Schema Reference
See SPECIFICATION.md Section 5.4 for complete Scheduler schema.
Tables:
- scheduler.schedules
- scheduler.triggers
- scheduler.runs
- scheduler.graph_jobs
- scheduler.policy_jobs
- scheduler.impact_snapshots
- scheduler.workers
- scheduler.execution_logs
- scheduler.locks
- scheduler.run_summaries
- scheduler.audit
Task Breakdown
T2.1: Create Scheduler.Storage.Postgres Project
Status: DONE Assignee: Scheduler Guild Estimate: 0.5 days
Subtasks:
- T2.1.1: Create project structure
- T2.1.2: Add NuGet references
- T2.1.3: Create `SchedulerDataSource` class
- T2.1.4: Create `ServiceCollectionExtensions.cs`
T2.2: Implement Schema Migrations
Status: DONE Assignee: Scheduler Guild Estimate: 1 day
Subtasks:
- T2.2.1: Create `V001_CreateSchedulerSchema` migration
- T2.2.2: Include all tables and indexes
- T2.2.3: Add partial index for active schedules
- T2.2.4: Test migration idempotency
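As a rough illustration of T2.2.3 and T2.2.4, the sketch below shows one way to make a migration script safe to re-run (`IF NOT EXISTS` guards) and to restrict an index to active schedules (a partial index with a `WHERE` clause). The `SchedulerMigrationSql` holder, the index name, and the columns `enabled` and `deleted_at` are assumptions made for this example; the authoritative schema is in SPECIFICATION.md Section 5.4. The raw string literal requires C# 11 or later.

```csharp
// Hypothetical holder for migration SQL; table columns and names are illustrative only.
public static class SchedulerMigrationSql
{
    public const string V001 = """
        CREATE SCHEMA IF NOT EXISTS scheduler;

        CREATE TABLE IF NOT EXISTS scheduler.schedules (
            id          TEXT        NOT NULL,
            tenant_id   TEXT        NOT NULL,
            enabled     BOOLEAN     NOT NULL DEFAULT TRUE,
            deleted_at  TIMESTAMPTZ NULL,
            PRIMARY KEY (tenant_id, id)
        );

        -- Partial index: only enabled, non-deleted rows are indexed.
        CREATE INDEX IF NOT EXISTS ix_schedules_active
            ON scheduler.schedules (tenant_id, id)
            WHERE enabled AND deleted_at IS NULL;
        """;
}
```

Because every statement is guarded, applying the script twice should leave the schema unchanged, which is the property an idempotency test would check.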
T2.3: Implement Schedule Repository
Status: DONE Assignee: Scheduler Guild Estimate: 1 day
Interface:
public interface IScheduleRepository
{
    Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken ct);
    Task<IReadOnlyList<Schedule>> ListAsync(string tenantId, ScheduleQueryOptions? options, CancellationToken ct);
    Task UpsertAsync(Schedule schedule, CancellationToken ct);
    Task<bool> SoftDeleteAsync(string tenantId, string scheduleId, string deletedBy, DateTimeOffset deletedAt, CancellationToken ct);
    Task<IReadOnlyList<Schedule>> GetDueSchedulesAsync(DateTimeOffset now, CancellationToken ct);
}
Subtasks:
- T2.3.1: Implement all interface methods
- T2.3.2: Handle soft delete correctly
- T2.3.3: Implement GetDueSchedules for trigger calculation
- T2.3.4: Write integration tests
T2.4: Implement Run Repository
Status: DONE Assignee: Scheduler Guild Estimate: 1 day
Interface:
public interface IRunRepository
{
    Task<Run?> GetAsync(string tenantId, Guid runId, CancellationToken ct);
    Task<IReadOnlyList<Run>> ListAsync(string tenantId, RunQueryOptions? options, CancellationToken ct);
    Task<Run> CreateAsync(Run run, CancellationToken ct);
    Task<Run> UpdateAsync(Run run, CancellationToken ct);
    Task<IReadOnlyList<Run>> GetPendingRunsAsync(string tenantId, CancellationToken ct);
    Task<IReadOnlyList<Run>> GetRunsByScheduleAsync(string tenantId, Guid scheduleId, int limit, CancellationToken ct);
}
Subtasks:
- T2.4.1: Implement all interface methods
- T2.4.2: Handle state transitions
- T2.4.3: Implement efficient pagination
- T2.4.4: Write integration tests
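One possible way to handle T2.4.2 is to validate every state change against an explicit transition table before `UpdateAsync` persists it. The sketch below is a minimal example under assumed state names (`Pending`, `Running`, `Completed`, `Failed`, `Cancelled`); the actual run states and allowed transitions are defined by the Scheduler domain model, not by this document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical run states, chosen for illustration only.
public enum RunState { Pending, Running, Completed, Failed, Cancelled }

public static class RunStateMachine
{
    // Each state maps to the states it may move to; terminal states map to nothing.
    private static readonly Dictionary<RunState, RunState[]> Allowed = new()
    {
        [RunState.Pending] = new[] { RunState.Running, RunState.Cancelled },
        [RunState.Running] = new[] { RunState.Completed, RunState.Failed, RunState.Cancelled },
        [RunState.Completed] = Array.Empty<RunState>(),
        [RunState.Failed] = Array.Empty<RunState>(),
        [RunState.Cancelled] = Array.Empty<RunState>(),
    };

    public static bool CanTransition(RunState from, RunState to)
        => Array.IndexOf(Allowed[from], to) >= 0;

    // Returns the new state, or throws if the transition is not in the table.
    public static RunState Transition(RunState from, RunState to)
        => CanTransition(from, to)
            ? to
            : throw new InvalidOperationException($"Illegal run transition {from} -> {to}.");
}
```

Keeping the table in one place makes the state machine tests listed under Risks & Mitigations straightforward: each allowed and disallowed pair can be asserted directly.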
T2.5: Implement Graph Job Repository
Status: DONE Assignee: Scheduler Guild Estimate: 0.5 days
Subtasks:
- T2.5.1: Implement CRUD operations
- T2.5.2: Implement status queries
- T2.5.3: Write integration tests
T2.6: Implement Policy Job Repository
Status: DONE Assignee: Scheduler Guild Estimate: 0.5 days
Subtasks:
- T2.6.1: Implement CRUD operations
- T2.6.2: Implement status queries
- T2.6.3: Write integration tests
T2.7: Implement Impact Snapshot Repository
Status: DONE Assignee: Scheduler Guild Estimate: 0.5 days
Subtasks:
- T2.7.1: Implement CRUD operations
- T2.7.2: Implement queries by run
- T2.7.3: Write integration tests
T2.8: Implement Distributed Locking
Status: DONE Assignee: Scheduler Guild Estimate: 1 day
Description: Implement distributed locking using PostgreSQL advisory locks.
Options:
- PostgreSQL advisory locks (`pg_advisory_lock`)
- Table-based locks with SELECT FOR UPDATE SKIP LOCKED
- Combination approach
Subtasks:
- T2.8.1: Choose locking strategy
- T2.8.2: Implement `IDistributedLock` interface
- T2.8.3: Implement lock acquisition with timeout
- T2.8.4: Implement lock renewal
- T2.8.5: Implement lock release
- T2.8.6: Write concurrency tests
Implementation Example:
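using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;
public sealed class PostgresDistributedLock : IDistributedLock
{
    private readonly SchedulerDataSource _dataSource;
    public PostgresDistributedLock(SchedulerDataSource dataSource) => _dataSource = dataSource;
    // Makes a single non-blocking attempt; `timeout` is reserved for a retry loop (T2.8.3).
    public async Task<IAsyncDisposable?> TryAcquireAsync(
        string lockKey,
        TimeSpan timeout,
        CancellationToken ct)
    {
        var lockId = ComputeLockId(lockKey);
        // Session-level advisory locks are released when the connection closes, so the
        // connection is handed to LockHandle on success instead of being disposed here.
        var connection = await _dataSource.OpenConnectionAsync("system", ct);
        var acquired = false;
        try
        {
            await using var cmd = connection.CreateCommand();
            cmd.CommandText = "SELECT pg_try_advisory_lock(@lock_id)";
            cmd.Parameters.AddWithValue("lock_id", lockId);
            acquired = await cmd.ExecuteScalarAsync(ct) is true;
            return acquired ? new LockHandle(connection, lockId) : null;
        }
        finally
        {
            // Not acquired (or the query failed): release the connection immediately.
            if (!acquired) await connection.DisposeAsync();
        }
    }
    // string.GetHashCode() is randomized per process on modern .NET, so two workers would
    // compute different IDs for the same key. Derive the ID from a stable hash instead.
    private static long ComputeLockId(string key)
        => BinaryPrimitives.ReadInt64BigEndian(SHA256.HashData(Encoding.UTF8.GetBytes(key)));
}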
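Because `pg_try_advisory_lock` returns immediately, acquisition with a timeout (T2.8.3) has to be built around it. The following is a minimal sketch of a polling loop, assuming a hypothetical `LockRetry` helper and a caller-supplied `pollInterval`; neither is part of the module described here, and the single-attempt delegate stands in for whatever non-blocking acquire call the lock implementation exposes.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: retries a non-blocking acquire until it succeeds or time runs out.
public static class LockRetry
{
    public static async Task<IAsyncDisposable?> AcquireWithTimeoutAsync(
        Func<CancellationToken, Task<IAsyncDisposable?>> tryAcquire,
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken ct)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            // One non-blocking attempt per iteration.
            var handle = await tryAcquire(ct);
            if (handle is not null) return handle;

            var remaining = timeout - stopwatch.Elapsed;
            if (remaining <= TimeSpan.Zero) return null; // timed out without the lock

            // Sleep for the poll interval, but never past the deadline.
            await Task.Delay(remaining < pollInterval ? remaining : pollInterval, ct);
        }
    }
}
```

A fixed poll interval is the simplest choice; under heavy contention, adding jitter or backoff may reduce the lock contention risk noted in this phase's risk table.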
T2.9: Implement Worker Registration
Status: DONE Assignee: TBD Estimate: 0.5 days
Subtasks:
- T2.9.1: Implement worker registration
- T2.9.2: Implement heartbeat updates
- T2.9.3: Implement dead worker detection
- T2.9.4: Write integration tests
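Dead worker detection (T2.9.3) usually reduces to comparing each worker's last heartbeat against a timeout. The sketch below shows that comparison in memory; the `WorkerHeartbeat` record and `DeadWorkerDetector` class are hypothetical names, and in practice the same predicate would more likely run as a query against `scheduler.workers`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projection of a worker row; field names are illustrative.
public sealed record WorkerHeartbeat(string WorkerId, DateTimeOffset LastHeartbeatAt);

public static class DeadWorkerDetector
{
    // A worker is considered dead when its last heartbeat is older than the timeout.
    public static IReadOnlyList<string> FindDead(
        IEnumerable<WorkerHeartbeat> workers,
        DateTimeOffset now,
        TimeSpan heartbeatTimeout)
        => workers
            .Where(w => now - w.LastHeartbeatAt > heartbeatTimeout)
            .Select(w => w.WorkerId)
            .ToList();
}
```

Passing `now` in as a parameter, rather than reading the clock inside the method, keeps the check deterministic in tests.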
T2.10: Add Configuration Switch
Status: DONE Assignee: Scheduler Guild Estimate: 0.5 days
Subtasks:
- T2.10.1: Update service registration
- T2.10.2: Test backend switching
- T2.10.3: Document configuration
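The backend switch can be pictured as a small resolver that reads the `Persistence:Scheduler` key (the key used in T2.12.1) and maps it to a backend before services are registered. This is only a sketch: the `SchedulerBackend` enum, the `SchedulerBackendSelector` class, the `Mongo` value name, and the choice to throw on a missing or unknown value are all assumptions, and a plain dictionary stands in for the real configuration source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical backend names; only "Postgres" appears in this document.
public enum SchedulerBackend { Mongo, Postgres }

public static class SchedulerBackendSelector
{
    public const string ConfigKey = "Persistence:Scheduler";

    public static SchedulerBackend Resolve(IReadOnlyDictionary<string, string?> settings)
    {
        if (!settings.TryGetValue(ConfigKey, out var raw) || string.IsNullOrWhiteSpace(raw))
            throw new InvalidOperationException($"Missing configuration value '{ConfigKey}'.");

        // Enum.TryParse also accepts numeric strings, so IsDefined rejects values like "7".
        return Enum.TryParse<SchedulerBackend>(raw, ignoreCase: true, out var backend)
               && Enum.IsDefined(backend)
            ? backend
            : throw new InvalidOperationException($"Unknown scheduler backend '{raw}'.");
    }
}
```

Failing fast on an unrecognized value avoids silently falling back to a backend nobody intended, which matters most during the cutover in T2.12.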
T2.11: Run Verification Tests
Status: DONE (fresh-start; Postgres-only verification) Assignee: Scheduler Guild Estimate: 1 day
Subtasks:
- T2.11.1: Test schedule CRUD
- T2.11.2: Test run creation and state transitions
- T2.11.3: Test trigger calculation
- T2.11.4: Test distributed locking under concurrency
- T2.11.5: Test job execution end-to-end
- T2.11.6: Generate verification report (fresh-start baseline; Mongo parity not applicable)
T2.12: Switch to PostgreSQL-Only
Status: DONE Assignee: Scheduler Guild Estimate: 0.5 days
Subtasks:
- T2.12.1: Update configuration (`Persistence:Scheduler=Postgres`)
- T2.12.2: Deploy to staging
- T2.12.3: Run integration tests
- T2.12.4: Deploy to production
- T2.12.5: Monitor metrics
Exit Criteria
- All repository interfaces implemented
- Distributed locking working correctly
- All integration tests pass (module-level)
- Fresh-start verification completed (no Mongo parity/backfill)
- Scheduler running on PostgreSQL in staging/production
Execution Log
| Date (UTC) | Update | Owner |
|---|---|---|
| 2025-11-28 | Project + schema migration created; repos implemented (T2.1–T2.8) | Scheduler Guild |
| 2025-11-30 | Determinism and concurrency tests added; advisory locks in place | Scheduler Guild |
| 2025-12-02 | Backfill tool added; Mongo endpoint unavailable → parity/backfill blocked | Scheduler Guild |
| 2025-12-05 | Phase 0 unblocked; fresh-start approved (skip Mongo backfill). Verification done on Postgres-only baseline; cutover pending config switch/deploy. | PM |
| 2025-12-05 | Config switched to Postgres; deployed to staging and production; integration smoke tests passed; monitoring active. | Scheduler Guild |
Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lock contention | Medium | Medium | Test under load, tune timeouts |
| Trigger calculation errors | Low | High | Extensive testing with edge cases |
| State transition bugs | Medium | Medium | State machine tests |
Phase Version: 1.0.0 Last Updated: 2025-11-28