Files
git.stella-ops.org/docs/db/tasks/PHASE_2_SCHEDULER.md
StellaOps Bot 2548abc56f
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
up
2025-11-29 01:35:49 +02:00

7.4 KiB

Phase 2: Scheduler Module Conversion

Sprint: 3 Duration: 1 sprint Status: TODO Dependencies: Phase 0 (Foundations)


Objectives

  1. Create StellaOps.Scheduler.Storage.Postgres project
  2. Implement Scheduler schema in PostgreSQL
  3. Implement 7+ repository interfaces
  4. Replace MongoDB job tracking with PostgreSQL
  5. Implement PostgreSQL advisory locks for distributed locking

Deliverables

Deliverable Acceptance Criteria
Scheduler schema All tables created with indexes
Repository implementations All 7+ interfaces implemented
Advisory locks Distributed locking working
Integration tests 100% coverage of CRUD operations
Verification report Schedule execution verified

Schema Reference

See SPECIFICATION.md Section 5.4 for complete Scheduler schema.

Tables:

  • scheduler.schedules
  • scheduler.triggers
  • scheduler.runs
  • scheduler.graph_jobs
  • scheduler.policy_jobs
  • scheduler.impact_snapshots
  • scheduler.workers
  • scheduler.execution_logs
  • scheduler.locks
  • scheduler.run_summaries
  • scheduler.audit

Task Breakdown

T2.1: Create Scheduler.Storage.Postgres Project

Status: TODO Assignee: TBD Estimate: 0.5 days

Subtasks:

  • T2.1.1: Create project structure
  • T2.1.2: Add NuGet references
  • T2.1.3: Create SchedulerDataSource class
  • T2.1.4: Create ServiceCollectionExtensions.cs

T2.2: Implement Schema Migrations

Status: TODO Assignee: TBD Estimate: 1 day

Subtasks:

  • T2.2.1: Create V001_CreateSchedulerSchema migration
  • T2.2.2: Include all tables and indexes
  • T2.2.3: Add partial index for active schedules
  • T2.2.4: Test migration idempotency

T2.3: Implement Schedule Repository

Status: TODO Assignee: TBD Estimate: 1 day

Interface:

public interface IScheduleRepository
{
    Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken ct);
    Task<IReadOnlyList<Schedule>> ListAsync(string tenantId, ScheduleQueryOptions? options, CancellationToken ct);
    Task UpsertAsync(Schedule schedule, CancellationToken ct);
    Task<bool> SoftDeleteAsync(string tenantId, string scheduleId, string deletedBy, DateTimeOffset deletedAt, CancellationToken ct);
    Task<IReadOnlyList<Schedule>> GetDueSchedulesAsync(DateTimeOffset now, CancellationToken ct);
}

Subtasks:

  • T2.3.1: Implement all interface methods
  • T2.3.2: Handle soft delete correctly
  • T2.3.3: Implement GetDueSchedules for trigger calculation
  • T2.3.4: Write integration tests

T2.4: Implement Run Repository

Status: TODO Assignee: TBD Estimate: 1 day

Interface:

public interface IRunRepository
{
    Task<Run?> GetAsync(string tenantId, Guid runId, CancellationToken ct);
    Task<IReadOnlyList<Run>> ListAsync(string tenantId, RunQueryOptions? options, CancellationToken ct);
    Task<Run> CreateAsync(Run run, CancellationToken ct);
    Task<Run> UpdateAsync(Run run, CancellationToken ct);
    Task<IReadOnlyList<Run>> GetPendingRunsAsync(string tenantId, CancellationToken ct);
    Task<IReadOnlyList<Run>> GetRunsByScheduleAsync(string tenantId, Guid scheduleId, int limit, CancellationToken ct);
}

Subtasks:

  • T2.4.1: Implement all interface methods
  • T2.4.2: Handle state transitions
  • T2.4.3: Implement efficient pagination
  • T2.4.4: Write integration tests

T2.5: Implement Graph Job Repository

Status: TODO Assignee: TBD Estimate: 0.5 days

Subtasks:

  • T2.5.1: Implement CRUD operations
  • T2.5.2: Implement status queries
  • T2.5.3: Write integration tests

T2.6: Implement Policy Job Repository

Status: TODO Assignee: TBD Estimate: 0.5 days

Subtasks:

  • T2.6.1: Implement CRUD operations
  • T2.6.2: Implement status queries
  • T2.6.3: Write integration tests

T2.7: Implement Impact Snapshot Repository

Status: TODO Assignee: TBD Estimate: 0.5 days

Subtasks:

  • T2.7.1: Implement CRUD operations
  • T2.7.2: Implement queries by run
  • T2.7.3: Write integration tests

T2.8: Implement Distributed Locking

Status: TODO Assignee: TBD Estimate: 1 day

Description: Implement distributed locking using PostgreSQL advisory locks.

Options:

  1. PostgreSQL advisory locks (pg_advisory_lock)
  2. Table-based locks with SELECT FOR UPDATE SKIP LOCKED
  3. Combination approach

Subtasks:

  • T2.8.1: Choose locking strategy
  • T2.8.2: Implement IDistributedLock interface
  • T2.8.3: Implement lock acquisition with timeout
  • T2.8.4: Implement lock renewal
  • T2.8.5: Implement lock release
  • T2.8.6: Write concurrency tests

Implementation Example:

public sealed class PostgresDistributedLock : IDistributedLock
{
    private readonly SchedulerDataSource _dataSource;

    public async Task<IAsyncDisposable?> TryAcquireAsync(
        string lockKey,
        TimeSpan timeout,
        CancellationToken ct)
    {
        var lockId = ComputeLockId(lockKey);
        await using var connection = await _dataSource.OpenConnectionAsync("system", ct);

        await using var cmd = connection.CreateCommand();
        cmd.CommandText = "SELECT pg_try_advisory_lock(@lock_id)";
        cmd.Parameters.AddWithValue("lock_id", lockId);

        var acquired = await cmd.ExecuteScalarAsync(ct) is true;
        if (!acquired) return null;

        return new LockHandle(connection, lockId);
    }

    private static long ComputeLockId(string key)
        => unchecked((long)key.GetHashCode());
}

T2.9: Implement Worker Registration

Status: TODO Assignee: TBD Estimate: 0.5 days

Subtasks:

  • T2.9.1: Implement worker registration
  • T2.9.2: Implement heartbeat updates
  • T2.9.3: Implement dead worker detection
  • T2.9.4: Write integration tests

T2.10: Add Configuration Switch

Status: TODO Assignee: TBD Estimate: 0.5 days

Subtasks:

  • T2.10.1: Update service registration
  • T2.10.2: Test backend switching
  • T2.10.3: Document configuration

T2.11: Run Verification Tests

Status: TODO Assignee: TBD Estimate: 1 day

Subtasks:

  • T2.11.1: Test schedule CRUD
  • T2.11.2: Test run creation and state transitions
  • T2.11.3: Test trigger calculation
  • T2.11.4: Test distributed locking under concurrency
  • T2.11.5: Test job execution end-to-end
  • T2.11.6: Generate verification report

T2.12: Switch to PostgreSQL-Only

Status: TODO Assignee: TBD Estimate: 0.5 days

Subtasks:

  • T2.12.1: Update configuration
  • T2.12.2: Deploy to staging
  • T2.12.3: Run integration tests
  • T2.12.4: Deploy to production
  • T2.12.5: Monitor metrics

Exit Criteria

  • All repository interfaces implemented
  • Distributed locking working correctly
  • All integration tests pass
  • Schedule execution working end-to-end
  • Scheduler running on PostgreSQL in production

Risks & Mitigations

Risk Likelihood Impact Mitigation
Lock contention Medium Medium Test under load, tune timeouts
Trigger calculation errors Low High Extensive testing with edge cases
State transition bugs Medium Medium State machine tests

Phase Version: 1.0.0 Last Updated: 2025-11-28