up
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Concelier Attestation Tests / attestation-tests (push) Has been cancelled

This commit is contained in:
StellaOps Bot
2025-11-28 20:55:22 +02:00
parent d040c001ac
commit 2548abc56f
231 changed files with 47468 additions and 68 deletions

View File

@@ -0,0 +1,305 @@
# Phase 2: Scheduler Module Conversion
**Sprint:** 3
**Duration:** 1 sprint
**Status:** TODO
**Dependencies:** Phase 0 (Foundations)
---
## Objectives
1. Create `StellaOps.Scheduler.Storage.Postgres` project
2. Implement Scheduler schema in PostgreSQL
3. Implement 7+ repository interfaces
4. Replace MongoDB job tracking with PostgreSQL
5. Implement PostgreSQL advisory locks for distributed locking
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| Scheduler schema | All tables created with indexes |
| Repository implementations | All 7+ interfaces implemented |
| Advisory locks | Distributed locking working |
| Integration tests | 100% coverage of CRUD operations |
| Verification report | Schedule execution verified |
---
## Schema Reference
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.4 for complete Scheduler schema.
**Tables:**
- `scheduler.schedules`
- `scheduler.triggers`
- `scheduler.runs`
- `scheduler.graph_jobs`
- `scheduler.policy_jobs`
- `scheduler.impact_snapshots`
- `scheduler.workers`
- `scheduler.execution_logs`
- `scheduler.locks`
- `scheduler.run_summaries`
- `scheduler.audit`
---
## Task Breakdown
### T2.1: Create Scheduler.Storage.Postgres Project
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.1.1: Create project structure
- [ ] T2.1.2: Add NuGet references
- [ ] T2.1.3: Create `SchedulerDataSource` class
- [ ] T2.1.4: Create `ServiceCollectionExtensions.cs`
---
### T2.2: Implement Schema Migrations
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Subtasks:**
- [ ] T2.2.1: Create `V001_CreateSchedulerSchema` migration
- [ ] T2.2.2: Include all tables and indexes
- [ ] T2.2.3: Add partial index for active schedules
- [ ] T2.2.4: Test migration idempotency
---
### T2.3: Implement Schedule Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Interface:**
```csharp
public interface IScheduleRepository
{
Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken ct);
Task<IReadOnlyList<Schedule>> ListAsync(string tenantId, ScheduleQueryOptions? options, CancellationToken ct);
Task UpsertAsync(Schedule schedule, CancellationToken ct);
Task<bool> SoftDeleteAsync(string tenantId, string scheduleId, string deletedBy, DateTimeOffset deletedAt, CancellationToken ct);
Task<IReadOnlyList<Schedule>> GetDueSchedulesAsync(DateTimeOffset now, CancellationToken ct);
}
```
**Subtasks:**
- [ ] T2.3.1: Implement all interface methods
- [ ] T2.3.2: Handle soft delete correctly
- [ ] T2.3.3: Implement GetDueSchedules for trigger calculation
- [ ] T2.3.4: Write integration tests
---
### T2.4: Implement Run Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Interface:**
```csharp
public interface IRunRepository
{
Task<Run?> GetAsync(string tenantId, Guid runId, CancellationToken ct);
Task<IReadOnlyList<Run>> ListAsync(string tenantId, RunQueryOptions? options, CancellationToken ct);
Task<Run> CreateAsync(Run run, CancellationToken ct);
Task<Run> UpdateAsync(Run run, CancellationToken ct);
Task<IReadOnlyList<Run>> GetPendingRunsAsync(string tenantId, CancellationToken ct);
Task<IReadOnlyList<Run>> GetRunsByScheduleAsync(string tenantId, Guid scheduleId, int limit, CancellationToken ct);
}
```
**Subtasks:**
- [ ] T2.4.1: Implement all interface methods
- [ ] T2.4.2: Handle state transitions
- [ ] T2.4.3: Implement efficient pagination
- [ ] T2.4.4: Write integration tests
---
### T2.5: Implement Graph Job Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.5.1: Implement CRUD operations
- [ ] T2.5.2: Implement status queries
- [ ] T2.5.3: Write integration tests
---
### T2.6: Implement Policy Job Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.6.1: Implement CRUD operations
- [ ] T2.6.2: Implement status queries
- [ ] T2.6.3: Write integration tests
---
### T2.7: Implement Impact Snapshot Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.7.1: Implement CRUD operations
- [ ] T2.7.2: Implement queries by run
- [ ] T2.7.3: Write integration tests
---
### T2.8: Implement Distributed Locking
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Implement distributed locking using PostgreSQL advisory locks.
**Options:**
1. PostgreSQL advisory locks (`pg_advisory_lock`)
2. Table-based locks with SELECT FOR UPDATE SKIP LOCKED
3. Combination approach
**Subtasks:**
- [ ] T2.8.1: Choose locking strategy
- [ ] T2.8.2: Implement `IDistributedLock` interface
- [ ] T2.8.3: Implement lock acquisition with timeout
- [ ] T2.8.4: Implement lock renewal
- [ ] T2.8.5: Implement lock release
- [ ] T2.8.6: Write concurrency tests
**Implementation Example:**
```csharp
public sealed class PostgresDistributedLock : IDistributedLock
{
private readonly SchedulerDataSource _dataSource;
public async Task<IAsyncDisposable?> TryAcquireAsync(
string lockKey,
TimeSpan timeout,
CancellationToken ct)
{
var lockId = ComputeLockId(lockKey);
await using var connection = await _dataSource.OpenConnectionAsync("system", ct);
await using var cmd = connection.CreateCommand();
cmd.CommandText = "SELECT pg_try_advisory_lock(@lock_id)";
cmd.Parameters.AddWithValue("lock_id", lockId);
var acquired = await cmd.ExecuteScalarAsync(ct) is true;
if (!acquired) return null;
return new LockHandle(connection, lockId);
}
private static long ComputeLockId(string key)
=> unchecked((long)key.GetHashCode());
}
```
---
### T2.9: Implement Worker Registration
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.9.1: Implement worker registration
- [ ] T2.9.2: Implement heartbeat updates
- [ ] T2.9.3: Implement dead worker detection
- [ ] T2.9.4: Write integration tests
---
### T2.10: Add Configuration Switch
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.10.1: Update service registration
- [ ] T2.10.2: Test backend switching
- [ ] T2.10.3: Document configuration
---
### T2.11: Run Verification Tests
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Subtasks:**
- [ ] T2.11.1: Test schedule CRUD
- [ ] T2.11.2: Test run creation and state transitions
- [ ] T2.11.3: Test trigger calculation
- [ ] T2.11.4: Test distributed locking under concurrency
- [ ] T2.11.5: Test job execution end-to-end
- [ ] T2.11.6: Generate verification report
---
### T2.12: Switch to PostgreSQL-Only
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.12.1: Update configuration
- [ ] T2.12.2: Deploy to staging
- [ ] T2.12.3: Run integration tests
- [ ] T2.12.4: Deploy to production
- [ ] T2.12.5: Monitor metrics
---
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Distributed locking working correctly
- [ ] All integration tests pass
- [ ] Schedule execution working end-to-end
- [ ] Scheduler running on PostgreSQL in production
---
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Lock contention | Medium | Medium | Test under load, tune timeouts |
| Trigger calculation errors | Low | High | Extensive testing with edge cases |
| State transition bugs | Medium | Medium | State machine tests |
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*