semi implemented and features implemented save checkpoint

This commit is contained in: master
2026-02-08 18:00:49 +02:00
parent 04360dff63
commit 1bf6bbf395
20895 changed files with 716795 additions and 64 deletions

@@ -0,0 +1,37 @@
# Job Lifecycle State Machine
## Module
Orchestrator
## Status
IMPLEMENTED
## Description
Job lifecycle state machine for the Orchestrator, covering job scheduling with a Postgres-backed job repository, an event envelope domain model, and air-gap-compatible scheduling tests.
## Implementation Details
- **Modules**: `src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Scheduling/`, `src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/`
- **Key Classes**:
  - `JobStateMachine` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Scheduling/JobStateMachine.cs`) - finite state machine governing job lifecycle transitions (Pending -> Scheduled -> Running -> Completed/Failed/Cancelled)
  - `JobScheduler` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Scheduling/JobScheduler.cs`) - schedules jobs based on state machine rules and DAG dependencies
  - `RetryPolicy` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Scheduling/RetryPolicy.cs`) - configurable retry policy for failed jobs (max retries, backoff strategy)
  - `Job` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/Job.cs`) - job entity with current status, attempts, and metadata
  - `JobStatus` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/JobStatus.cs`) - enum defining all valid job states
  - `JobHistory` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/JobHistory.cs`) - historical record of all state transitions with timestamps
  - `EventEnvelope` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/Events/EventEnvelope.cs`) - typed event envelope emitted on state transitions
  - `TimelineEvent` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/Events/TimelineEvent.cs`) - timeline event for job lifecycle tracking
  - `TimelineEventEmitter` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/Events/TimelineEventEmitter.cs`) - emits timeline events on state transitions
  - `JobEndpoints` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Endpoints/JobEndpoints.cs`) - REST API for job management
  - `JobContracts` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Contracts/JobContracts.cs`) - API contracts for job operations
- **Interfaces**: `IJobRepository` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Infrastructure/Repositories/IJobRepository.cs`), `IJobHistoryRepository` (`src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Infrastructure/Repositories/IJobHistoryRepository.cs`)
- **Source**: Feature matrix scan
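
The lifecycle transitions listed above can be sketched as a small transition table. This is a language-neutral illustration in Python, not the C# `JobStateMachine` itself: the main path (Pending -> Scheduled -> Running -> Completed/Failed/Cancelled) comes from this document, while the `Failed -> Scheduled` retry edge, cancellation from any non-terminal state, and the names `ALLOWED`, `InvalidTransition`, and `transition` are assumptions made for the example.

```python
from datetime import datetime, timezone
from enum import Enum


class JobStatus(Enum):
    PENDING = "Pending"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Allowed target states per current state.
# Assumption: cancellation is possible from every non-terminal state,
# and a failed job may be re-scheduled for a retry.
ALLOWED = {
    JobStatus.PENDING: {JobStatus.SCHEDULED, JobStatus.CANCELLED},
    JobStatus.SCHEDULED: {JobStatus.RUNNING, JobStatus.CANCELLED},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED},
    JobStatus.FAILED: {JobStatus.SCHEDULED},
    JobStatus.COMPLETED: set(),
    JobStatus.CANCELLED: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested transition is not in the table (hypothetical name)."""


def transition(current, target, history):
    """Validate a transition, record it with a UTC timestamp, return the new state."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    history.append((current, target, datetime.now(timezone.utc)))
    return target
```

Keeping the rules in a single table makes an invalid move such as Completed -> Running fail in one place, and the `history` list plays the role that `JobHistory` is described as having: one timestamped record per accepted transition.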
## E2E Test Plan
- [ ] Create a job via `JobEndpoints` and verify initial state is Pending
- [ ] Schedule the job via `JobScheduler` and verify state transition: Pending -> Scheduled, with `TimelineEvent` emitted
- [ ] Start the job and verify `JobStateMachine` transition: Scheduled -> Running
- [ ] Complete the job and verify transition: Running -> Completed with completion timestamp in `JobHistory`
- [ ] Fail a running job and verify transition: Running -> Failed, with the retry attempt count incremented
- [ ] Verify `RetryPolicy`: repeatedly fail a job with max_retries=3 and verify it re-enters Scheduled at most 3 times before reaching terminal failure
- [ ] Attempt an invalid transition (e.g., Completed -> Running) and verify `JobStateMachine` rejects it
- [ ] Verify air-gap scheduling: schedule a job in sealed mode and verify it does not attempt network egress
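
The retry check in the plan above can be reasoned about with a small simulation. This is a hedged Python sketch, not the project's C# `RetryPolicy`: the field `base_delay_seconds`, the exponential backoff formula, and the helper `run_always_failing_job` are assumptions for illustration; only `max_retries=3` and the "re-enters Scheduled up to 3 times" expectation come from this document.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_retries: int = 3
    base_delay_seconds: float = 1.0  # hypothetical setting, not from the source

    def backoff_delay(self, retry_number: int) -> float:
        """Delay before the given retry (1-based); exponential backoff is an assumption."""
        return self.base_delay_seconds * 2 ** (retry_number - 1)


def run_always_failing_job(policy: RetryPolicy) -> list:
    """Return the status trace of a job whose every run fails."""
    trace = ["Pending", "Scheduled"]
    retries_used = 0
    while True:
        trace.append("Running")
        trace.append("Failed")
        if retries_used >= policy.max_retries:
            return trace  # retries exhausted: Failed is now terminal
        retries_used += 1
        trace.append("Scheduled")  # re-enter Scheduled for the next attempt


trace = run_always_failing_job(RetryPolicy(max_retries=3))
retries = trace.count("Scheduled") - 1  # subtract the initial scheduling
print(retries, trace[-1])  # prints: 3 Failed
```

Under these assumptions the job runs four times in total (one initial attempt plus three retries), which is the count an end-to-end test would need to assert against `JobHistory`.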