# Job Lifecycle State Machine
## Module
Orchestrator
## Status
IMPLEMENTED
## Description
Job scheduling governed by a finite state machine, with a Postgres-backed job repository, an event envelope domain model, and air-gap-compatible scheduling tests.
## Implementation Details
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`
- **Key Classes**:
  - `JobStateMachine` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/JobStateMachine.cs`) - finite state machine governing job lifecycle transitions (Pending -> Scheduled -> Running -> Completed/Failed/Cancelled)
  - `JobScheduler` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/JobScheduler.cs`) - schedules jobs based on state machine rules and DAG dependencies
  - `RetryPolicy` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/RetryPolicy.cs`) - configurable retry policy for failed jobs (max retries, backoff strategy)
  - `Job` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Job.cs`) - job entity with current status, attempts, and metadata
  - `JobStatus` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/JobStatus.cs`) - enum defining all valid job states
  - `JobHistory` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/JobHistory.cs`) - historical record of all state transitions with timestamps
  - `EventEnvelope` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/EventEnvelope.cs`) - typed event envelope emitted on state transitions
  - `TimelineEvent` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/TimelineEvent.cs`) - timeline event for job lifecycle tracking
  - `TimelineEventEmitter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/TimelineEventEmitter.cs`) - emits timeline events on state transitions
  - `JobEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/JobEndpoints.cs`) - REST API for job management
  - `JobContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/JobContracts.cs`) - API contracts for job operations
- **Interfaces**: `IJobRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IJobRepository.cs`), `IJobHistoryRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IJobHistoryRepository.cs`)
- **Source**: Feature matrix scan
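
The lifecycle described for `JobStateMachine` can be illustrated with a small transition table. The sketch below is a language-neutral illustration in Python, not the C# implementation in the files listed above, which is not shown here. The happy path (Pending -> Scheduled -> Running -> Completed/Failed/Cancelled) comes from the class description. Everything else is an assumption: which states may move to Cancelled, the Failed -> Scheduled retry edge (inferred from the retry case in the test plan), and the names `ALLOWED`, `InvalidTransition`, and `transition`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class JobStatus(Enum):
    PENDING = "Pending"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Assumed transition table. Only the happy path is taken from the document;
# the cancellation edges and the Failed -> Scheduled retry edge are guesses.
ALLOWED = {
    JobStatus.PENDING: {JobStatus.SCHEDULED, JobStatus.CANCELLED},
    JobStatus.SCHEDULED: {JobStatus.RUNNING, JobStatus.CANCELLED},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED},
    JobStatus.FAILED: {JobStatus.SCHEDULED},
    JobStatus.COMPLETED: set(),
    JobStatus.CANCELLED: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the table."""


@dataclass
class Job:
    status: JobStatus = JobStatus.PENDING
    # Each entry is (from_status, to_status, utc_timestamp), a stand-in
    # for the kind of record JobHistory is described as keeping.
    history: list = field(default_factory=list)


def transition(job: Job, target: JobStatus) -> None:
    """Apply a transition, or raise without modifying the job."""
    if target not in ALLOWED[job.status]:
        raise InvalidTransition(f"{job.status.value} -> {target.value}")
    job.history.append((job.status, target, datetime.now(timezone.utc)))
    job.status = target
```

Keeping the rules in one table means an invalid request such as Completed -> Running fails in a single place, and the history entry is written only after the check passes.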
## E2E Test Plan
- [ ] Create a job via `JobEndpoints` and verify initial state is Pending
- [ ] Schedule the job via `JobScheduler` and verify state transition: Pending -> Scheduled, with `TimelineEvent` emitted
- [ ] Start the job and verify `JobStateMachine` transition: Scheduled -> Running
- [ ] Complete the job and verify transition: Running -> Completed with completion timestamp in `JobHistory`
- [ ] Fail the job and verify transition: Running -> Failed with retry attempt incremented
- [ ] Verify `RetryPolicy`: repeatedly fail a job with `max_retries=3` and confirm it re-enters Scheduled at most 3 times before terminal failure
- [ ] Attempt an invalid transition (e.g., Completed -> Running) and verify `JobStateMachine` rejects it
- [ ] Verify air-gap scheduling: schedule a job in sealed mode and verify it does not attempt network egress
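
The retry case in the plan above is easy to get off by one, so it may help to spell out the expected state sequence. The following is a minimal Python sketch under stated assumptions, not the project's `RetryPolicy`: the exponential backoff, the parameters `base_delay_s` and `factor`, and the helper `simulate_always_failing` are all hypothetical. It assumes `max_retries` counts re-schedules after the first failed run, so `max_retries=3` allows four runs in total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int = 3
    base_delay_s: float = 1.0  # hypothetical: delay before the first retry
    factor: float = 2.0        # hypothetical: exponential backoff multiplier

    def should_retry(self, failed_runs: int) -> bool:
        # failed_runs counts the runs that have failed so far, including
        # the one that just failed.
        return failed_runs <= self.max_retries

    def delay_s(self, failed_runs: int) -> float:
        # Delay before the retry that follows the given number of failures.
        return self.base_delay_s * self.factor ** (failed_runs - 1)


def simulate_always_failing(policy: RetryPolicy) -> list:
    """Return the state sequence of a job whose every run fails."""
    states = ["Pending", "Scheduled"]
    failed_runs = 0
    while True:
        states.append("Running")
        failed_runs += 1
        states.append("Failed")
        if not policy.should_retry(failed_runs):
            return states  # terminal failure: retries exhausted
        states.append("Scheduled")  # re-enter Scheduled for another run


states = simulate_always_failing(RetryPolicy(max_retries=3))
retries = states.count("Scheduled") - 1  # exclude the initial scheduling
print(retries, states[-1])
```

Under these assumptions the job is scheduled once initially and re-enters Scheduled three more times, then ends in Failed after the fourth run. An E2E assertion would check both the retry count and the terminal state.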