Job Lifecycle State Machine

Module

Orchestrator

Status

IMPLEMENTED

Description

Job scheduling governed by a finite state machine, with a Postgres-backed job repository, an event envelope domain model, and scheduling tests that are compatible with air-gapped environments.

Implementation Details

  • Modules: src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Scheduling/, src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/
  • Key Classes:
    • JobStateMachine (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Scheduling/JobStateMachine.cs) - finite state machine governing job lifecycle transitions (Pending -> Scheduled -> Running -> Completed/Failed/Cancelled)
    • JobScheduler (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Scheduling/JobScheduler.cs) - schedules jobs based on state machine rules and DAG dependencies
    • RetryPolicy (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Scheduling/RetryPolicy.cs) - configurable retry policy for failed jobs (max retries, backoff strategy)
    • Job (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/Job.cs) - job entity with current status, attempts, and metadata
    • JobStatus (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/JobStatus.cs) - enum defining all valid job states
    • JobHistory (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/JobHistory.cs) - historical record of all state transitions with timestamps
    • EventEnvelope (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/Events/EventEnvelope.cs) - typed event envelope emitted on state transitions
    • TimelineEvent (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/Events/TimelineEvent.cs) - timeline event for job lifecycle tracking
    • TimelineEventEmitter (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Core/Domain/Events/TimelineEventEmitter.cs) - emits timeline events on state transitions
    • JobEndpoints (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Endpoints/JobEndpoints.cs) - REST API for job management
    • JobContracts (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.WebService/Contracts/JobContracts.cs) - API contracts for job operations
  • Interfaces: IJobRepository (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Infrastructure/Repositories/IJobRepository.cs), IJobHistoryRepository (src/Orchestrator/StellaOps.Orchestrator/StellaOps.Orchestrator.Infrastructure/Repositories/IJobHistoryRepository.cs)
  • Source: Feature matrix scan
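
The transition chain listed for JobStateMachine can be illustrated with a small transition table. This is a minimal, language-neutral sketch in Python, not the actual C# implementation: the names `ALLOWED_TRANSITIONS`, `can_transition`, `transition`, and `InvalidTransitionError` are hypothetical, and the `Failed -> Scheduled` edge is an assumption inferred from the retry step in the E2E test plan.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "Pending"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Allowed transitions, following the chain
# Pending -> Scheduled -> Running -> Completed/Failed/Cancelled.
# ASSUMPTION: Failed -> Scheduled models the retry path; the real
# JobStateMachine may express retries differently.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.SCHEDULED},
    JobStatus.SCHEDULED: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED},
    JobStatus.FAILED: {JobStatus.SCHEDULED},
    JobStatus.COMPLETED: set(),
    JobStatus.CANCELLED: set(),
}


class InvalidTransitionError(Exception):
    """Raised when a requested transition is not in the table (hypothetical)."""


def can_transition(current: JobStatus, target: JobStatus) -> bool:
    """Return True if the table allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if not can_transition(current, target):
        raise InvalidTransitionError(
            f"{current.value} -> {target.value} is not allowed"
        )
    return target
```

Keeping the rules in a single table makes invalid transitions such as Completed -> Running fail by default, since any edge not listed is rejected.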

E2E Test Plan

  • Create a job via JobEndpoints and verify initial state is Pending
  • Schedule the job via JobScheduler and verify state transition: Pending -> Scheduled, with TimelineEvent emitted
  • Start the job and verify JobStateMachine transition: Scheduled -> Running
  • Complete the job and verify transition: Running -> Completed with completion timestamp in JobHistory
  • Fail the job and verify transition: Running -> Failed with retry attempt incremented
  • Verify RetryPolicy: fail a job with max_retries=3 and verify it re-enters Scheduled up to 3 times before terminal failure
  • Attempt an invalid transition (e.g., Completed -> Running) and verify JobStateMachine rejects it
  • Verify air-gap scheduling: schedule a job in sealed mode and verify it does not attempt network egress
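
The RetryPolicy step above can be reasoned about with a small simulation. The following Python sketch is illustrative only and does not reflect the real RetryPolicy or Job classes: the field names, the exponential `delay_for` backoff, and the helper `run_always_failing` are all hypothetical. It assumes that `max_retries=3` means one initial attempt plus up to three retries, and that terminal failure is represented by the job remaining in Failed.

```python
from dataclasses import dataclass, field


@dataclass
class RetryPolicy:
    max_retries: int = 3
    base_delay_seconds: float = 1.0  # hypothetical backoff parameter

    def delay_for(self, retry_number: int) -> float:
        """Exponential backoff (an assumed strategy): 1x, 2x, 4x, ..."""
        return self.base_delay_seconds * 2 ** (retry_number - 1)


@dataclass
class Job:
    status: str = "Pending"
    attempts: int = 0
    history: list = field(default_factory=list)  # (from, to) pairs


def move(job: Job, target: str) -> None:
    """Record the transition and update the current status."""
    job.history.append((job.status, target))
    job.status = target


def run_always_failing(job: Job, policy: RetryPolicy) -> Job:
    """Drive a job whose every attempt fails until retries are exhausted."""
    move(job, "Scheduled")
    while True:
        move(job, "Running")
        job.attempts += 1
        move(job, "Failed")
        retries_used = job.attempts - 1
        if retries_used >= policy.max_retries:
            return job  # terminal failure: no retries left
        move(job, "Scheduled")  # retry: re-enter Scheduled


job = run_always_failing(Job(), RetryPolicy(max_retries=3))
retries = sum(1 for edge in job.history if edge == ("Failed", "Scheduled"))
```

Under these assumptions the job runs four times in total and re-enters Scheduled three times before staying in Failed, which is the count an E2E test would assert against.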