Task Runner — Simulation & Failure Policy Notes

Status: Draft (2025-11-04) — execution wiring + CLI simulate command landed; docs pending final polish

The Task Runner planning layer now materialises additional runtime metadata to unblock execution and simulation flows:

  • Execution graph builder: converts TaskPackPlan steps (including map and parallel) into a deterministic graph that preserves enablement flags and per-step metadata (maxParallel, continueOnError, parameters, approval IDs).
  • Simulation engine: walks the execution graph and classifies each step as pending, skipped, requires-approval, or requires-policy, producing a deterministic preview for CLI/UI consumers and surfacing declared outputs.
  • Failure policy: the pack-level spec.failure.retries is normalised into a TaskPackPlanFailurePolicy (default: maxAttempts = 1, backoffSeconds = 0). The new step state machine uses this policy to schedule retries and to determine when a run must abort.
  • Simulation API + Worker: POST /v1/task-runner/simulations returns the deterministic preview; GET /v1/task-runner/runs/{id} exposes the retry windows persisted by the worker, which honours maxParallel, continueOnError, and those retry windows during execution.
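
The classification pass described above can be pictured with a small Python sketch. This is an illustration, not the code in PackRunSimulationEngine.cs: the Step type, its field names, and the rule that children are classified independently of their parent are all assumptions made for the example. Only the four status names come from these notes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    # Hypothetical, simplified stand-in for an execution-graph node.
    id: str
    enabled: bool = True
    approval_id: Optional[str] = None  # assumed marker for an approval gate
    policy_gate: bool = False          # assumed marker for a policy gate
    children: list[Step] = field(default_factory=list)


def simulate(steps, out=None):
    """Walk the graph depth-first in declaration order and classify each step.

    Visiting in declaration order keeps the preview deterministic.
    """
    out = [] if out is None else out
    for step in steps:
        if not step.enabled:
            status = "skipped"
        elif step.approval_id is not None:
            status = "requires-approval"
        elif step.policy_gate:
            status = "requires-policy"
        else:
            status = "pending"
        out.append((step.id, status))
        # Assumption: children are classified on their own flags.
        simulate(step.children, out)
    return out


plan = [
    Step("build"),
    Step("approve-deploy", approval_id="security-review"),
    Step("deploy", children=[Step("deploy[0]"), Step("deploy[1]", enabled=False)]),
]
preview = simulate(plan)
for step_id, status in preview:
    print(f"{step_id}: {status}")
```

Nothing in the walk mutates the plan, which matches the idea of a preview that can be produced repeatedly with the same result.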

Current behaviour

  • Map steps expand into child iterations (stepId[index]::templateId) with per-item parameters preserved for runtime reference.
  • Parallel blocks honour maxParallel (defaults to unlimited); the worker executes children within that limit and short-circuits the block when a child fails and continueOnError is false.
  • Simulation output mirrors approvals/policy gates, allowing the WebService/CLI to show which actions must occur before execution resumes.
  • File-backed state store persists PackRunState snapshots (nextAttemptAt, attempts, reasons) so orchestration clients and the CLI can resume runs deterministically, even in air-gapped environments.
  • Step state machine transitions:
    • pending → running → succeeded
    • running → failed (abort) once attempts ≥ maxAttempts
    • running → pending with scheduled nextAttemptAt when retries remain
    • pending → skipped for disabled steps (e.g., steps disabled by a when expression).
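
The transitions listed above, combined with the failure-policy defaults (maxAttempts = 1, backoffSeconds = 0), can be sketched in Python as follows. The class and function names are hypothetical and do not mirror PackRunStepStateMachine.cs. Counting an attempt when it finishes, and computing nextAttemptAt as "now plus backoffSeconds", are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FailurePolicy:
    max_attempts: int = 1     # default stated in these notes
    backoff_seconds: int = 0  # default stated in these notes


@dataclass
class StepState:
    status: str = "pending"
    attempts: int = 0
    next_attempt_at: Optional[datetime] = None


def start(state: StepState, enabled: bool = True) -> StepState:
    """pending -> running, or pending -> skipped for a disabled step."""
    if state.status != "pending":
        raise ValueError(f"cannot start a step in state {state.status!r}")
    state.status = "running" if enabled else "skipped"
    return state


def finish(state: StepState, succeeded: bool, policy: FailurePolicy,
           now: datetime) -> StepState:
    """Resolve a running step: succeeded, failed (abort), or pending with a retry window."""
    if state.status != "running":
        raise ValueError(f"cannot finish a step in state {state.status!r}")
    state.attempts += 1  # assumption: an attempt is counted when it completes
    if succeeded:
        state.status = "succeeded"
    elif state.attempts >= policy.max_attempts:
        state.status = "failed"  # no retries left, so the run must abort
    else:
        state.status = "pending"
        state.next_attempt_at = now + timedelta(seconds=policy.backoff_seconds)
    return state


policy = FailurePolicy(max_attempts=2, backoff_seconds=30)
now = datetime(2025, 11, 4, 12, 0, tzinfo=timezone.utc)

state = start(StepState())
finish(state, succeeded=False, policy=policy, now=now)  # one retry remains
print(state.status, state.attempts, state.next_attempt_at)

start(state)
finish(state, succeeded=False, policy=policy, now=now)  # attempts reach max_attempts
print(state.status, state.attempts)
```

With the default policy, the first failure already satisfies attempts ≥ maxAttempts, so a pack that declares no retries aborts on its first failed step.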

CLI usage

Run the simulation without mutating state:

stella task-runner simulate \
  --manifest ./packs/sample-pack.yaml \
  --inputs ./inputs.json \
  --format table

Use --format json (or --output path.json) to emit the raw payload produced by POST /v1/task-runner/simulations.
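
For scripted use, the command can be wrapped in a few lines of Python. This is a sketch under two assumptions: the stella binary is on PATH, and --format json writes the payload to standard output. Exit-code conventions are not documented yet, so the helper returns the raw exit code and output and leaves interpretation to the caller. The function names are hypothetical.

```python
import subprocess


def build_simulate_command(manifest, inputs, fmt="json"):
    """Assemble the argv for the simulate command using the flags shown above."""
    return [
        "stella", "task-runner", "simulate",
        "--manifest", manifest,
        "--inputs", inputs,
        "--format", fmt,
    ]


def run_simulation(manifest, inputs):
    """Run the simulation and return (exit_code, stdout).

    Assumes `stella` is on PATH; no meaning is attached to the exit code here.
    """
    result = subprocess.run(
        build_simulate_command(manifest, inputs),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


command = build_simulate_command("./packs/sample-pack.yaml", "./inputs.json")
print(" ".join(command))
```

A caller would typically pass the returned stdout to json.loads once the payload shape and exit codes are confirmed against the CLI reference.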

Follow-up gaps

  • Fold the CLI command into the official reference/quickstart guides and capture exit-code conventions.

References:

  • src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunExecutionGraphBuilder.cs
  • src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/Simulation/PackRunSimulationEngine.cs
  • src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunStepStateMachine.cs
  • src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Infrastructure/Execution/FilePackRunStateStore.cs
  • src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Worker/Services/PackRunWorkerService.cs
  • src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.WebService/Program.cs