feat: Enhance Task Runner with simulation and failure policy support

- Added tests for output projection and failure policy population in TaskPackPlanner.
- Introduced new failure policy manifest in TestManifests.
- Implemented simulation endpoints in the web service for task execution.
- Created TaskRunnerServiceOptions for configuration management.
- Updated appsettings.json to include TaskRunner configuration.
- Enhanced PackRunWorkerService to handle execution graphs and state management.
- Added support for parallel execution and conditional steps in the worker service.
- Updated documentation to reflect new features and changes in execution flow.

2025-11-04 19:05:56 +02:00


Task Runner — Simulation & Failure Policy Notes

Status: Draft (2025-11-04) — execution wiring + CLI simulate command landed; docs pending final polish

The Task Runner planning layer now materialises additional runtime metadata to unblock execution and simulation flows:

- Execution graph builder – converts TaskPackPlan steps (including map and parallel) into a deterministic graph with preserved enablement flags and per-step metadata (maxParallel, continueOnError, parameters, approval IDs).
- Simulation engine – walks the execution graph and classifies each step as pending, skipped, requires-approval, or requires-policy, producing a deterministic preview for CLI/UI consumers while surfacing declared outputs.
- Failure policy – pack-level spec.failure.retries is normalised into a TaskPackPlanFailurePolicy (default: maxAttempts = 1, backoffSeconds = 0). The new step state machine uses this policy to schedule retries and to determine when a run must abort.
- Simulation API + Worker – POST /v1/task-runner/simulations returns the deterministic preview. GET /v1/task-runner/runs/{id} exposes the retry state that the worker persists while honouring maxParallel, continueOnError, and retry windows during execution.
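To make the simulation engine's role concrete, here is a minimal Python sketch of a graph walk that assigns one of the four statuses named above. It is not the PackRunSimulationEngine code: the GraphStep type, its field names, and the precedence between the checks (disabled first, then approval, then policy) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GraphStep:
    """Hypothetical, simplified stand-in for an execution-graph node."""
    step_id: str
    enabled: bool = True
    approval_id: Optional[str] = None  # assumed: set when the step is an approval gate
    policy_gate: bool = False          # assumed: set when the step is a policy gate
    children: List["GraphStep"] = field(default_factory=list)


def classify(step: GraphStep) -> str:
    """Map a step to a simulation status (assumed precedence, see lead-in)."""
    if not step.enabled:
        return "skipped"
    if step.approval_id is not None:
        return "requires-approval"
    if step.policy_gate:
        return "requires-policy"
    return "pending"


def simulate(steps: List[GraphStep]) -> List[Tuple[str, str]]:
    """Depth-first walk in declaration order, so the same graph always
    yields the same preview."""
    preview: List[Tuple[str, str]] = []
    for step in steps:
        preview.append((step.step_id, classify(step)))
        preview.extend(simulate(step.children))
    return preview


graph = [
    GraphStep("fetch"),
    GraphStep("approve-deploy", approval_id="security-review"),
    GraphStep("fan-out", children=[
        GraphStep("fan-out[0]::build"),
        GraphStep("fan-out[1]::build", enabled=False),
    ]),
]

for step_id, status in simulate(graph):
    print(f"{step_id}: {status}")
```

Nothing in the walk executes a step or writes state, which is the property that lets a preview run safely before any approval is granted.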

Current behaviour

- Map steps expand into child iterations (stepId[index]::templateId) with per-item parameters preserved for runtime reference.
- Parallel blocks honour maxParallel (defaults to unlimited) and the worker executes children accordingly, short-circuiting when continueOnError is false.
- Simulation output mirrors approvals/policy gates, allowing the WebService/CLI to show which actions must occur before execution resumes.
- File-backed state store persists PackRunState snapshots (nextAttemptAt, attempts, reasons) so orchestration clients and CLI can resume runs deterministically even in air-gapped environments.
- Step state machine transitions:
  - pending → running → succeeded
  - running → failed (abort) once attempts ≥ maxAttempts
  - running → pending with scheduled nextAttemptAt when retries remain
  - pending → skipped for disabled steps (e.g., steps disabled by a when expression).
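The transitions above can be sketched as a small pure-function state machine in Python. This is an illustration, not the PackRunStepStateMachine implementation. The type and function names are hypothetical, and two behaviours are assumed: the attempt counter increments when an attempt finishes, and backoffSeconds is applied as a constant delay.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FailurePolicy:
    # Defaults mirror the documented ones: maxAttempts = 1, backoffSeconds = 0.
    max_attempts: int = 1
    backoff_seconds: int = 0


@dataclass(frozen=True)
class StepState:
    status: str = "pending"
    attempts: int = 0
    next_attempt_at: Optional[datetime] = None


def start(state: StepState, enabled: bool = True) -> StepState:
    """pending -> running, or pending -> skipped for a disabled step.
    This sketch does not check next_attempt_at; a scheduler would."""
    if state.status != "pending":
        raise ValueError(f"cannot start a step that is {state.status}")
    if not enabled:
        return StepState("skipped", state.attempts)
    return StepState("running", state.attempts)


def succeed(state: StepState) -> StepState:
    """running -> succeeded."""
    if state.status != "running":
        raise ValueError(f"cannot complete a step that is {state.status}")
    return StepState("succeeded", state.attempts + 1)


def fail(state: StepState, policy: FailurePolicy, now: datetime) -> StepState:
    """running -> failed once attempts >= max_attempts, otherwise
    running -> pending with a scheduled next_attempt_at."""
    if state.status != "running":
        raise ValueError(f"cannot fail a step that is {state.status}")
    attempts = state.attempts + 1
    if attempts >= policy.max_attempts:
        return StepState("failed", attempts)
    retry_at = now + timedelta(seconds=policy.backoff_seconds)
    return StepState("pending", attempts, retry_at)


policy = FailurePolicy(max_attempts=2, backoff_seconds=30)
now = datetime(2025, 11, 4, 12, 0, tzinfo=timezone.utc)

first = fail(start(StepState()), policy, now)
second = fail(start(first), policy, now + timedelta(seconds=30))
print(first.status, first.attempts, first.next_attempt_at)
print(second.status, second.attempts)
```

With the default policy (maxAttempts = 1) the first failure already satisfies attempts ≥ maxAttempts, so a pack that declares no retries aborts on its first failed attempt.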

CLI usage

Run the simulation without mutating state:

stella task-runner simulate \
  --manifest ./packs/sample-pack.yaml \
  --inputs ./inputs.json \
  --format table

Use --format json (or --output path.json) to emit the raw payload produced by POST /v1/task-runner/simulations.
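For scripted use, the command can be wrapped in a few lines of Python. The sketch below uses only the flags shown above. It assumes that --format json writes the payload to standard output and that a non-zero exit code signals failure. Neither is confirmed here, since exit-code conventions are still undocumented. The function names are hypothetical.

```python
import json
import subprocess
from typing import Any, List


def build_simulate_command(manifest: str, inputs: str) -> List[str]:
    """Assemble the argv for a JSON-formatted simulation run."""
    return [
        "stella", "task-runner", "simulate",
        "--manifest", manifest,
        "--inputs", inputs,
        "--format", "json",
    ]


def run_simulation(manifest: str, inputs: str) -> Any:
    """Run the CLI and parse its output.

    Assumes the JSON payload arrives on stdout; check=True raises
    CalledProcessError on a non-zero exit code (assumed to mean failure).
    """
    completed = subprocess.run(
        build_simulate_command(manifest, inputs),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# Example call (requires the stella CLI on PATH):
#   preview = run_simulation("./packs/sample-pack.yaml", "./inputs.json")
```

Passing the arguments as a list rather than a shell string avoids quoting problems with manifest paths that contain spaces.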

Follow-up gaps

Fold the CLI command into the official reference/quickstart guides and capture exit-code conventions.

References:

src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunExecutionGraphBuilder.cs
src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/Simulation/PackRunSimulationEngine.cs
src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunStepStateMachine.cs
src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Infrastructure/Execution/FilePackRunStateStore.cs
src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Worker/Services/PackRunWorkerService.cs
src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.WebService/Program.cs
