feat: Enhance Task Runner with simulation and failure policy support

- Added tests for output projection and failure policy population in TaskPackPlanner.
- Introduced new failure policy manifest in TestManifests.
- Implemented simulation endpoints in the web service for task execution.
- Created TaskRunnerServiceOptions for configuration management.
- Updated appsettings.json to include TaskRunner configuration.
- Enhanced PackRunWorkerService to handle execution graphs and state management.
- Added support for parallel execution and conditional steps in the worker service.
- Updated documentation to reflect new features and changes in execution flow.
2025-11-04 19:05:50 +02:00
parent 2eb6852d34
commit 3bd0955202
83 changed files with 15161 additions and 10678 deletions
--- a/docs/modules/devops/AGENTS.md
+++ b/docs/modules/devops/AGENTS.md
@@ -8,6 +8,7 @@ The DevOps module captures release, deployment, and migration playbooks that kee
 - [Architecture](./architecture.md)
 - [Implementation plan](./implementation_plan.md)
 - [Task board](./TASKS.md)
+- [Task Runner simulation notes](./task-runner-simulation.md)

 ## How to get started
 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module.
--- a/docs/modules/devops/README.md
+++ b/docs/modules/devops/README.md
@@ -1,41 +1,42 @@
-# StellaOps DevOps
-
-The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments.
-
-## Responsibilities
-- Maintain CI pipelines, signing workflows, and release packaging steps.
-- Operate shared runbooks for launch readiness, upgrades, and NuGet previews.
-- Provide offline kit assembly instructions and tooling integration.
-- Wrap observability/telemetry bootstrap flows for platform teams.
-
-## Key components
-- Runbooks under ./runbooks/ (launch, deployment, nuget).
-- Migration guidance under ./migrations/.
-- Architecture overview bridging CI/CD & infrastructure concerns.
-
-## Integrations & dependencies
-- Ops pipelines (Gitea, GitHub Actions) and artifact registries.
-- Authority/Signer for supply chain signing.
-- Telemetry stack bootstrap scripts.
-
-## Operational notes
-- Offline bundle packaging guidance in docs/modules/export-center/operations/runbook.md.
-- Dashboards for launch cutover rehearsals.
-- Coordination with Security for enforced guardrails.
-
-## Related resources
-- ./runbooks/launch-readiness.md
-- ./runbooks/launch-cutover.md
-- ./runbooks/deployment-upgrade.md
-- ./runbooks/nuget-preview-bootstrap.md
-- ./migrations/semver-style.md
-
-## Backlog references
-- DEVOPS-LAUNCH-18-001 / 18-900 runbooks in ../../TASKS.md.
-- Telemetry bootstrap automation tracked in `ops/devops/TASKS.md`.
-
-## Epic alignment
-- **Epic 1 – AOC enforcement:** bake AOC verifier steps, CI guards, and schema validation into pipelines.
-- **Epic 9 – Orchestrator Dashboard:** support operational dashboards, job recovery runbooks, and rate-limit governance.
-- **Epic 10 – Export Center:** manage signing workflows, Offline Kit packaging, and release promotion for exports.
-- **Epic 15 – Observability & Forensics:** coordinate telemetry deployment, evidence retention, and forensic automation.
+# StellaOps DevOps
+
+The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments.
+
+## Responsibilities
+- Maintain CI pipelines, signing workflows, and release packaging steps.
+- Operate shared runbooks for launch readiness, upgrades, and NuGet previews.
+- Provide offline kit assembly instructions and tooling integration.
+- Wrap observability/telemetry bootstrap flows for platform teams.
+
+## Key components
+- Runbooks under ./runbooks/ (launch, deployment, nuget).
+- Migration guidance under ./migrations/.
+- Architecture overview bridging CI/CD & infrastructure concerns.
+
+## Integrations & dependencies
+- Ops pipelines (Gitea, GitHub Actions) and artifact registries.
+- Authority/Signer for supply chain signing.
+- Telemetry stack bootstrap scripts.
+
+## Operational notes
+- Offline bundle packaging guidance in docs/modules/export-center/operations/runbook.md.
+- Dashboards for launch cutover rehearsals.
+- Coordination with Security for enforced guardrails.
+
+## Related resources
+- ./runbooks/launch-readiness.md
+- ./runbooks/launch-cutover.md
+- ./runbooks/deployment-upgrade.md
+- ./runbooks/nuget-preview-bootstrap.md
+- ./migrations/semver-style.md
+- ./task-runner-simulation.md
+
+## Backlog references
+- DEVOPS-LAUNCH-18-001 / 18-900 runbooks in ../../TASKS.md.
+- Telemetry bootstrap automation tracked in `ops/devops/TASKS.md`.
+
+## Epic alignment
+- **Epic 1 – AOC enforcement:** bake AOC verifier steps, CI guards, and schema validation into pipelines.
+- **Epic 9 – Orchestrator Dashboard:** support operational dashboards, job recovery runbooks, and rate-limit governance.
+- **Epic 10 – Export Center:** manage signing workflows, Offline Kit packaging, and release promotion for exports.
+- **Epic 15 – Observability & Forensics:** coordinate telemetry deployment, evidence retention, and forensic automation.
--- a/docs/modules/devops/task-runner-simulation.md
+++ b/docs/modules/devops/task-runner-simulation.md
@@ -0,0 +1,48 @@
+# Task Runner — Simulation & Failure Policy Notes
+
+> **Status:** Draft (2025-11-04). Execution wiring and the CLI `simulate` command have landed; the docs still need a final polish pass.
+
+The Task Runner planning layer now materialises additional runtime metadata to unblock execution and simulation flows:
+
+- **Execution graph builder** – converts `TaskPackPlan` steps (including `map` and `parallel`) into a deterministic graph with preserved enablement flags and per-step metadata (`maxParallel`, `continueOnError`, parameters, approval IDs).
+- **Simulation engine** – walks the execution graph and classifies steps as `pending`, `skipped`, `requires-approval`, or `requires-policy`, producing a deterministic preview for CLI/UI consumers while surfacing declared outputs.
+- **Failure policy** – pack-level `spec.failure.retries` is normalised into a `TaskPackPlanFailurePolicy` (default: `maxAttempts = 1`, `backoffSeconds = 0`). The new step state machine uses this policy to schedule retries and to determine when a run must abort.
+- **Simulation API + Worker** – `POST /v1/task-runner/simulations` returns the deterministic preview; `GET /v1/task-runner/runs/{id}` exposes the persisted retry windows that the worker now writes during execution, while it honours `maxParallel`, `continueOnError`, and scheduled retries.
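
The classification performed by the simulation engine can be pictured with the following minimal Python sketch. It is not a port of `PackRunSimulationEngine`: the `Step` fields, the `expand`/`classify`/`simulate` helpers, the `item` parameter key, and the precedence between statuses (disabled first, then approval, then policy) are all assumptions made for illustration. Map steps are expanded into children named `stepId[index]::templateId`, following the naming in the current-behaviour notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Step:
    # Hypothetical step shape; field names are illustrative, not the TaskPackPlan schema.
    step_id: str
    enabled: bool = True                   # False when a `when` expression disables the step
    approval_id: Optional[str] = None      # approval gate that must be granted first
    policy_gate: Optional[str] = None      # policy gate that must pass first
    template_id: Optional[str] = None      # set only for map steps
    map_items: list = field(default_factory=list)
    parameters: dict = field(default_factory=dict)


def expand(step: Step) -> List[Step]:
    """Expand a map step into one child per item, named stepId[index]::templateId."""
    if step.template_id is None:
        return [step]
    return [
        Step(
            step_id=f"{step.step_id}[{index}]::{step.template_id}",
            enabled=step.enabled,
            approval_id=step.approval_id,
            policy_gate=step.policy_gate,
            # Per-item parameters kept for runtime; the "item" key is hypothetical.
            parameters={**step.parameters, "item": item},
        )
        for index, item in enumerate(step.map_items)
    ]


def classify(step: Step) -> str:
    # Assumed precedence: a disabled step is skipped before any gate is considered.
    if not step.enabled:
        return "skipped"
    if step.approval_id is not None:
        return "requires-approval"
    if step.policy_gate is not None:
        return "requires-policy"
    return "pending"


def simulate(steps: List[Step]) -> List[Tuple[str, str]]:
    """Return (stepId, status) pairs in declaration order without executing anything."""
    return [(child.step_id, classify(child)) for step in steps for child in expand(step)]


if __name__ == "__main__":
    plan = [
        Step("fetch"),
        Step("scan", template_id="run-scanner", map_items=["img-a", "img-b"]),
        Step("publish", approval_id="release-approval"),
        Step("notify", enabled=False),
    ]
    for step_id, status in simulate(plan):
        print(step_id, status)
    # fetch pending
    # scan[0]::run-scanner pending
    # scan[1]::run-scanner pending
    # publish requires-approval
    # notify skipped
```

Because the walk follows declaration order and then expansion order, repeated runs over the same plan yield the same preview, which is the deterministic property the CLI/UI consumers depend on.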
+
+## Current behaviour
+
+- Map steps expand into child iterations (`stepId[index]::templateId`) with per-item parameters preserved for runtime reference.
+- Parallel blocks honour `maxParallel` (defaults to unlimited), and the worker executes children accordingly, stopping the remaining children after a failure when `continueOnError` is false.
+- Simulation output mirrors approvals/policy gates, allowing the WebService/CLI to show which actions must occur before execution resumes.
+- A file-backed state store (`FilePackRunStateStore`) persists `PackRunState` snapshots (`nextAttemptAt`, attempts, reasons) so orchestration clients and the CLI can resume runs deterministically, even in air-gapped environments.
+- Step state machine transitions:
+  - `pending → running → succeeded`
+  - `running → failed` (abort) once attempts ≥ `maxAttempts`
+  - `running → pending` with scheduled `nextAttemptAt` when retries remain
+  - `pending → skipped` for disabled steps (e.g., steps disabled by a `when` expression)
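
The transitions above, combined with the normalised failure policy, can be sketched as a small immutable state machine. This Python version is illustrative only and does not reproduce `PackRunStepStateMachine`: the `StepState` record, the helper names, counting an attempt when a step starts, and computing `nextAttemptAt` as failure time plus `backoffSeconds` are assumptions. `FailurePolicy` uses the documented defaults (`maxAttempts = 1`, `backoffSeconds = 0`), so by default a single failure aborts the step.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FailurePolicy:
    # Defaults mirror the documented TaskPackPlanFailurePolicy defaults.
    max_attempts: int = 1
    backoff_seconds: int = 0


@dataclass(frozen=True)
class StepState:
    # Hypothetical snapshot, loosely modelled on the persisted PackRunState fields.
    status: str = "pending"
    attempts: int = 0
    next_attempt_at: Optional[datetime] = None
    reason: Optional[str] = None


def _require(state: StepState, expected: str) -> None:
    if state.status != expected:
        raise ValueError(f"invalid transition from {state.status!r}")


def skip(state: StepState, reason: str = "disabled") -> StepState:
    """pending -> skipped, e.g. when a `when` expression disables the step."""
    _require(state, "pending")
    return replace(state, status="skipped", reason=reason)


def start(state: StepState, now: datetime) -> StepState:
    """pending -> running; each start consumes one attempt."""
    _require(state, "pending")
    if state.next_attempt_at is not None and now < state.next_attempt_at:
        raise ValueError("retry window has not opened yet")
    return replace(state, status="running", attempts=state.attempts + 1,
                   next_attempt_at=None, reason=None)


def succeed(state: StepState) -> StepState:
    """running -> succeeded."""
    _require(state, "running")
    return replace(state, status="succeeded")


def fail(state: StepState, policy: FailurePolicy, now: datetime, reason: str) -> StepState:
    """running -> failed (abort) once attempts are exhausted, else running -> pending with a retry."""
    _require(state, "running")
    if state.attempts >= policy.max_attempts:
        return replace(state, status="failed", reason=reason)
    retry_at = now + timedelta(seconds=policy.backoff_seconds)
    return replace(state, status="pending", next_attempt_at=retry_at, reason=reason)


if __name__ == "__main__":
    policy = FailurePolicy(max_attempts=2, backoff_seconds=30)
    t0 = datetime(2025, 11, 4, tzinfo=timezone.utc)

    state = fail(start(StepState(), t0), policy, t0, "exit code 1")
    print(state.status, state.attempts, state.next_attempt_at.isoformat())
    # pending 1 2025-11-04T00:00:30+00:00

    retry_at = state.next_attempt_at
    state = fail(start(state, retry_at), policy, retry_at, "exit code 1")
    print(state.status, state.attempts)
    # failed 2
```

With the default policy, `fail` moves a step straight from `running` to `failed`; raising `max_attempts` turns each earlier failure into a `pending` snapshot carrying a retry window, which is the kind of record a state store can persist and a worker can resume from.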
+
+## CLI usage
+
+Run the simulation without mutating state:
+
+```bash
+stella task-runner simulate \
+  --manifest ./packs/sample-pack.yaml \
+  --inputs ./inputs.json \
+  --format table
+```
+
+Use `--format json` (or `--output path.json`) to emit the raw payload produced by `POST /api/task-runner/simulations`.
+
+## Follow-up gaps
+
+- Fold the CLI command into the official reference/quickstart guides and capture exit-code conventions.
+
+References:
+
+- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunExecutionGraphBuilder.cs`
+- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/Simulation/PackRunSimulationEngine.cs`
+- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunStepStateMachine.cs`
+- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Infrastructure/Execution/FilePackRunStateStore.cs`
+- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Worker/Services/PackRunWorkerService.cs`
+- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.WebService/Program.cs`