feat: Enhance Task Runner with simulation and failure policy support
Some checks are pending
Docs CI / lint-and-preview (push) Waiting to run

- Added tests for output projection and failure policy population in TaskPackPlanner.
- Introduced new failure policy manifest in TestManifests.
- Implemented simulation endpoints in the web service for task execution.
- Created TaskRunnerServiceOptions for configuration management.
- Updated appsettings.json to include TaskRunner configuration.
- Enhanced PackRunWorkerService to handle execution graphs and state management.
- Added support for parallel execution and conditional steps in the worker service.
- Updated documentation to reflect new features and changes in execution flow.
This commit is contained in:
master
2025-11-04 19:05:50 +02:00
parent 2eb6852d34
commit 3bd0955202
83 changed files with 15161 additions and 10678 deletions

View File

@@ -130,3 +130,10 @@ All endpoints accept `profile` parameter (default `fips-local`) and return `outp
- Feature flags per tenant (`ai.summary.enabled`, `ai.remediation.enabled`).
- Rate limits (per tenant, per profile) enforced by Orchestrator to prevent runaway usage.
- Offline/air-gapped deployments run local models packaged with Offline Kit; model weights validated via manifest digests.
## 11) Hosting surfaces
- **WebService** — exposes `/v1/advisory-ai/pipeline/{task}` to materialise plans and enqueue execution messages.
- **Worker** — background service draining the advisory pipeline queue (file-backed stub) pending integration with shared transport.
- Both hosts register `AddAdvisoryAiCore`, which wires the SBOM context client, deterministic toolset, pipeline orchestrator, and queue metrics.
- SBOM base address + tenant metadata are configured via `AdvisoryAI:SbomBaseAddress` and propagated through `AddSbomContext`.

View File

@@ -1,6 +1,6 @@
# Advisory AI Orchestration Pipeline (Planning Notes)
> **Status:** Draft prerequisite design for AIAI-31-004 integration work.
> **Status:** In progress orchestration metadata and cache-key wiring underway for AIAI-31-004.
> **Audience:** Advisory AI guild, WebService/Worker guilds, CLI guild, Docs/QA support teams.
## 1. Goal
@@ -12,9 +12,9 @@ Wire the deterministic pipeline (Summary / Conflict / Remediation flows) into th
| Area | Requirement | Owner | Status |
|------|-------------|-------|--------|
| **Toolset** | Deterministic comparators, dependency analyzer (`IDeterministicToolset`, `AdvisoryPipelineOrchestrator`) | Advisory AI | ✅ landed (AIAI-31-003) |
| **SBOM context** | Real SBOM context client delivering timelines + dependency paths | SBOM Service Guild | ⏳ pending (AIAI-31-002) |
| **SBOM context** | Real SBOM context client delivering timelines + dependency paths | SBOM Service Guild | ✅ typed client and DI helper ready; supply host BaseAddress at integration time |
| **Prompt artifacts** | Liquid/Handlebars prompt templates for summary/conflict/remediation | Advisory AI Docs Guild | ⏳ authoring needed |
| **Cache strategy** | Decision on DSSE or hash-only cache entries, TTLs, and eviction policy | Advisory AI + Platform | 🔲 define |
| **Cache strategy** | Decision on DSSE or hash-only cache entries, TTLs, and eviction policy | Advisory AI + Platform | ⏳ hash-only plan keys implemented; persistence decision outstanding |
| **Auth scopes** | Confirm service account scopes for new API endpoints/worker-to-service calls | Authority Guild | 🔲 define |
**Blocking risk:** SBOM client and prompt templates must exist (even stubbed) before the orchestrator can produce stable plans.
@@ -35,7 +35,7 @@ Wire the deterministic pipeline (Summary / Conflict / Remediation flows) into th
- Expose TTL/force-refresh semantics.
4. **Docs & QA**
- Publish API spec (`docs/advisory-ai/api.md`) + CLI docs.
- Add golden outputs for deterministic runs; property tests for cache key stability.
- Add golden outputs for deterministic runs; property tests for cache key stability (unit coverage landed for cache hashing + option clamps).
## 4. Task Breakdown
@@ -87,4 +87,10 @@ Wire the deterministic pipeline (Summary / Conflict / Remediation flows) into th
2. Create queue schema spec (`docs/modules/advisory-ai/queue-contracts.md`) if not already available.
3. Schedule cross-guild kickoff to agree on cache store & DSSE policy.
_Last updated: 2025-11-02_
## 7. Recent updates
- 2025-11-04 — Orchestrator metadata now captures SBOM environment flags, blast-radius metrics, and dependency analysis details; cache-key normalization covers ordering.
- 2025-11-04 — Unit tests added for SBOM-absent requests, option-limit enforcement, and cache-key stability.
- 2025-11-04 — `AddSbomContext` DI helper enforces tenant header + base address wiring for downstream hosts.
_Last updated: 2025-11-04_

View File

@@ -8,6 +8,7 @@ The DevOps module captures release, deployment, and migration playbooks that kee
- [Architecture](./architecture.md)
- [Implementation plan](./implementation_plan.md)
- [Task board](./TASKS.md)
- [Task Runner simulation notes](./task-runner-simulation.md)
## How to get started
1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module.

View File

@@ -1,41 +1,42 @@
# StellaOps DevOps
The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments.
## Responsibilities
- Maintain CI pipelines, signing workflows, and release packaging steps.
- Operate shared runbooks for launch readiness, upgrades, and NuGet previews.
- Provide offline kit assembly instructions and tooling integration.
- Wrap observability/telemetry bootstrap flows for platform teams.
## Key components
- Runbooks under ./runbooks/ (launch, deployment, nuget).
- Migration guidance under ./migrations/.
- Architecture overview bridging CI/CD & infrastructure concerns.
## Integrations & dependencies
- Ops pipelines (Gitea, GitHub Actions) and artifact registries.
- Authority/Signer for supply chain signing.
- Telemetry stack bootstrap scripts.
## Operational notes
- Offline bundle packaging guidance in docs/modules/export-center/operations/runbook.md.
- Dashboards for launch cutover rehearsals.
- Coordination with Security for enforced guardrails.
## Related resources
- ./runbooks/launch-readiness.md
- ./runbooks/launch-cutover.md
- ./runbooks/deployment-upgrade.md
- ./runbooks/nuget-preview-bootstrap.md
- ./migrations/semver-style.md
## Backlog references
- DEVOPS-LAUNCH-18-001 / 18-900 runbooks in ../../TASKS.md.
- Telemetry bootstrap automation tracked in `ops/devops/TASKS.md`.
## Epic alignment
- **Epic 1 AOC enforcement:** bake AOC verifier steps, CI guards, and schema validation into pipelines.
- **Epic 9 Orchestrator Dashboard:** support operational dashboards, job recovery runbooks, and rate-limit governance.
- **Epic 10 Export Center:** manage signing workflows, Offline Kit packaging, and release promotion for exports.
- **Epic 15 Observability & Forensics:** coordinate telemetry deployment, evidence retention, and forensic automation.
# StellaOps DevOps
The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments.
## Responsibilities
- Maintain CI pipelines, signing workflows, and release packaging steps.
- Operate shared runbooks for launch readiness, upgrades, and NuGet previews.
- Provide offline kit assembly instructions and tooling integration.
- Wrap observability/telemetry bootstrap flows for platform teams.
## Key components
- Runbooks under ./runbooks/ (launch, deployment, nuget).
- Migration guidance under ./migrations/.
- Architecture overview bridging CI/CD & infrastructure concerns.
## Integrations & dependencies
- Ops pipelines (Gitea, GitHub Actions) and artifact registries.
- Authority/Signer for supply chain signing.
- Telemetry stack bootstrap scripts.
## Operational notes
- Offline bundle packaging guidance in docs/modules/export-center/operations/runbook.md.
- Dashboards for launch cutover rehearsals.
- Coordination with Security for enforced guardrails.
## Related resources
- ./runbooks/launch-readiness.md
- ./runbooks/launch-cutover.md
- ./runbooks/deployment-upgrade.md
- ./runbooks/nuget-preview-bootstrap.md
- ./migrations/semver-style.md
- ./task-runner-simulation.md
## Backlog references
- DEVOPS-LAUNCH-18-001 / 18-900 runbooks in ../../TASKS.md.
- Telemetry bootstrap automation tracked in `ops/devops/TASKS.md`.
## Epic alignment
- **Epic 1 AOC enforcement:** bake AOC verifier steps, CI guards, and schema validation into pipelines.
- **Epic 9 Orchestrator Dashboard:** support operational dashboards, job recovery runbooks, and rate-limit governance.
- **Epic 10 Export Center:** manage signing workflows, Offline Kit packaging, and release promotion for exports.
- **Epic 15 Observability & Forensics:** coordinate telemetry deployment, evidence retention, and forensic automation.

View File

@@ -0,0 +1,48 @@
# Task Runner — Simulation & Failure Policy Notes
> **Status:** Draft (2025-11-04) — execution wiring + CLI simulate command landed; docs pending final polish
The Task Runner planning layer now materialises additional runtime metadata to unblock execution and simulation flows:
- **Execution graph builder** converts `TaskPackPlan` steps (including `map` and `parallel`) into a deterministic graph with preserved enablement flags and per-step metadata (`maxParallel`, `continueOnError`, parameters, approval IDs).
- **Simulation engine** walks the execution graph and classifies steps as `pending`, `skipped`, `requires-approval`, or `requires-policy`, producing a deterministic preview for CLI/UI consumers while surfacing declared outputs.
- **Failure policy** pack-level `spec.failure.retries` is normalised into a `TaskPackPlanFailurePolicy` (default: `maxAttempts = 1`, `backoffSeconds = 0`). The new step state machine uses this policy to schedule retries and to determine when a run must abort.
- **Simulation API + Worker** `POST /v1/task-runner/simulations` returns the deterministic preview; `GET /v1/task-runner/runs/{id}` exposes persisted retry windows now written by the worker as it honours `maxParallel`, `continueOnError`, and retry windows during execution.
## Current behaviour
- Map steps expand into child iterations (`stepId[index]::templateId`) with per-item parameters preserved for runtime reference.
- Parallel blocks honour `maxParallel` (defaults to unlimited) and the worker executes children accordingly, short-circuiting when `continueOnError` is false.
- Simulation output mirrors approvals/policy gates, allowing the WebService/CLI to show which actions must occur before execution resumes.
- File-backed state store persists `PackRunState` snapshots (`nextAttemptAt`, attempts, reasons) so orchestration clients and CLI can resume runs deterministically even in air-gapped environments.
- Step state machine transitions:
- `pending → running → succeeded`
- `running → failed` (abort) once attempts ≥ `maxAttempts`
- `running → pending` with scheduled `nextAttemptAt` when retries remain
- `pending → skipped` for disabled steps (e.g., `when` expressions).
## CLI usage
Run the simulation without mutating state:
```bash
stella task-runner simulate \
--manifest ./packs/sample-pack.yaml \
--inputs ./inputs.json \
--format table
```
Use `--format json` (or `--output path.json`) to emit the raw payload produced by `POST /api/task-runner/simulations`.
## Follow-up gaps
- Fold the CLI command into the official reference/quickstart guides and capture exit-code conventions.
References:
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunExecutionGraphBuilder.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/Simulation/PackRunSimulationEngine.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Core/Execution/PackRunStepStateMachine.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Infrastructure/Execution/FilePackRunStateStore.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.Worker/Services/PackRunWorkerService.cs`
- `src/TaskRunner/StellaOps.TaskRunner/StellaOps.TaskRunner.WebService/Program.cs`