# TaskRunner Architecture (v1)
> Canonical contract for TaskRunner delivery scoped by SPRINT_0157_0001_0002 (TaskRunner Blockers) and SPRINT_0157_0001_0001 (TaskRunner I). Anchored in product advisory **"29-Nov-2025 - Task Pack Orchestration and Automation"** and the Task Pack runbook/spec (`docs/task-packs/*.md`).
## 1. Purpose and Scope
- Execute Task Packs deterministically with approvals, sealed-mode enforcement, and evidence capture.
- Provide API/CLI surface for pack submission, status, logs, approvals, artifacts, and cancellation.
- Produce provenance: DSSE attestation + evidence bundle for every completed run.
- Operate offline/air-gapped with plan-hash binding and sealed-mode network allowlists.
## 2. Components
- **WebService** (`StellaOps.TaskRunner.WebService`) - HTTP API, plan hash validation, SSE log streaming, approval endpoints.
- **Worker** (`StellaOps.TaskRunner.Worker`) - run orchestration, retries/backoff, artifact capture, attestation generation.
- **Core** (`StellaOps.TaskRunner.Core`) - execution graph builder, simulation engine, step state machine, policy/approval gate abstractions.
- **Infrastructure** (`StellaOps.TaskRunner.Infrastructure`) - storage adapters (Mongo, file), artifact/object store clients, evidence bundle writer.
## 3. Execution Phases
1. **Plan** - parse manifest, validate schema, resolve inputs/secrets, build execution graph, compute canonical `planHash` (SHA-256 over normalised graph).
2. **Simulation (optional)** - dry-run graph; emit deterministic preview with approvals/policy gates highlighted.
3. **Execution** - verify runtime graph matches `planHash`; execute steps honoring `maxParallel`, `continueOnError`, `map`/`parallel` semantics; stream logs/events.
4. **Evidence** - capture artifacts + transcripts, emit DSSE attestation binding `planHash`, inputs/outputs, steps, and timestamps; expose artifact listings via API for post-run retrieval.
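
The plan/execution binding above can be sketched as follows. This is a minimal illustration, not the TaskRunner implementation: it assumes the normalised graph is compact JSON with sorted keys encoded as UTF-8, and the function names (`canonical_plan_bytes`, `compute_plan_hash`, `verify_runtime_graph`) are hypothetical.

```python
import hashlib
import json


def canonical_plan_bytes(plan: dict) -> bytes:
    """Serialise a plan graph with stable key ordering and no extra whitespace."""
    # Assumption: canonical form = compact JSON, sorted keys, UTF-8.
    text = json.dumps(plan, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_plan_hash(plan: dict) -> str:
    """Return the plan hash in the `sha256:<64-hex>` form."""
    return "sha256:" + hashlib.sha256(canonical_plan_bytes(plan)).hexdigest()


def verify_runtime_graph(runtime_plan: dict, expected_plan_hash: str) -> None:
    """Raise when the runtime graph no longer matches the planned graph."""
    actual = compute_plan_hash(runtime_plan)
    if actual != expected_plan_hash:
        raise RuntimeError(
            f"plan hash mismatch: expected {expected_plan_hash}, got {actual}"
        )
```

Because keys are sorted before hashing, two manifests that differ only in key order produce the same hash, while any change to a step or input changes it and aborts execution.
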
## 4. API Surface (v1)
- `POST /api/runs` (`packs.run`) - submit pack run; requires manifest/version, inputs, tenant context; returns `runId` + `planHash`.
- `GET /api/runs/{runId}` (`packs.read`) - run status (graph, attempts, pending gates).
- `GET /api/runs/{runId}/logs` (`packs.read`) - SSE stream of ordered log events.
- `GET /api/runs/{runId}/artifacts` (`packs.read`) - list captured artifacts with digests/paths.
- `POST /api/runs/{runId}/approve` (`packs.approve`) - record approval gate decision (requires Authority token claims `pack_run_id`, `pack_gate_id`, `pack_plan_hash`).
- `POST /api/runs/{runId}/cancel` (`packs.run`) - cancel active run.
- TODO (Phase II): `GET /.well-known/openapi` (TASKRUN-OAS-61-002) after OAS publication.
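
A client consuming `GET /api/runs/{runId}/logs` has to parse the server-sent events framing and can check ordering as it goes. The sketch below follows the standard SSE field rules; the assumption that each event's `data` is a JSON object carrying a `sequence` field (mirroring `pack_run_logs`) is ours, and `parse_sse` / `read_log_events` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of decoded text lines."""
    event, data, event_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event (if it has data).
            if data:
                yield {"event": event, "id": event_id, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, typically a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            event_id = value


def read_log_events(lines: Iterable[str]) -> Iterator[dict]:
    """Decode log events and insist on strictly increasing sequence numbers."""
    last = None
    for evt in parse_sse(lines):
        record = json.loads(evt["data"])  # assumption: JSON payload per event
        seq = record["sequence"]          # assumption: field name
        if last is not None and seq <= last:
            raise ValueError(f"out-of-order log event: {seq} after {last}")
        last = seq
        yield record
```

In practice `lines` would be the decoded response body of the streaming request; any iterable of text lines works, which keeps the parser testable offline.
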
## 5. Data Model (Mongo, mirrors migration doc)
- **pack_runs**: `_id`, `planHash`, `plan`, `failurePolicy`, `requestedAt`, `createdAt`, `updatedAt`, `steps[]`, `tenantId`.
- **pack_run_logs**: `_id`, `runId`, `sequence` (monotonic), `timestamp` (UTC), `level`, `eventType`, `message`, `stepId?`, `metadata`.
- **pack_artifacts**: `_id`, `runId`, `name`, `type`, `sourcePath?`, `storedPath?`, `status`, `notes?`, `capturedAt`.
- Indexes as defined in `docs/modules/taskrunner/migrations/pack-run-collections.md`.
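
To make the `pack_run_logs` shape concrete, here is an in-memory stand-in that mimics a unique `(runId, sequence)` key and per-run monotonic sequences. It is only a sketch: the real collections live in Mongo, and the class names and the sequence-allocation strategy shown here are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PackRunLog:
    run_id: str
    sequence: int
    timestamp: str  # UTC, ISO-8601
    level: str
    event_type: str
    message: str
    step_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


class InMemoryLogStore:
    """Hypothetical stand-in for the log collection and its unique key."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], PackRunLog] = {}

    def insert(self, entry: PackRunLog) -> None:
        key = (entry.run_id, entry.sequence)
        if key in self._entries:
            # Mirrors a unique-index violation on (runId, sequence).
            raise KeyError(f"duplicate log key {key}")
        self._entries[key] = entry

    def append(self, run_id: str, level: str, event_type: str,
               message: str, step_id: Optional[str] = None) -> PackRunLog:
        # Next sequence = highest existing sequence for this run + 1.
        last = max((seq for (rid, seq) in self._entries if rid == run_id), default=0)
        entry = PackRunLog(
            run_id=run_id,
            sequence=last + 1,
            timestamp=datetime.now(timezone.utc).isoformat(),
            level=level,
            event_type=event_type,
            message=message,
            step_id=step_id,
        )
        self.insert(entry)
        return entry

    def read(self, run_id: str) -> List[PackRunLog]:
        keys = sorted(k for k in self._entries if k[0] == run_id)
        return [self._entries[k] for k in keys]
```
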
## 6. Step Types and Semantics
- `run` - module invocation; declares `inputs`/`outputs`.
- `parallel` - executes nested `steps[]`; honors `maxParallel`.
- `map` - expands items into child steps (`stepId[index]::templateId`).
- `gate.approval` - human approval checkpoint; enforces TTL/required count; pauses run until satisfied or expired.
- `gate.policy` - Policy Engine evaluation; `failAction` decides halt vs. continue.
- Built-in helper: `bundle.ingest` (run step) - requires `checksum`/`checksumSha256`, validates SHA-256, stages bundles to `ArtifactsPath/bundles/{checksum}/{filename}` deterministically, and emits `metadata.json`; fails on missing file or checksum mismatch.
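
The `bundle.ingest` behaviour described above could look roughly like this. It is a sketch under assumptions: the function name, the optional `sha256:` prefix handling, the location of `metadata.json` next to the staged file, and the metadata field names are all hypothetical.

```python
import hashlib
import json
import shutil
from pathlib import Path


def ingest_bundle(source: Path, checksum: str, artifacts_path: Path) -> Path:
    """Validate a bundle's SHA-256 and stage it under a checksum-keyed folder."""
    if not source.is_file():
        raise FileNotFoundError(f"bundle not found: {source}")

    expected = checksum.lower()
    if expected.startswith("sha256:"):  # assumption: prefix is tolerated
        expected = expected[len("sha256:"):]

    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    if digest != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {digest}")

    # Deterministic target: ArtifactsPath/bundles/{checksum}/{filename}
    target_dir = artifacts_path / "bundles" / digest
    target_dir.mkdir(parents=True, exist_ok=True)
    staged = target_dir / source.name
    shutil.copyfile(source, staged)

    # Assumption: metadata.json sits beside the staged file; no wall-clock
    # values are written so that repeated ingests produce identical output.
    metadata = {
        "filename": source.name,
        "checksumSha256": digest,
        "sizeBytes": staged.stat().st_size,
    }
    (target_dir / "metadata.json").write_text(
        json.dumps(metadata, sort_keys=True, indent=2), encoding="utf-8"
    )
    return staged
```

Keying the staging directory on the verified digest means re-ingesting the same bundle is idempotent, and two different bundles with the same filename never collide.
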
## 7. Determinism, Air-Gap, and Security
- Plan hash binding: runtime graph must equal planned graph; mismatch aborts run.
- All timestamps UTC ISO-8601; ordered logs via `(runId, sequence)` unique index.
- Secrets never logged; evidence bundles store only redacted metadata.
- Sealed mode: reject non-allowlisted network calls; approvals can be processed offline via request/response bundles.
- RBAC scopes: `packs.read`, `packs.write`, `packs.run`, `packs.approve`.
- Approval enforcement: service rejects approval decisions when provided `planHash` does not match stored run state (protects against stale/forged tokens).
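
Approval enforcement amounts to comparing the token's claims with the stored run state before recording a decision. A minimal sketch, assuming the claims arrive as an already-verified dict (token signature validation is out of scope here) and using hypothetical names `StoredGate` and `validate_approval_claims`:

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredGate:
    run_id: str
    gate_id: str
    plan_hash: str


def validate_approval_claims(claims: dict, gate: StoredGate) -> None:
    """Reject an approval whose claims do not match the stored run state."""
    expected = {
        "pack_run_id": gate.run_id,
        "pack_gate_id": gate.gate_id,
        "pack_plan_hash": gate.plan_hash,
    }
    for name, want in expected.items():
        got = claims.get(name)
        # Constant-time comparison; a missing or non-string claim fails closed.
        if not isinstance(got, str) or not hmac.compare_digest(
            got.encode("utf-8"), want.encode("utf-8")
        ):
            raise PermissionError(
                f"approval rejected: claim {name!r} missing or mismatched"
            )
```

A token minted for an earlier plan carries the old `pack_plan_hash`, so it is rejected once the run is re-planned.
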
## 8. Evidence & Attestation
- DSSE attestation payload (`payloadType`: `application/vnd.stellaops.pack-run+json`) includes `runId`, `packName/version`, `planHash`, input/output digests, step statuses, `completedAt`.
- Evidence bundle contents: signed manifest, inputs (redacted), outputs, transcripts, DSSE attestation; optional Rekor anchoring when online.
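
For orientation, a DSSE envelope wraps a base64 payload, its `payloadType`, and one or more signatures computed over the DSSE pre-authentication encoding (PAE). The sketch below builds such an envelope for a pack-run statement. HMAC-SHA256 is used only as a stand-in signer so the example runs with the standard library; a real deployment would presumably use asymmetric keys. The statement's field names and the helper names are assumptions.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.stellaops.pack-run+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def build_envelope(statement: dict, key: bytes, key_id: str) -> dict:
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Stand-in signer (assumption): HMAC-SHA256 instead of a real signature.
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [
            {"keyid": key_id, "sig": base64.b64encode(sig).decode("ascii")}
        ],
    }


def verify_envelope(envelope: dict, key: bytes) -> dict:
    """Return the decoded statement if any signature verifies, else raise."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    for signature in envelope["signatures"]:
        if hmac.compare_digest(base64.b64decode(signature["sig"]), expected):
            return json.loads(payload)
    raise ValueError("no valid signature on DSSE envelope")
```

Signing the PAE rather than the raw payload binds the signature to the `payloadType`, so an envelope cannot be replayed under a different type.
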
## 9. Observability (Phase I delivered)
- Metrics: step latency, retries, queue depth, resource usage (`TASKRUN-OBS-50/51-001` DONE).
- Pending: timeline events (`TASKRUN-OBS-52-001`), evidence snapshots (`TASKRUN-OBS-53-001`), attestations (`TASKRUN-OBS-54-001`), incident mode (`TASKRUN-OBS-55-001`).
## 10. Integration Points
- **Authority** - approval tokens, scope validation, sealed-vault secrets.
- **Policy Engine** - `gate.policy` decisions, policy context in evidence.
- **Export Center** - evidence bundles and manifests for offline/air-gapped export.
- **Orchestrator/CLI** - submission + resume flows; SSE log consumption.
## 11. Configuration (Mongo example)
```json
{
  "TaskRunner": {
    "Storage": {
      "Mode": "mongo",
      "Mongo": {
        "ConnectionString": "mongodb://127.0.0.1:27017/taskrunner",
        "Database": "taskrunner",
        "RunsCollection": "pack_runs",
        "LogsCollection": "pack_run_logs",
        "ArtifactsCollection": "pack_artifacts",
        "ApprovalsCollection": "pack_run_approvals"
      }
    }
  }
}
```
## 12. Gap Remediation (TP1–TP10, 2025-12)
- **Canonical plan hash (TP1):** Plan hash is `sha256:<64-hex>` over `plan.canonicalPlanPath` (normalized JSON, stable key ordering, UTF-8). Hash and canonical plan file are shipped in offline bundles and verified by `scripts/packs/verify_offline_bundle.py`.
- **Inputs lock (TP2):** Task Runner emits `inputs.lock` capturing resolved inputs + redacted secret placeholders; stored in evidence bundles and listed under `hashes[]` in offline manifests.
- **Approval ledger (TP3):** Approval decisions are DSSE-signed, embedding `runId`, `gateId`, `planHash`, and `tenantId`. Approval endpoints reject mismatched plan hashes or missing DSSE envelopes.
- **Secret redaction (TP4):** Evidence/transcripts apply the redaction policy referenced in `security.secretsRedactionPolicy`; secrets are hashed or blanked, never logged in clear text.
- **Deterministic ordering/RNG/time (TP5):** Execution order derives from the canonical graph, RNG seed is derived from `planHash`, and all timestamps are UTC ISO-8601 with monotonic log sequences.
- **Sandbox + egress quotas (TP6):** Runs declare `sandbox.mode` (`sealed`/`restricted`), explicit `egressAllowlist`, CPU/memory limits, and optional wall-clock quota. Missing entries cause fail-closed refusal during plan or execution.
- **Registry signing + SBOM + revocation (TP7):** Packs accepted by Task Runner must include DSSE envelopes for bundle + attestation, a pack SBOM, and a revocation list path; imports fail when digests or revocation proofs are absent.
- **Offline bundle schema + verifier (TP8):** Offline bundles must satisfy `docs/task-packs/packs-offline-bundle.schema.json` and pass `scripts/packs/verify_offline_bundle.py --require-dsse`. Evidence locker records the verifier version used.
- **Run/approval SLOs (TP9):** Plan validation enforces declared SLOs (`runP95Seconds`, `approvalP95Seconds`, `maxQueueDepth`) and wires alert rules into telemetry (burn-rate alerts on approval latency + queue depth).
- **Fail-closed gates (TP10):** Approval/policy/timeline gates default to fail-closed on missing evidence, expired DSSE, or absent quotas; remediation hints surface in `pack_run_logs` and API error payloads.
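
Two of the items above lend themselves to a short illustration: deriving the RNG seed from `planHash` (TP5) and refusing runs with incomplete sandbox declarations (TP6). This is a sketch only. The derivation (first 8 bytes of a SHA-256 over the hash string) and the key names `cpuLimit` and `memoryLimit` are assumptions, not the documented schema.

```python
import hashlib
import random

# Assumption: hypothetical key names for the CPU/memory limits.
REQUIRED_SANDBOX_KEYS = ("mode", "egressAllowlist", "cpuLimit", "memoryLimit")
ALLOWED_MODES = ("sealed", "restricted")


def seed_from_plan_hash(plan_hash: str) -> int:
    """Derive a stable 64-bit seed from the plan hash string."""
    digest = hashlib.sha256(plan_hash.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def validate_sandbox(sandbox: dict) -> None:
    """Fail closed: any missing or unknown declaration refuses the run."""
    missing = [key for key in REQUIRED_SANDBOX_KEYS if key not in sandbox]
    if missing:
        raise ValueError(f"sandbox declaration incomplete, missing: {missing}")
    if sandbox["mode"] not in ALLOWED_MODES:
        raise ValueError(f"unknown sandbox mode: {sandbox['mode']!r}")
    if not isinstance(sandbox["egressAllowlist"], list):
        raise ValueError("egressAllowlist must be an explicit list")


def rng_for_run(plan_hash: str) -> random.Random:
    """Same plan hash, same pseudo-random stream on every replay."""
    return random.Random(seed_from_plan_hash(plan_hash))
```

An empty `egressAllowlist` is accepted here because it is still an explicit declaration; only an absent one is refused.
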
## 13. References
- Product advisory: `docs/product-advisories/29-Nov-2025 - Task Pack Orchestration and Automation.md`.
- Task Pack spec + authoring + runbook: `docs/task-packs/spec.md`, `docs/task-packs/authoring-guide.md`, `docs/task-packs/runbook.md`.
- Migration detail: `docs/modules/taskrunner/migrations/pack-run-collections.md`.