# TaskRunner Architecture (v1)
> Canonical contract for TaskRunner delivery scoped by SPRINT_0157_0001_0002 (TaskRunner Blockers) and SPRINT_0157_0001_0001 (TaskRunner I). Anchored in product advisory **"29-Nov-2025 - Task Pack Orchestration and Automation"** and the Task Pack runbook/spec (`docs/task-packs/*.md`).
## 1. Purpose and Scope
- Execute Task Packs deterministically with approvals, sealed-mode enforcement, and evidence capture.
- Provide API/CLI surface for pack submission, status, logs, approvals, artifacts, and cancellation.
- Produce provenance: DSSE attestation + evidence bundle for every completed run.
- Operate offline/air-gapped with plan-hash binding and sealed-mode network allowlists.
## 2. Components
- **WebService** (`StellaOps.TaskRunner.WebService`) - HTTP API, plan hash validation, SSE log streaming, approval endpoints.
- **Worker** (`StellaOps.TaskRunner.Worker`) - run orchestration, retries/backoff, artifact capture, attestation generation.
- **Core** (`StellaOps.TaskRunner.Core`) - execution graph builder, simulation engine, step state machine, policy/approval gate abstractions.
- **Infrastructure** (`StellaOps.TaskRunner.Infrastructure`) - storage adapters (Mongo, file), artifact/object store clients, evidence bundle writer.
## 3. Execution Phases
1. **Plan** - parse manifest, validate schema, resolve inputs/secrets, build execution graph, compute canonical `planHash` (SHA-256 over normalised graph).
2. **Simulation (optional)** - dry-run the graph; emit a deterministic preview with approvals/policy gates highlighted.
3. **Execution** - verify runtime graph matches `planHash`; execute steps honoring `maxParallel`, `continueOnError`, `map`/`parallel` semantics; stream logs/events.
4. **Evidence** - capture artifacts + transcripts, emit DSSE attestation binding `planHash`, inputs/outputs, steps, and timestamps; expose artifact listings via API for post-run retrieval.
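The plan-hash binding used in phases 1 and 3 can be sketched as follows. This is a minimal illustration, assuming the execution graph is available as a JSON-compatible structure; the function names are hypothetical, and `json.dumps(sort_keys=True)` only approximates canonical JSON (it is not a full RFC 8785 implementation).

```python
import hashlib
import json
from typing import Any


def canonical_json(graph: Any) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, UTF-8 encoded."""
    return json.dumps(
        graph, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def compute_plan_hash(graph: Any) -> str:
    """Hex-encoded SHA-256 over the canonical form of the execution graph."""
    return hashlib.sha256(canonical_json(graph)).hexdigest()


def verify_plan_hash(runtime_graph: Any, expected_plan_hash: str) -> None:
    """Raise (abort the run) when the runtime graph differs from the planned graph."""
    actual = compute_plan_hash(runtime_graph)
    if actual != expected_plan_hash:
        raise RuntimeError(
            f"plan hash mismatch: expected {expected_plan_hash}, got {actual}"
        )


# Hypothetical graphs: key order differs, content is identical.
planned = {"maxParallel": 2, "steps": [{"id": "scan", "uses": "builtin:scan"}]}
runtime = {"steps": [{"uses": "builtin:scan", "id": "scan"}], "maxParallel": 2}
verify_plan_hash(runtime, compute_plan_hash(planned))  # passes: same canonical bytes
```

Sorting keys makes the hash independent of serializer key order, while list order (for example, step order) stays significant.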
## 4. API Surface (v1)
- `POST /api/runs` (`packs.run`) - submit pack run; requires manifest/version, inputs, tenant context; returns `runId` + `planHash`.
- `GET /api/runs/{runId}` (`packs.read`) - run status (graph, attempts, pending gates).
- `GET /api/runs/{runId}/logs` (`packs.read`) - SSE stream of ordered log events.
- `GET /api/runs/{runId}/artifacts` (`packs.read`) - list captured artifacts with digests/paths.
- `POST /api/runs/{runId}/approve` (`packs.approve`) - record approval gate decision (requires Authority token claims `pack_run_id`, `pack_gate_id`, `pack_plan_hash`).
- `POST /api/runs/{runId}/cancel` (`packs.run`) - cancel active run.
- TODO (Phase II): `GET /.well-known/openapi` (TASKRUN-OAS-61-002) after OAS publication.
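A client consuming `GET /api/runs/{runId}/logs` could parse the SSE stream roughly as below. The framing follows the standard Server-Sent Events format; the event name `log` and the JSON payload shape (borrowed from the `pack_run_logs` fields in section 5) are assumptions, not a documented wire contract.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per dispatched SSE event (keys: event, id, data)."""
    event: Dict[str, str] = {}
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carried data.
            if data_lines:
                event["data"] = "\n".join(data_lines)
                yield event
            event, data_lines = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data_lines.append(value)
        elif field in ("event", "id"):
            event[field] = value
    # An event without a terminating blank line is discarded, as in the SSE spec.


def ordered_log_events(lines: Iterable[str]) -> Iterator[dict]:
    """Decode JSON payloads and enforce strictly increasing `sequence` values."""
    last = None
    for ev in parse_sse(lines):
        payload = json.loads(ev["data"])
        seq = payload["sequence"]
        if last is not None and seq <= last:
            raise ValueError(f"out-of-order log event: {seq} after {last}")
        last = seq
        yield payload


stream = [
    "event: log\n",
    'data: {"sequence": 1, "level": "info", "message": "plan accepted"}\n',
    "\n",
    ": keep-alive\n",
    "event: log\n",
    'data: {"sequence": 2, "level": "info", "message": "step scan started"}\n',
    "\n",
]
messages = [e["message"] for e in ordered_log_events(stream)]
```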
## 5. Data Model (Mongo, mirrors migration doc)
- **pack_runs**: `_id`, `planHash`, `plan`, `failurePolicy`, `requestedAt`, `createdAt`, `updatedAt`, `steps[]`, `tenantId`.
- **pack_run_logs**: `_id`, `runId`, `sequence` (monotonic), `timestamp` (UTC), `level`, `eventType`, `message`, `stepId?`, `metadata`.
- **pack_artifacts**: `_id`, `runId`, `name`, `type`, `sourcePath?`, `storedPath?`, `status`, `notes?`, `capturedAt`.
- Indexes as defined in `docs/modules/taskrunner/migrations/pack-run-collections.md`.
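The ordering guarantee on `pack_run_logs` can be illustrated with an in-memory stand-in for the collection. This sketch assumes sequences start at 1 per run and are assigned by the store rather than the caller; `PackRunLog` and `InMemoryLogStore` are hypothetical names that only mirror the documented `(runId, sequence)` unique index and UTC timestamps.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class PackRunLog:
    run_id: str
    sequence: int
    timestamp: str  # UTC ISO-8601, e.g. 2025-12-05T07:38:45Z
    level: str
    event_type: str
    message: str
    step_id: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


class InMemoryLogStore:
    """Hypothetical stand-in for pack_run_logs with a unique (runId, sequence) key."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], PackRunLog] = {}
        self._next_sequence: Dict[str, int] = {}

    def insert(self, entry: PackRunLog) -> None:
        key = (entry.run_id, entry.sequence)
        if key in self._entries:
            # Same effect as a duplicate-key error from the unique index.
            raise KeyError(f"duplicate log entry {key}")
        self._entries[key] = entry

    def append(self, run_id: str, level: str, event_type: str, message: str,
               step_id: Optional[str] = None,
               now: Optional[datetime] = None) -> PackRunLog:
        if now is None:
            now = datetime.now(timezone.utc)
        if now.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        stamp = now.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
        sequence = self._next_sequence.get(run_id, 1)
        entry = PackRunLog(run_id, sequence, stamp, level, event_type, message, step_id)
        self.insert(entry)
        self._next_sequence[run_id] = sequence + 1
        return entry

    def read(self, run_id: str) -> List[PackRunLog]:
        """Return one run's entries ordered by sequence, as the index would."""
        return sorted(
            (e for (r, _), e in self._entries.items() if r == run_id),
            key=lambda e: e.sequence,
        )
```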
## 6. Step Types and Semantics
- `run` - module invocation; declares `inputs`/`outputs`.
- `parallel` - executes nested `steps[]`; honors `maxParallel`.
- `map` - expands items into child steps (`stepId[index]::templateId`).
- `gate.approval` - human approval checkpoint; enforces TTL/required count; pauses the run until satisfied or expired.
- `gate.policy` - Policy Engine evaluation; `failAction` decides halt vs. continue.
- Built-in helper: `bundle.ingest` (run step) - requires `checksum`/`checksumSha256`, validates the SHA-256 digest, deterministically stages bundles to `ArtifactsPath/bundles/{checksum}/{filename}`, and emits `metadata.json`; fails on a missing file or checksum mismatch.
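Two of the conventions above, `map` child-step naming and `bundle.ingest` staging, are sketched below. Assumptions: map indices are zero-based, the checksum is a bare hex SHA-256 digest, and the `metadata.json` keys are illustrative; none of these details are confirmed by this document.

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import Any, List, Tuple


def expand_map(step_id: str, template_id: str, items: List[Any]) -> List[Tuple[str, Any]]:
    """Expand a `map` step into (childStepId, item) pairs (zero-based, assumed)."""
    return [(f"{step_id}[{i}]::{template_id}", item) for i, item in enumerate(items)]


def sha256_file(path: Path) -> str:
    """Stream the file through SHA-256 so large bundles are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_bundle(source: Path, checksum: str, artifacts_path: Path) -> Path:
    """Verify a bundle's SHA-256, then stage it under bundles/{checksum}/{filename}."""
    if not source.is_file():
        raise FileNotFoundError(f"bundle not found: {source}")
    expected = checksum.strip().lower()
    actual = sha256_file(source)
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")
    target_dir = artifacts_path / "bundles" / expected
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copyfile(source, target)
    # Hypothetical metadata keys; sorted keys keep the file byte-stable across runs.
    metadata = {"checksumSha256": expected, "fileName": source.name,
                "sizeBytes": target.stat().st_size}
    (target_dir / "metadata.json").write_text(
        json.dumps(metadata, sort_keys=True, indent=2) + "\n", encoding="utf-8")
    return target
```

Keying the staging directory by checksum means identical bundles always land at the same path, which keeps reruns deterministic.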
## 7. Determinism, Air-Gap, and Security
- Plan hash binding: runtime graph must equal planned graph; mismatch aborts run.
- All timestamps UTC ISO-8601; ordered logs via `(runId, sequence)` unique index.
- Secrets never logged; evidence bundles store only redacted metadata.
- Sealed mode: reject non-allowlisted network calls; approvals can be processed offline via request/response bundles.
- RBAC scopes: `packs.read`, `packs.write`, `packs.run`, `packs.approve`.
- Approval enforcement: service rejects approval decisions when provided `planHash` does not match stored run state (protects against stale/forged tokens).
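The approval checks in sections 4 and 7 reduce to a few comparisons against stored run state. A minimal sketch, assuming claims and scopes have already been extracted from a verified Authority token; `StoredRun`, `ApprovalRejected`, and the pending-gate check are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Mapping, Set


class ApprovalRejected(Exception):
    """Raised when an approval decision must be refused."""


@dataclass
class StoredRun:
    run_id: str
    plan_hash: str
    pending_gates: Set[str] = field(default_factory=set)


REQUIRED_CLAIMS = ("pack_run_id", "pack_gate_id", "pack_plan_hash")


def validate_approval(claims: Mapping[str, str], scopes: Set[str],
                      run: StoredRun, gate_id: str) -> None:
    if "packs.approve" not in scopes:
        raise ApprovalRejected("missing scope packs.approve")
    missing = [c for c in REQUIRED_CLAIMS if not claims.get(c)]
    if missing:
        raise ApprovalRejected(f"missing claims: {', '.join(missing)}")
    if claims["pack_run_id"] != run.run_id:
        raise ApprovalRejected("token was issued for a different run")
    if claims["pack_gate_id"] != gate_id or gate_id not in run.pending_gates:
        raise ApprovalRejected("gate mismatch or gate not pending")
    if claims["pack_plan_hash"] != run.plan_hash:
        # The stored plan differs from the one the token was minted for.
        raise ApprovalRejected("plan hash mismatch (stale or forged token)")
```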
## 8. Evidence & Attestation
- DSSE attestation payload (`payloadType`: `application/vnd.stellaops.pack-run+json`) includes `runId`, `packName/version`, `planHash`, input/output digests, step statuses, `completedAt`.
- Evidence bundle contents: signed manifest, inputs (redacted), outputs, transcripts, DSSE attestation; optional Rekor anchoring when online.
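DSSE signs the pre-authentication encoding (PAE) of the payload type and payload rather than the raw payload. The sketch below builds a pack-run envelope with that encoding. To stay within the Python standard library it uses HMAC-SHA256 as a stand-in signer; production attestations would use an asymmetric key, and the statement fields and key ID are illustrative.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict

PAYLOAD_TYPE = "application/vnd.stellaops.pack-run+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 PAE: "DSSEv1" SP len(type) SP type SP len(body) SP body."""
    t = payload_type.encode("utf-8")
    return b" ".join([b"DSSEv1", str(len(t)).encode(), t,
                      str(len(payload)).encode(), payload])


def sign_envelope(statement: Dict[str, Any], key: bytes, keyid: str) -> Dict[str, Any]:
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # HMAC is a stand-in for an asymmetric signature in this sketch.
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Return the decoded statement if any signature verifies; raise otherwise."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    for s in envelope["signatures"]:
        if hmac.compare_digest(base64.b64decode(s["sig"]), expected):
            return json.loads(payload)
    raise ValueError("no valid signature")
```

Binding `payloadType` into the signed bytes prevents a pack-run payload from being replayed under a different type.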
## 9. Observability (Phase I delivered)
- Metrics: step latency, retries, queue depth, resource usage (`TASKRUN-OBS-50/51-001` DONE).
- Pending: timeline events (`TASKRUN-OBS-52-001`), evidence snapshots (`TASKRUN-OBS-53-001`), attestations (`TASKRUN-OBS-54-001`), incident mode (`TASKRUN-OBS-55-001`).
## 10. Integration Points
- **Authority** - approval tokens, scope validation, sealed-vault secrets.
- **Policy Engine** - `gate.policy` decisions, policy context in evidence.
- **Export Center** - evidence bundles and manifests for offline/air-gapped export.
- **Orchestrator/CLI** - submission + resume flows; SSE log consumption.
## 11. Configuration (Mongo example)
```json
{
  "TaskRunner": {
    "Storage": {
      "Mode": "mongo",
      "Mongo": {
        "ConnectionString": "mongodb://127.0.0.1:27017/taskrunner",
        "Database": "taskrunner",
        "RunsCollection": "pack_runs",
        "LogsCollection": "pack_run_logs",
        "ArtifactsCollection": "pack_artifacts",
        "ApprovalsCollection": "pack_run_approvals"
      }
    }
  }
}
```
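A loader for this configuration might map the `TaskRunner:Storage` section into typed options. This Python sketch is illustrative only: the service's own options binding is not described here, `MongoStorageOptions` is a hypothetical name, and the distinct-collection check is an assumed sanity check rather than documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MongoStorageOptions:
    connection_string: str
    database: str
    runs_collection: str
    logs_collection: str
    artifacts_collection: str
    approvals_collection: str


def load_mongo_options(text: str) -> MongoStorageOptions:
    storage = json.loads(text)["TaskRunner"]["Storage"]
    if storage.get("Mode") != "mongo":
        raise ValueError(f"expected Mode 'mongo', got {storage.get('Mode')!r}")
    m = storage["Mongo"]  # KeyError if a required key is absent
    opts = MongoStorageOptions(
        connection_string=m["ConnectionString"],
        database=m["Database"],
        runs_collection=m["RunsCollection"],
        logs_collection=m["LogsCollection"],
        artifacts_collection=m["ArtifactsCollection"],
        approvals_collection=m["ApprovalsCollection"],
    )
    names = [opts.runs_collection, opts.logs_collection,
             opts.artifacts_collection, opts.approvals_collection]
    if len(set(names)) != len(names):
        raise ValueError("collection names must be distinct")
    return opts
```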
## 12. Gap Remediation (TP1-TP10, 2025-12)
- **Canonical plan hash (TP1):** Plan hash is `sha256` over `plan.canonicalPlanPath` (normalized JSON, stable key ordering, UTF-8). Hash and canonical plan file are shipped in offline bundles and verified by `scripts/packs/verify_offline_bundle.py`.
- **Inputs lock (TP2):** Task Runner emits `inputs.lock` capturing resolved inputs + redacted secret placeholders; stored in evidence bundles and listed under `hashes[]` in offline manifests.
- **Approval ledger (TP3):** Approval decisions are DSSE-signed, embedding `runId`, `gateId`, `planHash`, and `tenantId`. Approval endpoints reject mismatched plan hashes or missing DSSE envelopes.
- **Secret redaction (TP4):** Evidence/transcripts apply the redaction policy referenced in `security.secretsRedactionPolicy`; secrets are hashed or blanked, never logged in clear text.
- **Deterministic ordering/RNG/time (TP5):** Execution order derives from the canonical graph, RNG seed is derived from `planHash`, and all timestamps are UTC ISO-8601 with monotonic log sequences.
- **Sandbox + egress quotas (TP6):** Runs declare `sandbox.mode` (`sealed`/`restricted`), explicit `egressAllowlist`, CPU/memory limits, and optional wall-clock quota. Missing entries cause fail-closed refusal during plan or execution.
- **Registry signing + SBOM + revocation (TP7):** Packs accepted by Task Runner must include DSSE envelopes for bundle + attestation, a pack SBOM, and a revocation list path; imports fail when digests or revocation proofs are absent.
- **Offline bundle schema + verifier (TP8):** Offline bundles must satisfy `docs/task-packs/packs-offline-bundle.schema.json` and pass `scripts/packs/verify_offline_bundle.py --require-dsse`. Evidence locker records the verifier version used.
- **Run/approval SLOs (TP9):** Plan validation enforces declared SLOs (`runP95Seconds`, `approvalP95Seconds`, `maxQueueDepth`) and wires alert rules into telemetry (burn-rate alerts on approval latency + queue depth).
- **Fail-closed gates (TP10):** Approval/policy/timeline gates default to fail-closed on missing evidence, expired DSSE, or absent quotas; remediation hints surface in `pack_run_logs` and API error payloads.
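TP2 and TP5 can be illustrated together: a deterministic `inputs.lock` with secret values replaced by placeholders, and a PRNG seeded from the plan hash. Both are sketches under assumptions: the lock layout and the `<redacted>` placeholder are hypothetical, and seeding Python's `random.Random` with the full hash is one possible derivation, not the documented one. `random` is not cryptographically secure and suits only deterministic choices such as retry jitter.

```python
import json
import random
from typing import Dict, Iterable

REDACTED = "<redacted>"  # hypothetical placeholder text


def build_inputs_lock(inputs: Dict[str, str], secret_names: Iterable[str]) -> str:
    """Serialize resolved inputs deterministically, replacing secret values."""
    secrets = set(secret_names)
    locked = {
        name: ({"secret": True, "value": REDACTED} if name in secrets
               else {"secret": False, "value": value})
        for name, value in inputs.items()
    }
    return json.dumps(locked, sort_keys=True, indent=2) + "\n"


def rng_for_plan(plan_hash_hex: str) -> random.Random:
    """Seed a PRNG from the full plan hash so reruns of one plan draw identical values."""
    return random.Random(int(plan_hash_hex, 16))
```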
## 13. References
- Product advisory: `docs/product-advisories/29-Nov-2025 - Task Pack Orchestration and Automation.md`.
- Task Pack spec + authoring + runbook: `docs/task-packs/spec.md`, `docs/task-packs/authoring-guide.md`, `docs/task-packs/runbook.md`.
- Migration detail: `docs/modules/taskrunner/migrations/pack-run-collections.md`.