git.stella-ops.org/docs/modules/taskrunner/architecture.md

TaskRunner Architecture (v1)

Canonical contract for TaskRunner delivery scoped by SPRINT_0157_0001_0002 (TaskRunner Blockers) and SPRINT_0157_0001_0001 (TaskRunner I). Anchored in product advisory "29-Nov-2025 - Task Pack Orchestration and Automation" and the Task Pack runbook/spec (docs/task-packs/*.md).

1. Purpose and Scope

  • Execute Task Packs deterministically with approvals, sealed-mode enforcement, and evidence capture.
  • Provide API/CLI surface for pack submission, status, logs, approvals, artifacts, and cancellation.
  • Produce provenance: DSSE attestation + evidence bundle for every completed run.
  • Operate offline/air-gapped with plan-hash binding and sealed-mode network allowlists.

2. Components

  • WebService (StellaOps.TaskRunner.WebService) - HTTP API, plan hash validation, SSE log streaming, approval endpoints.
  • Worker (StellaOps.TaskRunner.Worker) - run orchestration, retries/backoff, artifact capture, attestation generation.
  • Core (StellaOps.TaskRunner.Core) - execution graph builder, simulation engine, step state machine, policy/approval gate abstractions.
  • Infrastructure (StellaOps.TaskRunner.Infrastructure) - storage adapters (Mongo, file), artifact/object store clients, evidence bundle writer.

3. Execution Phases

  1. Plan - parse manifest, validate schema, resolve inputs/secrets, build execution graph, compute canonical planHash (SHA-256 over normalised graph).
  2. Simulation (optional) - dry-run graph; emit deterministic preview with approvals/policy gates highlighted.
  3. Execution - verify runtime graph matches planHash; execute steps honoring maxParallel, continueOnError, map/parallel semantics; stream logs/events.
  4. Evidence - capture artifacts + transcripts, emit DSSE attestation binding planHash, inputs/outputs, steps, and timestamps; expose artifact listings via API for post-run retrieval.
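
The Plan and Execution phases both depend on a reproducible hash. The following is a minimal Python sketch, not the TaskRunner implementation. It assumes the normalised graph is JSON-serialisable and that normalisation means sorted keys, compact separators, and UTF-8 encoding; the authoritative rules live in the Task Pack spec. The function names are illustrative, and the `sha256:` prefix follows the format given in section 12.

```python
import hashlib
import hmac
import json


def canonical_bytes(graph: dict) -> bytes:
    """Serialise with stable key ordering, compact separators, UTF-8 (assumed rules)."""
    text = json.dumps(graph, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_plan_hash(graph: dict) -> str:
    """Return the plan hash as 'sha256:<64-hex>' over the canonical bytes."""
    return "sha256:" + hashlib.sha256(canonical_bytes(graph)).hexdigest()


def verify_plan_hash(runtime_graph: dict, plan_hash: str) -> None:
    """Abort (raise) when the runtime graph no longer matches the planned graph."""
    actual = compute_plan_hash(runtime_graph)
    if not hmac.compare_digest(actual, plan_hash):
        raise ValueError(f"plan hash mismatch: expected {plan_hash}, got {actual}")
```

Because keys are sorted before hashing, two graphs that differ only in key order produce the same planHash, while any change to a step changes it.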

4. API Surface (v1)

  • POST /api/runs (packs.run) - submit pack run; requires manifest/version, inputs, tenant context; returns runId + planHash.
  • GET /api/runs/{runId} (packs.read) - run status (graph, attempts, pending gates).
  • GET /api/runs/{runId}/logs (packs.read) - SSE stream of ordered log events.
  • GET /api/runs/{runId}/artifacts (packs.read) - list captured artifacts with digests/paths.
  • POST /api/runs/{runId}/approve (packs.approve) - record approval gate decision (requires Authority token claims pack_run_id, pack_gate_id, pack_plan_hash).
  • POST /api/runs/{runId}/cancel (packs.run) - cancel active run.
  • TODO (Phase II): GET /.well-known/openapi (TASKRUN-OAS-61-002) after OAS publication.
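
As an illustration of the submission endpoint, the sketch below builds (but does not send) a `POST /api/runs` request using only the Python standard library. The JSON field names, the bearer-token header, and the `X-Tenant-Id` header are assumptions made for the example; the source states only that a manifest/version, inputs, and tenant context are required.

```python
import json
import urllib.request


def build_submit_request(base_url, token, tenant, manifest, version, inputs):
    """Build a POST /api/runs request. Body and header names are hypothetical."""
    body = json.dumps(
        {"manifest": manifest, "version": version, "inputs": inputs}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/runs",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "X-Tenant-Id": tenant,               # hypothetical tenant header
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; per the list above, a successful response carries runId and planHash.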

5. Data Model (Mongo, mirrors migration doc)

  • pack_runs: _id, planHash, plan, failurePolicy, requestedAt, createdAt, updatedAt, steps[], tenantId.
  • pack_run_logs: _id, runId, sequence (monotonic), timestamp (UTC), level, eventType, message, stepId?, metadata.
  • pack_artifacts: _id, runId, name, type, sourcePath?, storedPath?, status, notes?, capturedAt.
  • Indexes as defined in docs/modules/taskrunner/migrations/pack-run-collections.md.
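
To make the `pack_run_logs` shape concrete, here is a small in-memory Python sketch that assigns a monotonic `sequence` per run, which is the property the `(runId, sequence)` unique index protects. The class names and the start-at-1 numbering are assumptions; the real collections and indexes are defined in the migration doc.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PackRunLog:
    run_id: str
    sequence: int          # monotonic per run
    timestamp: str         # UTC ISO-8601
    level: str
    event_type: str
    message: str
    step_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


class InMemoryLogStore:
    """Illustrative stand-in for the Mongo collection."""

    def __init__(self):
        self._last_sequence = {}  # run_id -> last sequence handed out
        self._entries = []

    def append(self, run_id, level, event_type, message, step_id=None):
        sequence = self._last_sequence.get(run_id, 0) + 1
        self._last_sequence[run_id] = sequence
        now = datetime.now(timezone.utc).isoformat()
        entry = PackRunLog(run_id, sequence, now, level, event_type, message, step_id)
        self._entries.append(entry)
        return entry

    def for_run(self, run_id):
        """Return one run's log entries ordered by sequence."""
        own = (e for e in self._entries if e.run_id == run_id)
        return sorted(own, key=lambda e: e.sequence)
```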

6. Step Types and Semantics

  • run - module invocation; declares inputs/outputs.
  • parallel - executes nested steps[]; honors maxParallel.
  • map - expands items into child steps (stepId[index]::templateId).
  • gate.approval - human approval checkpoint; enforces TTL/required count; pauses run until satisfied or expired.
  • gate.policy - Policy Engine evaluation; failAction decides halt vs. continue.
  • Built-in helper: bundle.ingest (run step) - requires checksum/checksumSha256, validates SHA-256, stages bundles to ArtifactsPath/bundles/{checksum}/{filename} deterministically, and emits metadata.json; fails on missing file or checksum mismatch.
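
Two of the semantics above lend themselves to a short Python sketch: the child step naming used by map, and the validate-then-stage behaviour of bundle.ingest. This is an approximation under stated assumptions: the function names are hypothetical, the checksum is taken as bare lowercase hex, and the `metadata.json` fields are invented for illustration since the source does not list them.

```python
import hashlib
import json
import shutil
from pathlib import Path


def expand_map_step(step_id, template_id, items):
    """Name child steps as stepId[index]::templateId, one per item."""
    return [f"{step_id}[{index}]::{template_id}" for index, _ in enumerate(items)]


def ingest_bundle(source, checksum_sha256, artifacts_path):
    """Validate SHA-256, then stage to <artifacts>/bundles/<checksum>/<filename>."""
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError(f"bundle not found: {source}")
    expected = checksum_sha256.lower()
    actual = hashlib.sha256(source.read_bytes()).hexdigest()
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")
    target_dir = Path(artifacts_path) / "bundles" / expected
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copyfile(source, target)
    # Hypothetical metadata fields; the real schema is not given in this document.
    metadata = {"checksumSha256": expected, "filename": source.name,
                "sizeBytes": target.stat().st_size}
    (target_dir / "metadata.json").write_text(
        json.dumps(metadata, sort_keys=True, indent=2), encoding="utf-8")
    return target
```

Keying the staging directory on the checksum makes the destination a pure function of the bundle content and filename, which is what makes the staging deterministic.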

7. Determinism, Air-Gap, and Security

  • Plan hash binding: runtime graph must equal planned graph; mismatch aborts run.
  • All timestamps UTC ISO-8601; ordered logs via (runId, sequence) unique index.
  • Secrets never logged; evidence bundles store only redacted metadata.
  • Sealed mode: reject non-allowlisted network calls; approvals can be processed offline via request/response bundles.
  • RBAC scopes: packs.read, packs.write, packs.run, packs.approve.
  • Approval enforcement: service rejects approval decisions when provided planHash does not match stored run state (protects against stale/forged tokens).
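
The approval enforcement rule can be sketched as a claim comparison against stored run state. The Python below is a hedged illustration: the claim names `pack_run_id`, `pack_gate_id`, and `pack_plan_hash` come from the API section, while `ApprovalRejected`, `check_approval_claims`, and the idea that claims arrive as a plain dict after token validation are assumptions.

```python
import hmac


class ApprovalRejected(Exception):
    """Raised when an approval token does not match the stored run state."""


def check_approval_claims(claims, run_id, gate_id, stored_plan_hash):
    """Reject stale or forged approvals by comparing all three bound claims."""
    expected = {
        "pack_run_id": run_id,
        "pack_gate_id": gate_id,
        "pack_plan_hash": stored_plan_hash,
    }
    for name, want in expected.items():
        got = claims.get(name)
        # Missing or non-string claims are rejected (fail closed).
        if not isinstance(got, str):
            raise ApprovalRejected(f"claim {name} is missing")
        # Constant-time comparison avoids leaking how much of the value matched.
        if not hmac.compare_digest(got.encode("utf-8"), want.encode("utf-8")):
            raise ApprovalRejected(f"claim {name} does not match stored run state")
```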

8. Evidence & Attestation

  • DSSE attestation payload (payloadType: application/vnd.stellaops.pack-run+json) includes runId, packName/version, planHash, input/output digests, step statuses, completedAt.
  • Evidence bundle contents: signed manifest, inputs (redacted), outputs, transcripts, DSSE attestation; optional Rekor anchoring when online.
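
For orientation, the sketch below shows how a DSSE envelope with this payloadType could be assembled in Python. The pre-authentication encoding (PAE) and envelope layout follow the public DSSE specification; the statement's JSON field names and the `sign` callable are assumptions, and a real deployment would sign with a managed key rather than anything shown here.

```python
import base64
import json

PAYLOAD_TYPE = "application/vnd.stellaops.pack-run+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def build_envelope(statement: dict, sign, keyid: str) -> dict:
    """Wrap a statement in a DSSE envelope. `sign` maps bytes to signature bytes."""
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = sign(pae(PAYLOAD_TYPE, payload))
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [
            {"keyid": keyid, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }
```

Signing the PAE output rather than the raw payload binds the payloadType into the signature, so an attestation cannot be reinterpreted under a different type.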

9. Observability (Phase I delivered)

  • Metrics: step latency, retries, queue depth, resource usage (TASKRUN-OBS-50/51-001 DONE).
  • Pending: timeline events (TASKRUN-OBS-52-001), evidence snapshots (TASKRUN-OBS-53-001), attestations (TASKRUN-OBS-54-001), incident mode (TASKRUN-OBS-55-001).

10. Integration Points

  • Authority - approval tokens, scope validation, sealed-vault secrets.
  • Policy Engine - gate.policy decisions, policy context in evidence.
  • Export Center - evidence bundles and manifests for offline/air-gapped export.
  • Orchestrator/CLI - submission + resume flows; SSE log consumption.

11. Configuration (Mongo example)

"TaskRunner": {
  "Storage": {
    "Mode": "mongo",
    "Mongo": {
      "ConnectionString": "mongodb://127.0.0.1:27017/taskrunner",
      "Database": "taskrunner",
      "RunsCollection": "pack_runs",
      "LogsCollection": "pack_run_logs",
      "ArtifactsCollection": "pack_artifacts",
      "ApprovalsCollection": "pack_run_approvals"
    }
  }
}

12. Gap Remediation (TP1–TP10, 2025-12)

  • Canonical plan hash (TP1): Plan hash is sha256:<64-hex> over plan.canonicalPlanPath (normalized JSON, stable key ordering, UTF-8). Hash and canonical plan file are shipped in offline bundles and verified by scripts/packs/verify_offline_bundle.py.
  • Inputs lock (TP2): Task Runner emits inputs.lock capturing resolved inputs + redacted secret placeholders; stored in evidence bundles and listed under hashes[] in offline manifests.
  • Approval ledger (TP3): Approval decisions are DSSE-signed, embedding runId, gateId, planHash, and tenantId. Approval endpoints reject mismatched plan hashes or missing DSSE envelopes.
  • Secret redaction (TP4): Evidence/transcripts apply the redaction policy referenced in security.secretsRedactionPolicy; secrets are hashed or blanked, never logged in clear text.
  • Deterministic ordering/RNG/time (TP5): Execution order derives from the canonical graph, RNG seed is derived from planHash, and all timestamps are UTC ISO-8601 with monotonic log sequences.
  • Sandbox + egress quotas (TP6): Runs declare sandbox.mode (sealed/restricted), explicit egressAllowlist, CPU/memory limits, and optional wall-clock quota. Missing entries cause fail-closed refusal during plan or execution.
  • Registry signing + SBOM + revocation (TP7): Packs accepted by Task Runner must include DSSE envelopes for bundle + attestation, a pack SBOM, and a revocation list path; imports fail when digests or revocation proofs are absent.
  • Offline bundle schema + verifier (TP8): Offline bundles must satisfy docs/task-packs/packs-offline-bundle.schema.json and pass scripts/packs/verify_offline_bundle.py --require-dsse. Evidence locker records the verifier version used.
  • Run/approval SLOs (TP9): Plan validation enforces declared SLOs (runP95Seconds, approvalP95Seconds, maxQueueDepth) and wires alert rules into telemetry (burn-rate alerts on approval latency + queue depth).
  • Fail-closed gates (TP10): Approval/policy/timeline gates default to fail-closed on missing evidence, expired DSSE, or absent quotas; remediation hints surface in pack_run_logs and API error payloads.
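
Two of these items, the planHash-derived RNG seed (TP5) and the fail-closed sandbox declaration (TP6), can be illustrated with a short Python sketch. The derivation rule (first 64 bits of the digest) and the `cpuLimit`/`memoryLimit` key names are assumptions chosen for the example; only `sandbox.mode`, the sealed/restricted values, and `egressAllowlist` are named in the text above.

```python
import random

# cpuLimit and memoryLimit are hypothetical key names for the CPU/memory limits.
REQUIRED_SANDBOX_KEYS = ("mode", "egressAllowlist", "cpuLimit", "memoryLimit")
ALLOWED_MODES = {"sealed", "restricted"}


def seed_from_plan_hash(plan_hash: str) -> int:
    """Derive a seed from 'sha256:<64-hex>' (assumed rule: first 64 bits)."""
    hex_digest = plan_hash.split(":", 1)[1]
    return int(hex_digest[:16], 16)


def validate_sandbox(sandbox) -> list:
    """Return a list of problems; any non-empty result means the run is refused."""
    if not isinstance(sandbox, dict):
        return ["sandbox block is missing"]
    errors = [f"sandbox.{key} is missing" for key in REQUIRED_SANDBOX_KEYS
              if key not in sandbox]
    if "mode" in sandbox and sandbox["mode"] not in ALLOWED_MODES:
        errors.append("sandbox.mode must be 'sealed' or 'restricted'")
    if "egressAllowlist" in sandbox and not isinstance(sandbox["egressAllowlist"], list):
        errors.append("sandbox.egressAllowlist must be an explicit list")
    return errors


def rng_for_run(plan_hash: str) -> random.Random:
    """Same planHash, same pseudo-random stream."""
    return random.Random(seed_from_plan_hash(plan_hash))
```

Returning every problem at once, rather than stopping at the first, fits the requirement that remediation hints surface in logs and API error payloads.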

13. References

  • Product advisory: docs/product-advisories/29-Nov-2025 - Task Pack Orchestration and Automation.md.
  • Task Pack spec + authoring + runbook: docs/task-packs/spec.md, docs/task-packs/authoring-guide.md, docs/task-packs/runbook.md.
  • Migration detail: docs/modules/taskrunner/migrations/pack-run-collections.md.