# Orchestrator Architecture (DOCS-ORCH-32-002)

Last updated: 2025-11-25

## Runtime components

- **WebService**: REST + WebSocket API for DAG definitions, run status, and admin actions; issues idempotency tokens and enforces tenant isolation.
- **Scheduler**: timer/cron runner that instantiates DAG runs from schedules; publishes run intents into per-tenant queues.
- **Worker**: executes DAG steps; pulls from tenant queues, applies resource limits, and reports spans/metrics/logs.
- **Plugin host**: task plugins (HTTP call, queue dispatch, CLI tool, script) loaded from signed bundles; execution is sandboxed with deny-by-default network.

## Data model

- **DAG**: directed acyclic graph with topological order; tie-break lexicographically by step id for determinism.
- **Run**: immutable record with `runId`, `dagVersion`, `tenant`, `inputsHash`, `status`, `traceId`, `startedUtc`, `endedUtc`.
- **Step execution**: each step captures `inputsHash`, `outputsHash`, `status`, `attempt`, `durationMs`, `logsRef`, `metricsRef`.

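
The records above could be modelled as frozen dataclasses, roughly as sketched below. The field names come from this document; the field types, the SHA-256 choice, and the canonical-JSON encoding used by the hypothetical `inputs_hash` helper are assumptions for illustration, not the orchestrator's confirmed behaviour.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def inputs_hash(inputs: dict) -> str:
    """Hypothetical helper: hash inputs over a canonical JSON encoding.

    Assumption: SHA-256 over JSON with sorted keys and compact separators.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()  # lowercase hex


@dataclass(frozen=True)  # frozen: assigning to a field raises FrozenInstanceError
class Run:
    runId: str
    dagVersion: str
    tenant: str
    inputsHash: str
    status: str
    traceId: str
    startedUtc: str                  # assumed ISO-8601 UTC string
    endedUtc: Optional[str] = None   # assumed empty until the run ends


@dataclass(frozen=True)
class StepExecution:
    inputsHash: str
    outputsHash: str
    status: str
    attempt: int
    durationMs: int
    logsRef: str      # assumed to be a content hash pointing at stored logs
    metricsRef: str   # assumed to be a content hash pointing at stored metrics
```

Because the dataclasses are frozen, a status change would be recorded as a new record rather than an in-place update, which matches the append-only persistence described under the execution flow.
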
## Execution flow

1) Client or scheduler creates a run (idempotent on `runToken`, `dagId`, `inputsHash`).
2) Scheduler enqueues run intent into tenant queue.
3) Worker dequeues, reconstructs DAG ordering, and executes steps:
   - skip disabled steps;
   - apply per-step concurrency, retries, and backoff;
   - emit spans/metrics/logs with propagated `traceparent`.
4) Results are persisted append-only; WebSocket pushes status to clients.

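
Steps 1 and 3 of the flow above can be illustrated with a small in-memory sketch. `RunRegistry` and `run_step` are hypothetical names, the run id format is made up, and exponential backoff is an assumption: the document only says that retries and backoff are applied per step.

```python
import time
from typing import Callable, Dict, Tuple

RunKey = Tuple[str, str, str]  # (runToken, dagId, inputsHash)


class RunRegistry:
    """Hypothetical in-memory stand-in for the run store."""

    def __init__(self) -> None:
        self._runs: Dict[RunKey, str] = {}

    def create_run(self, run_token: str, dag_id: str, inputs_hash: str) -> Tuple[str, bool]:
        """Return (runId, created). Repeating the same key returns the same run."""
        key = (run_token, dag_id, inputs_hash)
        if key in self._runs:
            return self._runs[key], False
        run_id = f"run-{len(self._runs) + 1}"  # illustrative id format only
        self._runs[key] = run_id
        return run_id, True


def run_step(
    step: Callable[[], str],
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Run a step, retrying on any exception; returns (result, attempt).

    Assumption: the delay doubles after each failed attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step(), attempt
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the last error
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays; the returned attempt number corresponds to the `attempt` field of a step execution.
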
## Storage & queues

- Mongo stores DAG specs, versions, and run history (per-tenant collections or tenant key prefix).
- Queues: Redis/Mongo-backed FIFO per tenant; message includes `traceparent`, `runToken`, `dagVersion`, `inputsHash`.
- Artifacts (logs, outputs) referenced by content hash; stored in object storage or Mongo GridFS; hashes recorded in run record.

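
The queue and artifact conventions above can be sketched without any real backend. The classes below are hypothetical in-memory stand-ins for the Redis/Mongo queues and for object storage; SHA-256 is an assumed hash algorithm, since the document only specifies a content hash.

```python
import hashlib
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

# Message fields named in this document.
REQUIRED_FIELDS = {"traceparent", "runToken", "dagVersion", "inputsHash"}


class ArtifactStore:
    """Content-addressed store: the key is the hash of the bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()  # assumption: SHA-256, lowercase hex
        self._blobs.setdefault(digest, data)       # identical content is stored once
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class TenantQueues:
    """One FIFO queue per tenant; tenants never share a queue."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[dict]] = defaultdict(deque)

    def enqueue(self, tenant: str, message: dict) -> None:
        missing = REQUIRED_FIELDS - set(message)
        if missing:
            raise ValueError(f"message is missing fields: {sorted(missing)}")
        self._queues[tenant].append(message)

    def dequeue(self, tenant: str) -> Optional[dict]:
        queue = self._queues.get(tenant)  # .get avoids creating empty queues
        return queue.popleft() if queue else None
```

A run record would then carry only the digest returned by `put`, and an auditor can verify an artifact by rehashing the bytes returned by `get`.
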
## Security & AOC alignment

- Mandatory `X-Stella-Tenant`; cross-tenant DAGs prohibited.
- Scopes: `orchestrator:read|write|admin`; admin needed for DAG publish/delete.
- AOC: Orchestrator only schedules/executes; no policy/severity decisions. Inputs/outputs immutable; runs replayable.
- Sandboxing: per-step CPU/memory limits; network egress blocked unless step declares allowlist entry.

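
A minimal sketch of how the checks above might look at a request boundary follows. The header name and scope strings come from this document. The action names in `ACTION_SCOPES`, the exact-match scope rule (no assumption that `admin` implies `read` or `write`), and the hostname-based allowlist format are all hypothetical.

```python
from typing import Iterable, Mapping
from urllib.parse import urlsplit

# Hypothetical action names. Only "publish/delete needs admin" is from the document.
ACTION_SCOPES = {
    "get_run": "orchestrator:read",
    "create_run": "orchestrator:write",
    "publish_dag": "orchestrator:admin",
    "delete_dag": "orchestrator:admin",
}


def resolve_tenant(headers: Mapping[str, str]) -> str:
    """Return the tenant from X-Stella-Tenant, or refuse the request."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-stella-tenant" and value.strip():
            return value.strip()
    raise PermissionError("missing X-Stella-Tenant header")


def check_scope(action: str, granted: Iterable[str]) -> None:
    """Raise unless the caller holds the scope required for the action."""
    needed = ACTION_SCOPES[action]
    if needed not in set(granted):
        raise PermissionError(f"{action} requires scope {needed}")


def egress_allowed(url: str, allowlist: Iterable[str]) -> bool:
    """Deny by default: allow only hosts the step declared explicitly.

    Assumption: allowlist entries are plain lowercase hostnames.
    """
    host = urlsplit(url).hostname
    return host is not None and host in set(allowlist)
```

With an empty allowlist every call to `egress_allowed` returns `False`, which is the deny-by-default behaviour the sandboxing bullet describes.
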
## Determinism

- Step ordering: topological + lexical tie-breaks.
- Idempotency: `runToken` + `inputsHash`; retries reuse same `traceId`; outputs hashed (lowercase hex).
- Timestamps UTC; NDJSON exports sorted by `(startedUtc, dagId, runId)`.

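
The step-ordering rule can be sketched with Kahn's algorithm and a min-heap, so that whenever several steps are ready the lexically smallest step id runs first. The function name and the `deps` input shape are hypothetical, and "lexical" is assumed here to mean Python's default code-point string ordering.

```python
import heapq
from typing import Dict, Iterable, List


def deterministic_order(deps: Dict[str, Iterable[str]]) -> List[str]:
    """Topologically sort step ids, breaking ties by smallest step id.

    `deps` maps each step id to the ids of the steps it depends on.
    """
    indegree = {step: 0 for step in deps}
    dependents: Dict[str, List[str]] = {step: [] for step in deps}
    for step, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise KeyError(f"unknown dependency {parent!r} for step {step!r}")
            indegree[step] += 1
            dependents[parent].append(step)

    # The heap always yields the smallest ready step id, so the result
    # does not depend on dict insertion order.
    ready = [step for step, count in indegree.items() if count == 0]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        step = heapq.heappop(ready)
        order.append(step)
        for child in dependents[step]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, child)

    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order
```

For example, with `deps = {"deploy": ["build", "test"], "test": ["build"], "build": [], "audit": []}` the function returns `["audit", "build", "test", "deploy"]`, regardless of the order in which the steps were declared.
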
## Offline posture

- DAG specs and plugins shipped in signed offline bundles; no remote fetch.
- Transparency: export runs/logs/metrics/traces as NDJSON for air-gapped audit.

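
An NDJSON export for audit could look roughly like the sketch below, using the `(startedUtc, dagId, runId)` sort key stated under Determinism. The function name and the dict shape of a run are hypothetical, and the sort assumes `startedUtc` is a fixed-format ISO-8601 UTC string so that string order matches time order.

```python
import json
from typing import Iterable


def export_runs_ndjson(runs: Iterable[dict]) -> str:
    """Serialize runs as NDJSON: one JSON object per line, in a stable order."""
    ordered = sorted(runs, key=lambda r: (r["startedUtc"], r["dagId"], r["runId"]))
    # sort_keys and compact separators make each line byte-for-byte reproducible.
    return "".join(
        json.dumps(run, sort_keys=True, separators=(",", ":")) + "\n"
        for run in ordered
    )
```

Two exports of the same run set then produce identical bytes, so an air-gapped auditor can compare files by hash instead of parsing them.
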
## Observability

- Traces: spans named `orchestrator.run`, `orchestrator.step` with attributes `tenant`, `dagId`, `runId`, `stepId`, `status`.
- Metrics: `orchestrator_runs_total{tenant,status}`, `orchestrator_run_duration_seconds`, `orchestrator_queue_depth`, `orchestrator_step_retries_total`.
- Logs: structured JSON, redacted, carrying `trace_id`, `tenant`, `dagId`, `runId`, `stepId`.

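
The log shape described above might be produced by a helper like the following sketch. The correlation field names come from this document; the `message` field, the `REDACTED_KEYS` set, and the `[REDACTED]` placeholder are assumptions, since the document does not say which fields are redacted or how.

```python
import json

# Hypothetical list of sensitive keys; the real redaction rules are not given here.
REDACTED_KEYS = {"password", "token", "authorization"}


def log_line(message: str, *, trace_id: str, tenant: str, dagId: str,
             runId: str, stepId: str, **fields: object) -> str:
    """Build one structured JSON log line with the required correlation fields."""
    record = {
        "message": message,
        "trace_id": trace_id,
        "tenant": tenant,
        "dagId": dagId,
        "runId": runId,
        "stepId": stepId,
    }
    for key, value in fields.items():
        # Replace sensitive values before they reach the log sink.
        record[key] = "[REDACTED]" if key.lower() in REDACTED_KEYS else value
    return json.dumps(record, sort_keys=True)
```

Making the correlation fields keyword-only arguments means a call that omits one of them fails immediately with a `TypeError` instead of emitting an incomplete log line.
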
## Governance & rollout

- DAG publishing requires signature/owner metadata; versions immutable after publish.
- Rollback: schedule new version and disable old; runs stay immutable.
- Upgrade path: workers hot-reload plugins from bundle catalog; scheduler is stateless.