Orchestrator Overview (DOCS-ORCH-32-001)
Imposed rule: Work or tasks of this type on this component must also be applied everywhere else they are relevant.
Last updated: 2025-11-25
Mission & value
- Coordinate deterministic job execution across StellaOps services (Policy, RiskEngine, VEX Lens, Export Center, Notify).
- Provide reproducible DAG runs with tenant isolation, auditability, and Aggregation-Only Contract (AOC) alignment.
- Stay sovereign/offline: all runners operate from bundled manifests and local queues; no external control plane.
Runtime shape
- Services: Orchestrator WebService (API/UI), Worker (executors), Scheduler (timer-based triggers).
- Queues: per-tenant work queues; FIFO with deterministic ordering and idempotency keys.
- State: PostgreSQL for run metadata and DAG definitions; optional Valkey for locks/throttles; all scoped by tenant.
- APIs: REST + WebSocket for run status/stream; admin endpoints require orchestrator:admin plus the tenant header.
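To illustrate the queue behaviour described above, here is a minimal in-memory sketch of per-tenant FIFO queues that drop duplicate submissions by idempotency key. The class and method names (TenantQueues, enqueue, dequeue) are hypothetical and not taken from the Orchestrator codebase; the real queues are presumably durable rather than held in process memory.

```python
from collections import deque


class TenantQueues:
    """Hypothetical in-memory stand-in for per-tenant FIFO work queues."""

    def __init__(self):
        self._queues = {}  # tenant -> deque of (idempotency_key, payload)
        self._seen = {}    # tenant -> idempotency keys already accepted

    def enqueue(self, tenant, idempotency_key, payload):
        """Accept a job once per (tenant, key); return False for duplicates."""
        seen = self._seen.setdefault(tenant, set())
        if idempotency_key in seen:
            return False
        seen.add(idempotency_key)
        self._queues.setdefault(tenant, deque()).append((idempotency_key, payload))
        return True

    def dequeue(self, tenant):
        """Pop the oldest job for this tenant, or None if nothing is queued."""
        queue = self._queues.get(tenant)
        return queue.popleft() if queue else None
```

Because keys are tracked per tenant, the same key submitted by two tenants yields two independent jobs, while a retry within one tenant is ignored.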
AOC alignment
- Orchestrator never derives policy/verdicts; it only executes declared DAG steps and records outcomes.
- Inputs/outputs are append-only; runs are immutable with replay tokens.
- No consensus logic; all decisions remain in owning services (Policy Engine, RiskEngine, etc.).
Determinism
- Stable DAG evaluation order (topological with lexical tie-breaks).
- Idempotency via run tokens and step hashes; retries preserve trace_id.
- UTC timestamps; hashes lowercase hex; NDJSON exports ordered by (timestamp, dagId, runId).
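The ordering rules above can be sketched as follows. This is an illustrative example, not the Orchestrator's implementation: it uses Kahn's algorithm with a min-heap so that steps that become ready at the same time are emitted in lexical order, and it sorts export records by (timestamp, dagId, runId). The function names and the input shape (a mapping from step name to its dependencies) are assumptions, as is the use of fixed-format UTC timestamp strings, which sort correctly as plain text.

```python
import heapq
import json


def stable_topological_order(dependencies):
    """Kahn's algorithm; a min-heap breaks ties between ready steps lexically.

    `dependencies` maps each step name to the list of steps it depends on.
    """
    steps = set(dependencies)
    for deps in dependencies.values():
        steps.update(deps)
    indegree = {s: 0 for s in steps}
    dependents = {s: [] for s in steps}
    for step, deps in dependencies.items():
        for dep in set(deps):
            indegree[step] += 1
            dependents[dep].append(step)

    ready = [s for s, n in indegree.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        step = heapq.heappop(ready)  # smallest name among ready steps
        order.append(step)
        for nxt in dependents[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(steps):
        raise ValueError("cycle detected in DAG")
    return order


def export_ndjson(records):
    """Sort by (timestamp, dagId, runId) and emit one JSON object per line."""
    ordered = sorted(records, key=lambda r: (r["timestamp"], r["dagId"], r["runId"]))
    return "\n".join(
        json.dumps(r, sort_keys=True, separators=(",", ":")) for r in ordered
    )
```

Given the same DAG definition, the heap-based tie-break yields the same step order on every run regardless of dictionary or set iteration order, which is the property the determinism rules ask for.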
Observability
- Traces propagate traceparent/baggage through scheduler→worker→task.
- Metrics: orchestrator_runs_total{tenant,status}, orchestrator_run_duration_seconds, orchestrator_queue_depth.
- Logs: structured JSON, redacted, tagged with tenant, dagId, runId, status.
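As a rough illustration of the logging rule above, the sketch below builds one structured JSON log line carrying the four tags and masks sensitive fields. The function name, the set of redacted keys, and the "[REDACTED]" placeholder are all assumptions made for this example; the source does not specify which fields are redacted or how.

```python
import json

# Hypothetical list of sensitive field names; not specified by the source.
REDACTED_KEYS = {"password", "token", "secret"}


def log_line(tenant, dag_id, run_id, status, **fields):
    """Build one JSON log line tagged with tenant, dagId, runId, and status."""
    record = {
        key: "[REDACTED]" if key.lower() in REDACTED_KEYS else value
        for key, value in fields.items()
    }
    # Apply the mandatory tags last so extra fields cannot overwrite them.
    record.update({"tenant": tenant, "dagId": dag_id, "runId": run_id, "status": status})
    return json.dumps(record, sort_keys=True)
```

Sorting keys keeps the serialized line byte-stable for identical input, which fits the determinism goals stated earlier.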
Roles & responsibilities
- Operator: manages DAG definitions, quotas, tenant allowlists, SLOs.
- Developer: defines DAG specs and task plugins; supplies offline bundles for execution.
- Security: validates scopes, enforces AOC boundaries, reviews audit trails.
Offline posture
- DAG specs and plugins shipped in offline bundles; runners load from local disk.
- No outbound network during execution unless a task explicitly declares an allowlisted endpoint.
- Transparency: export run logs/traces/metrics as NDJSON for air-gapped review.
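The deny-by-default network rule can be pictured with a small check like the one below. It is a sketch under assumptions: the function name is_outbound_allowed is hypothetical, and endpoints are assumed to be declared as bare hostnames. How tasks actually declare allowlisted endpoints is not described in this overview.

```python
from urllib.parse import urlsplit


def is_outbound_allowed(url, declared_endpoints):
    """Deny by default; allow only hosts the task declared explicitly."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None:
        return False
    return host.lower() in {h.lower() for h in declared_endpoints}
```

An empty declaration list therefore blocks every outbound request, which matches the offline default.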
Safety & governance
- Mandatory tenant header; cross-tenant DAGs forbidden.
- Step sandboxing: resource limits per task; deny network by default.
- Audit: every run records actor, tenant, DAG version, inputs hash, outputs hash, and rationale notes.
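A possible shape for the audit record listed above is sketched here. The field names, the recordedAt timestamp, and the choice of SHA-256 over key-sorted compact JSON are assumptions for illustration; the source only states which facts are recorded and that hashes are lowercase hex.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_hash(value):
    """SHA-256 over key-sorted, compact JSON; hexdigest() is lowercase hex."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def audit_record(actor, tenant, dag_version, inputs, outputs, rationale, now=None):
    """Assemble one audit entry; a missing tenant is rejected outright."""
    if not tenant:
        raise ValueError("tenant is mandatory")
    now = now or datetime.now(timezone.utc)
    return {
        "actor": actor,
        "tenant": tenant,
        "dagVersion": dag_version,
        "inputsHash": canonical_hash(inputs),
        "outputsHash": canonical_hash(outputs),
        "rationale": rationale,
        # UTC timestamp in a fixed-width format so records sort as text.
        "recordedAt": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Hashing a canonical encoding means two runs with the same logical inputs produce the same inputs hash even if their keys were supplied in a different order.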