# Orchestrator Overview (DOCS-ORCH-32-001)

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

Last updated: 2025-11-25
## Mission & value

- Coordinate deterministic job execution across StellaOps services (Policy, RiskEngine, VEX Lens, Export Center, Notify).
- Provide reproducible DAG runs with tenant isolation, auditability, and Aggregation-Only Contract (AOC) alignment.
- Stay sovereign/offline: all runners operate from bundled manifests and local queues; no external control plane.
## Runtime shape

- **Services**: Orchestrator WebService (API/UI), Worker (executors), Scheduler (timer-based triggers).
- **Queues**: per-tenant work queues; FIFO with deterministic ordering and idempotency keys.
- **State**: MongoDB for run metadata and DAG definitions; optional Redis for locks/throttles; all scoped by tenant.
- **APIs**: REST + WebSocket for run status/stream; admin endpoints require the `orchestrator:admin` scope plus the tenant header.
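The queue behaviour listed above can be sketched roughly as follows. This is a minimal in-memory illustration, not the Orchestrator's implementation: the class name `TenantQueues` and its methods are hypothetical, and a real deployment would presumably persist both the queues and the set of seen keys.

```python
from collections import deque


class TenantQueues:
    """Per-tenant FIFO queues that reject repeated idempotency keys (illustrative only)."""

    def __init__(self):
        self._queues = {}  # tenant -> deque of (idempotency_key, payload)
        self._seen = {}    # tenant -> set of idempotency keys already accepted

    def enqueue(self, tenant, idempotency_key, payload):
        """Return True if the job was queued, False if the key was already accepted."""
        seen = self._seen.setdefault(tenant, set())
        if idempotency_key in seen:
            return False
        seen.add(idempotency_key)
        self._queues.setdefault(tenant, deque()).append((idempotency_key, payload))
        return True

    def dequeue(self, tenant):
        """Pop the oldest job for one tenant, or return None if its queue is empty."""
        queue = self._queues.get(tenant)
        return queue.popleft() if queue else None
```

Keys are tracked per tenant, so the same idempotency key used by two tenants yields two independent jobs, which keeps the sketch consistent with tenant scoping.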
## AOC alignment

- Orchestrator never derives policy/verdicts; it only executes declared DAG steps and records outcomes.
- Inputs/outputs are append-only; runs are immutable with replay tokens.
- No consensus logic; all decisions remain in owning services (Policy Engine, RiskEngine, etc.).
## Determinism

- Stable DAG evaluation order (topological with lexical tie-breaks).
- Idempotency via run tokens and step hashes; retries preserve `trace_id`.
- UTC timestamps; hashes lowercase hex; NDJSON exports ordered by `(timestamp, dagId, runId)`.
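One way to obtain a stable evaluation order of the kind described above is Kahn's algorithm with a min-heap as the ready set, so that whenever several steps are runnable the lexically smallest name goes first. The sketch below assumes a DAG given as a mapping from step name to its dependencies; the function name and input shape are illustrative, not the Orchestrator's DAG spec format.

```python
import heapq


def stable_topological_order(dependencies):
    """Order steps so dependencies run first; ties are broken by step name.

    `dependencies` maps each step name to the names of the steps it depends on.
    """
    steps = set(dependencies)
    for deps in dependencies.values():
        steps.update(deps)

    remaining = {step: 0 for step in steps}   # unmet dependency count per step
    dependents = {step: [] for step in steps}  # reverse edges
    for step, deps in dependencies.items():
        for dep in set(deps):
            remaining[step] += 1
            dependents[dep].append(step)

    ready = [step for step, count in remaining.items() if count == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        step = heapq.heappop(ready)  # smallest name among the ready steps
        order.append(step)
        for nxt in dependents[step]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != len(steps):
        raise ValueError("cycle detected in DAG definition")
    return order
```

Python compares strings by code point, so "lexical" here means code-point order. The result depends only on the graph, not on the insertion order of the mapping, which is the property a reproducible run needs.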
## Observability

- Traces propagate `traceparent`/`baggage` through scheduler→worker→task.
- Metrics: `orchestrator_runs_total{tenant,status}`, `orchestrator_run_duration_seconds`, `orchestrator_queue_depth`.
- Logs: structured JSON, redacted, tagged with `tenant`, `dagId`, `runId`, `status`.
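To illustrate the trace propagation above: assuming the `traceparent` header follows the W3C Trace Context format (`version-traceid-parentid-flags`), each hop keeps the trace-id and substitutes its own span id. The helper below is a hypothetical sketch that handles only version `00`; it is not taken from the Orchestrator code.

```python
import re
import secrets

# version 00, 32-hex trace-id, 16-hex parent-id, 2-hex flags (all lowercase)
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def child_traceparent(parent_header):
    """Return a traceparent for a child span that keeps the parent's trace-id."""
    match = _TRACEPARENT.fullmatch(parent_header)
    if match is None:
        raise ValueError("unsupported or malformed traceparent")
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    while span_id == "0" * 16:      # an all-zero span id is not allowed
        span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"
```

A scheduler would call this before handing work to a worker, and the worker would call it again before starting a task, so all three spans share one trace-id.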
## Roles & responsibilities

- **Operator**: manages DAG definitions, quotas, tenant allowlists, SLOs.
- **Developer**: defines DAG specs and task plugins; supplies offline bundles for execution.
- **Security**: validates scopes, enforces AOC boundaries, reviews audit trails.
## Offline posture

- DAG specs and plugins are shipped in offline bundles; runners load them from local disk.
- No outbound network during execution unless a task explicitly declares an allowlisted endpoint.
- Transparency: export run logs/traces/metrics as NDJSON for air-gapped review.
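As a rough sketch of the NDJSON export, using the `(timestamp, dagId, runId)` ordering stated under Determinism: the function below writes one compact JSON object per line. It assumes that timestamps are UTC ISO-8601 strings in a single uniform format, so that string order matches chronological order; the function name and record fields other than `timestamp`, `dagId`, and `runId` are hypothetical.

```python
import json


def export_ndjson(records):
    """Serialize run records as NDJSON, ordered by (timestamp, dagId, runId)."""
    ordered = sorted(records, key=lambda r: (r["timestamp"], r["dagId"], r["runId"]))
    # sort_keys and fixed separators keep each line byte-stable across runs
    return "".join(
        json.dumps(r, sort_keys=True, separators=(",", ":")) + "\n" for r in ordered
    )
```

Because both the line order and the key order within each line are fixed, two exports of the same records produce identical bytes, which makes them straightforward to diff or hash during an air-gapped review.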
## Safety & governance

- Mandatory tenant header; cross-tenant DAGs forbidden.
- Step sandboxing: resource limits per task; deny network by default.
- Audit: every run records actor, tenant, DAG version, inputs hash, outputs hash, and rationale notes.
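The audit fields listed above could be assembled along these lines. This is a sketch under stated assumptions: the document does not name a hash algorithm or a canonical encoding, so SHA-256 over key-sorted compact JSON is used purely for illustration, and the function names and the `recordedAt` field are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_hash(payload):
    """SHA-256 over key-sorted compact JSON, returned as lowercase hex (assumed scheme)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def audit_record(actor, tenant, dag_version, inputs, outputs, rationale, now=None):
    """Build one audit entry; `now` should be a timezone-aware datetime if given."""
    now = now or datetime.now(timezone.utc)
    return {
        "actor": actor,
        "tenant": tenant,
        "dagVersion": dag_version,
        "inputsHash": canonical_hash(inputs),
        "outputsHash": canonical_hash(outputs),
        "rationale": rationale,
        "recordedAt": now.astimezone(timezone.utc).isoformat(),  # always UTC
    }
```

Hashing a canonical encoding rather than the raw payload means two logically identical inputs produce the same hash regardless of key order, which is what lets a reviewer compare runs.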