# Orchestrator runbook

## Pre-flight

- Verify database and queue backends are healthy.
- Confirm tenant allowlist and orchestrator scopes in Authority.
- Ensure plugin bundles are present and signatures verified.

## Common operations

- Start a run via API or CLI.
- Cancel runs with idempotent requests.
- Stream status via WebSocket or CLI.
- Export run ledger as NDJSON for audit.

## Incident response

- Queue backlog: scale workers and drain oldest first.
- Repeated failures: inspect error codes and inputsHash; roll back DAG version.
- Plugin auth errors: rotate secrets and warm caches.

## Health checks

- /admin/health for liveness and queue depth.
- Metrics: orchestrator_runs_total, orchestrator_queue_depth, orchestrator_step_retries_total, orchestrator_run_duration_seconds.
- Logs include tenant, dagId, runId, status with redaction.

## Determinism and immutability

- Runs are append-only; never mutate ledger entries.
- Use runToken for idempotent retries.

## Offline posture

- Keep DAG specs and plugins in sealed storage.
- Export logs, metrics, and traces as NDJSON.

## Related references

- orchestrator/overview.md
- orchestrator/architecture.md
- docs/operations/orchestrator-runbook.md
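
The determinism rules above (append-only ledger entries, runToken for idempotent retries, NDJSON export for audit) can be illustrated with a minimal Python sketch. This is not the orchestrator's actual implementation or API: the `RunLedger` class, its method names, the `seq` field, and the `run-N` id format are all hypothetical, and only the field names `tenant`, `dagId`, `runId`, and `status` come from this runbook.

```python
import json


class RunLedger:
    """Hypothetical in-memory ledger; illustrative only, not the real service."""

    def __init__(self):
        self._entries = []        # append-only list of ledger entries
        self._run_by_token = {}   # runToken -> runId, used for idempotent retries

    def start_run(self, tenant, dag_id, run_token):
        # A retry that carries the same runToken gets the original runId back
        # instead of creating a second run.
        if run_token in self._run_by_token:
            return self._run_by_token[run_token]
        run_id = f"run-{len(self._run_by_token) + 1}"  # assumed id format
        self._run_by_token[run_token] = run_id
        self._append(tenant, dag_id, run_id, "started")
        return run_id

    def record_status(self, tenant, dag_id, run_id, status):
        # A status change is a new entry; earlier entries are never edited.
        self._append(tenant, dag_id, run_id, status)

    def _append(self, tenant, dag_id, run_id, status):
        self._entries.append({
            "seq": len(self._entries) + 1,  # assumed ordering field
            "tenant": tenant,
            "dagId": dag_id,
            "runId": run_id,
            "status": status,
        })

    def export_ndjson(self):
        # NDJSON: one JSON object per line, each line newline-terminated.
        return "".join(json.dumps(e, sort_keys=True) + "\n" for e in self._entries)


ledger = RunLedger()
first = ledger.start_run("tenant-a", "dag-1", "token-123")
retry = ledger.start_run("tenant-a", "dag-1", "token-123")  # same token, same run
ledger.record_status("tenant-a", "dag-1", first, "succeeded")
ndjson = ledger.export_ndjson()
```

In this sketch the retried `start_run` call adds nothing to the ledger, so the export contains two lines: the original `started` entry and the later `succeeded` entry. A real deployment would persist entries durably and scope tokens per tenant; both are left out here for brevity.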