StellaOps Source & Job Orchestrator
The Orchestrator schedules, observes, and recovers ingestion and analysis jobs across the StellaOps platform.
Latest updates (2025-11-18)
- Job leasing now flows through the Task Runner bridge: allocations carry idempotency keys, lease durations, and retry hints; workers acknowledge via claim/ack and emit heartbeats.
- Event envelopes remain interim pending ORCH-SVC-37-101; include provenance (tenant/project, job type, correlationId, task runner id) in all notifier events.
- Authority orch:quota/orch:backfill scopes require reason/ticket audit fields; include them in runbooks and dashboard overrides.
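
The leasing flow above (idempotency keys, lease durations, claim/ack, heartbeats) can be illustrated with a minimal in-memory sketch. It is not the Orchestrator's actual implementation: the `LeaseLedger` class, its method names, and the 30-second default are assumptions made for illustration, and a real ledger would persist its state.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    job_id: str
    worker_id: str
    idempotency_key: str
    expires_at: float
    acked: bool = False


class LeaseLedger:
    """Hypothetical in-memory ledger; names and defaults are illustrative only."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.leases = {}  # idempotency key -> Lease

    def claim(self, job_id, worker_id, idempotency_key):
        """Return a lease, or None if the work is done or held by another worker."""
        now = self.clock()
        existing = self.leases.get(idempotency_key)
        if existing is not None:
            if existing.acked:
                return None  # already completed; do not run the job twice
            if existing.expires_at > now:
                # A replayed claim from the holder gets the same lease back.
                return existing if existing.worker_id == worker_id else None
        # No lease yet, or the previous one expired: allocate a fresh lease.
        lease = Lease(job_id, worker_id, idempotency_key, now + self.lease_seconds)
        self.leases[idempotency_key] = lease
        return lease

    def heartbeat(self, idempotency_key, worker_id):
        """Extend a live lease; returns False if it is gone, expired, or not ours."""
        now = self.clock()
        lease = self.leases.get(idempotency_key)
        if lease is None or lease.acked or lease.worker_id != worker_id:
            return False
        if lease.expires_at <= now:
            return False
        lease.expires_at = now + self.lease_seconds
        return True

    def ack(self, idempotency_key, worker_id):
        """Mark the job complete; repeating an ack from the holder is harmless."""
        lease = self.leases.get(idempotency_key)
        if lease is None or lease.worker_id != worker_id:
            return False
        if lease.acked:
            return True
        if lease.expires_at <= self.clock():
            return False
        lease.acked = True
        return True
```

In this sketch a worker that misses its heartbeat window loses the lease, and the next claim with the same idempotency key hands the job to whichever worker asks first. Retry hints are left out to keep the example short.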
Responsibilities
- Track job state, throughput, and errors for Concelier, Excititor, Scheduler, and export pipelines.
- Expose dashboards and APIs for throttling, replays, and failover.
- Enforce rate limits, concurrency limits, and dependency chains across queues.
- Stream structured events and audit logs for incident response.
- Provide Task Runner bridge semantics (claim/ack, heartbeats, progress, artifacts, backfills) for Go/Python SDKs.
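
As a rough illustration of the rate-limit and concurrency enforcement listed above, the sketch below combines a token bucket with an in-flight cap for a single queue. The `QueueLimiter` class and its parameters are hypothetical, not the Orchestrator's API, and dependency chains are not modelled.

```python
import time


class QueueLimiter:
    """Hypothetical per-queue limiter: token bucket plus a concurrency cap."""

    def __init__(self, rate_per_sec, burst, max_in_flight, clock=time.monotonic):
        self.rate = rate_per_sec          # tokens added per second
        self.burst = burst                # maximum tokens the bucket can hold
        self.max_in_flight = max_in_flight
        self.clock = clock
        self.tokens = float(burst)
        self.in_flight = 0
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at burst.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_start(self):
        """Return True and reserve capacity if a job may start now."""
        self._refill()
        if self.in_flight >= self.max_in_flight or self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        self.in_flight += 1
        return True

    def finish(self):
        """Release the concurrency slot once a job completes or fails."""
        self.in_flight = max(0, self.in_flight - 1)
```

A scheduler loop would call `try_start()` before dispatching a job and `finish()` when the job ends. Jobs that are refused stay queued and are retried later.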
Key components
- Orchestrator WebService (control plane).
- Queue adapters (Redis/NATS) and job ledger.
- Console dashboard module and CLI integration for operators.
Integrations & dependencies
- Authority for authN/Z on operational actions.
- Telemetry stack for job metrics and alerts.
- Scheduler/Concelier/Excititor workers for job lifecycle.
- Offline Kit for state export/import during air-gap refreshes.
Operational notes
- Job recovery runbooks and dashboard JSON as described in Epic 9.
- Rate-limit and lease reconfiguration guidelines; keep lease defaults aligned across runners and SDKs (Go/Python).
- Log streaming: SSE/WS endpoints carry correlationId + tenant/project; buffer size and retention must be documented in runbooks.
- When using orch:quota/orch:backfill scopes, capture reason/ticket fields in runbooks and audit checklists.
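
The reason/ticket requirement can be checked mechanically before an override is accepted. The following is a small sketch under stated assumptions: the request is a plain dictionary, and the key names `reason` and `ticket` are guesses at how the audit fields are spelled, not confirmed field names.

```python
# Scopes that this document says require audit fields.
AUDITED_SCOPES = {"orch:quota", "orch:backfill"}


def missing_audit_fields(scopes, request):
    """Return the audit fields that are absent or blank for an audited scope.

    `request` is a hypothetical dict; "reason" and "ticket" are assumed key names.
    """
    if not AUDITED_SCOPES.intersection(scopes):
        return []  # no audited scope in use, nothing to enforce
    return [
        name
        for name in ("reason", "ticket")
        if not str(request.get(name) or "").strip()
    ]
```

A runbook step or dashboard override handler could refuse the action whenever the returned list is non-empty and report which fields the operator still has to supply.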
Epic alignment
- Epic 9: Source & Job Orchestrator Dashboard.
- ORCH stories in ../../TASKS.md.