- Log streaming: SSE/WebSocket endpoints carry the `correlationId` and the tenant/project identifiers; document buffer size and retention in the runbooks.
- When using the `orch:quota` or `orch:backfill` scopes, capture the reason/ticket fields in runbooks and audit checklists.
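
The log-streaming guardrail can be illustrated with a minimal sketch of a bounded, tenant-scoped replay buffer that emits Server-Sent Events frames. It assumes an in-memory buffer capped by a hypothetical `max_events` setting and a hypothetical `log` event name. `LogEvent`, `LogStreamBuffer`, and `format_sse` are illustrative names, not the Orchestrator's API. Only the SSE wire format comes from the SSE specification: `id:`/`event:`/`data:` fields ending in a blank line, with replay keyed on the client's `Last-Event-ID`.

```python
import json
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class LogEvent:
    event_id: int        # assumed to increase monotonically per stream
    correlation_id: str
    tenant: str
    project: str
    message: str


def format_sse(event: LogEvent) -> str:
    """Render one event as an SSE frame: id/event/data fields, then a blank line."""
    # json.dumps escapes embedded newlines, so the payload fits on one data: line.
    payload = json.dumps(
        {
            "correlationId": event.correlation_id,
            "tenant": event.tenant,
            "project": event.project,
            "message": event.message,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return f"id: {event.event_id}\nevent: log\ndata: {payload}\n\n"


class LogStreamBuffer:
    """Bounded in-memory buffer; the oldest events are evicted beyond max_events."""

    def __init__(self, max_events: int) -> None:
        self._events: Deque[LogEvent] = deque(maxlen=max_events)

    def append(self, event: LogEvent) -> None:
        self._events.append(event)

    def replay(self, tenant: str, last_event_id: Optional[int] = None) -> List[str]:
        """Frames for one tenant that are newer than the client's Last-Event-ID, if given."""
        return [
            format_sse(e)
            for e in self._events
            if e.tenant == tenant
            and (last_event_id is None or e.event_id > last_event_id)
        ]
```

A fixed `maxlen` makes the retention bound explicit, and that bound is the number the runbook would need to document. Events evicted from the buffer can no longer be replayed to a reconnecting client.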
## Implementation Status

### Phase 1 – Core service & job ledger (Complete)
- PostgreSQL schema with sources, runs, jobs, artifacts, DAG edges, quotas, schedules, incidents
- Lease manager with heartbeats, retries, dead-letter queues
- Token-bucket rate limiter per tenant/source.host with adaptive refill
- Watermark/backfill orchestration for event-time windows
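
The Phase 1 rate limiter can be sketched as one token bucket per (tenant, source host) pair. The adaptive-refill policy here is an assumption for illustration: it halves the refill rate after an upstream 429 and adds a fixed step back on success, in AIMD style. The names `TokenBucket`, `RateLimiter`, `on_throttled`, and `on_success` are also illustrative. Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TokenBucket:
    capacity: float
    refill_per_sec: float        # current, adaptive refill rate
    max_refill_per_sec: float    # ceiling the rate recovers towards
    min_refill_per_sec: float    # floor so a throttled source still makes progress
    tokens: float
    last_refill: float           # seconds, supplied by the caller's clock

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then take `cost` tokens if enough are available."""
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def on_throttled(self) -> None:
        # Multiplicative decrease after an upstream HTTP 429 (assumed policy).
        self.refill_per_sec = max(self.min_refill_per_sec, self.refill_per_sec / 2)

    def on_success(self, step: float = 0.5) -> None:
        # Additive increase back towards the configured ceiling (assumed policy).
        self.refill_per_sec = min(self.max_refill_per_sec, self.refill_per_sec + step)


class RateLimiter:
    """Lazily creates one bucket per (tenant, source host) key."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self._capacity = capacity
        self._refill_per_sec = refill_per_sec
        self._buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def bucket(self, tenant: str, host: str, now: float) -> TokenBucket:
        key = (tenant, host)
        if key not in self._buckets:
            self._buckets[key] = TokenBucket(
                capacity=self._capacity,
                refill_per_sec=self._refill_per_sec,
                max_refill_per_sec=self._refill_per_sec,
                min_refill_per_sec=self._refill_per_sec / 16,
                tokens=self._capacity,  # new buckets start full
                last_refill=now,
            )
        return self._buckets[key]
```

Passing `now` in, rather than reading the clock, keeps each bucket a pure state machine. That also makes throttling decisions easy to replay offline.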
### Phase 2 – Worker SDK & artifact registry (Complete)
- Claim/heartbeat/report contract with deterministic artifact hashing
- Idempotency enforcement and worker SDKs for .NET/Rust/Go agents
- Integrated with Concelier, Excititor, SBOM Service, Policy Engine
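
Deterministic artifact hashing and idempotent reporting can be sketched as follows. The sketch assumes artifacts are JSON-serializable dictionaries and that each report carries a hypothetical idempotency key. Its canonical form (sorted keys, no insignificant whitespace, UTF-8) is a simplification, not a full canonicalization scheme such as RFC 8785; for example, it does not normalize number formatting. `artifact_digest`, `ReportLedger`, and `IdempotencyConflict` are illustrative names.

```python
import hashlib
import json
from typing import Any, Dict


def artifact_digest(artifact: Dict[str, Any]) -> str:
    """SHA-256 over a simplified canonical JSON encoding of the artifact."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyConflict(Exception):
    """Raised when a key is reused with different artifact content."""


class ReportLedger:
    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # idempotency key -> artifact digest

    def report(self, idempotency_key: str, artifact: Dict[str, Any]) -> str:
        digest = artifact_digest(artifact)
        previous = self._seen.get(idempotency_key)
        if previous is None:
            self._seen[idempotency_key] = digest
            return "accepted"
        if previous == digest:
            return "duplicate"  # a retried report with identical bytes is a no-op
        raise IdempotencyConflict(f"{idempotency_key}: {previous} != {digest}")
```

A retried report with the same key and the same content is a no-op. The same key with different content is rejected rather than silently overwriting the earlier artifact.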
### Phase 3 – Observability & dashboard (In Progress)
- Metrics: queue depth, job latency, failure classes, rate-limit hits, burn rate
- Error clustering for HTTP 429/5xx, schema mismatches, parse errors
- SSE/WebSocket feeds for Console updates, Gantt timeline/DAG JSON
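
The Phase 3 failure classes and burn-rate metric can be sketched as below. The class labels mirror the bullets above (`http_429`, `http_5xx`, `schema_mismatch`, `parse_error`, plus a catch-all `other`) but are otherwise hypothetical, as are the function names. Burn rate uses the usual SRE definition: the observed error ratio divided by the error budget, `1 - slo_target`. A value of 1.0 therefore consumes the budget exactly over the SLO window.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def classify_failure(http_status: Optional[int], error_kind: Optional[str]) -> str:
    """Map one failed job attempt to a coarse failure class (labels are hypothetical)."""
    if http_status == 429:
        return "http_429"
    if http_status is not None and 500 <= http_status <= 599:
        return "http_5xx"
    if error_kind in ("schema_mismatch", "parse_error"):
        return str(error_kind)
    return "other"


def cluster_failures(failures: Iterable[Tuple[Optional[int], Optional[str]]]) -> Counter:
    """Count failures per class, e.g. to feed a per-class metric or dashboard panel."""
    return Counter(classify_failure(status, kind) for status, kind in failures)


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error ratio divided by the error budget (1 - slo_target)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 0.0
    return (failed / total) / (1.0 - slo_target)
```

For example, with a 99% target, 20 failures out of 1,000 jobs gives a burn rate of about 2.0. At that rate the budget is exhausted in half the SLO window.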
### Phase 4 – Controls & resilience (Planned)
- Pause/resume/throttle/retry/backfill tooling
- Dead-letter review, circuit breakers, blackouts, backpressure handling
- Automation hooks and control plane APIs
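
The circuit breakers planned for Phase 4 would typically follow the common closed/open/half-open pattern. Below is a minimal sketch, assuming a consecutive-failure threshold and a fixed cooldown (both hypothetical parameters). The clock is passed in explicitly, and `CircuitBreaker` is an illustrative name rather than a planned API.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold: int, cooldown_sec: float) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_sec = cooldown_sec
        self.state = "closed"
        self._failures = 0          # consecutive failures while closed
        self._opened_at = 0.0

    def allow(self, now: float) -> bool:
        """Decide whether a call to the protected source may proceed."""
        if self.state == "open":
            if now - self._opened_at >= self.cooldown_sec:
                self.state = "half_open"  # let exactly one probe through
                return True
            return False
        if self.state == "half_open":
            return False  # a probe is already in flight
        return True

    def record_success(self) -> None:
        self.state = "closed"
        self._failures = 0

    def record_failure(self, now: float) -> None:
        if self.state == "half_open":
            self._trip(now)  # the probe failed: re-open with a fresh cooldown
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = "open"
        self._opened_at = now
        self._failures = 0
```

While half-open, only the single probe is allowed through. Further calls are refused until the probe's outcome either closes the breaker or re-opens it.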
### Phase 5 – Offline & compliance (Planned)
- Deterministic audit bundles (jobs.jsonl, history.jsonl, throttles.jsonl)
- Provenance manifests and offline replay scripts
- Tenant isolation validation and secret redaction
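
One way to make the Phase 5 audit bundles byte-for-byte reproducible is sketched here. It assumes records are JSON objects and that each file has a field to order by (a hypothetical `jobId` in the test data). Records are sorted with a tie-breaker, serialized with sorted keys and LF line endings, and summarized in a manifest of per-file SHA-256 digests. Signing that manifest is left out. `to_jsonl` and `build_manifest` are illustrative names.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def _canonical(record: Dict[str, Any]) -> str:
    # Sorted keys and compact separators give one byte sequence per logical record.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def to_jsonl(records: Iterable[Dict[str, Any]], sort_key: str) -> bytes:
    """Serialize records in a stable order, one canonical JSON object per LF-terminated line."""
    # The canonical text breaks ties, so input order never affects the output.
    ordered = sorted(records, key=lambda r: (str(r[sort_key]), _canonical(r)))
    return "".join(_canonical(r) + "\n" for r in ordered).encode("utf-8")


def build_manifest(files: Dict[str, bytes]) -> str:
    """JSON manifest of every bundle file with its size and SHA-256 digest, sorted by name."""
    entries = [
        {"name": name, "sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
        for name, data in sorted(files.items())
    ]
    return _canonical({"files": entries})
```

Because the manifest is itself canonical JSON, the same job history always yields the same manifest bytes. That stable byte sequence is what a signature would cover.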
### Key Acceptance Criteria
- Schedules all jobs with quotas, rate limits, idempotency; preserves provenance
- Console reflects real-time DAG status, queue depth, SLO burn rate
- Observability stack exposes metrics, logs, traces, incidents for stuck jobs and throttling
- Offline audit bundles reproduce job history deterministically with verified signatures
### Technical Decisions & Risks
- Backpressure/queue overload mitigated via adaptive token buckets, circuit breakers, dynamic concurrency
- Upstream vendor throttles managed with visible state, automatic jitter and retry
- Tenant leakage prevented through API/queue/storage filters, fuzz tests, redaction
- Complex DAG errors handled with diagnostics, error clustering, partial replay tooling
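
The "automatic jitter and retry" mitigation can be sketched as exponential backoff with full jitter that never retries earlier than an upstream `Retry-After` value. The defaults (`base_sec`, `cap_sec`) and the function name `retry_delay` are assumptions for illustration, not the Orchestrator's configuration.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    base_sec: float = 1.0,
    cap_sec: float = 300.0,
    retry_after_sec: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based), using full jitter."""
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    ceiling = min(cap_sec, base_sec * (2 ** min(attempt, 32)))
    delay = rng.uniform(0.0, ceiling)
    if retry_after_sec is not None:
        # Never come back earlier than the upstream asked for via Retry-After.
        delay = max(delay, retry_after_sec)
    return delay
```

Full jitter spreads retries from many workers across the whole backoff window. That avoids synchronized retry bursts against a vendor that has just throttled.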
## Epic alignment
- Epic 9: Source & Job Orchestrator Dashboard.
- ORCH stories in ../../TASKS.md.