partly or unimplemented features - now implemented
@@ -29,14 +29,93 @@
- Circuit breakers automatically pause job types when failure rate > configured threshold; incidents generated via Notify and Observability stack.
- Control plane quota updates require Authority scope `orch:quota` (issued via `Orch.Admin` role). Historical rebuilds/backfills additionally require `orch:backfill` and must supply `backfill_reason` and `backfill_ticket` alongside the operator metadata. Authority persists all four fields (`quota_reason`, `quota_ticket`, `backfill_reason`, `backfill_ticket`) for audit replay.

### 3.1) Quota governance service

The `QuotaGovernanceService` provides cross-tenant quota allocation with configurable policies:

**Allocation strategies:**
- `Equal` — Divide total capacity equally among all active tenants.
- `Proportional` — Allocate based on tenant weight/priority tier.
- `Priority` — Higher priority tenants get allocation first, with preemption.
- `ReservedWithFairShare` — Reserved minimum per tenant, remainder distributed fairly.
- `Fixed` — Static allocation per tenant regardless of demand.
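
To make three of the strategies above concrete, here is a minimal Python sketch of the arithmetic behind `Equal`, `Proportional`, and `ReservedWithFairShare`. The function names, the integer capacity units, and the rounding rules (leftover units go to tenants in sorted order; proportional shares are floored) are assumptions for illustration, not the behavior of `QuotaGovernanceService`.

```python
from typing import Dict, List


def allocate_equal(total: int, tenants: List[str]) -> Dict[str, int]:
    """Equal: split capacity evenly; leftover units go to the first tenants in sorted order."""
    if not tenants:
        return {}
    share, extra = divmod(total, len(tenants))
    ordered = sorted(tenants)
    return {t: share + (1 if i < extra else 0) for i, t in enumerate(ordered)}


def allocate_proportional(total: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Proportional: floor(total * weight / sum of weights); any rounding remainder stays unallocated."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        return {t: 0 for t in weights}
    return {t: int(total * w / weight_sum) for t, w in weights.items()}


def allocate_reserved_fair_share(total: int, reserved_min: int, tenants: List[str]) -> Dict[str, int]:
    """ReservedWithFairShare: every tenant gets the reserved minimum, the rest is split equally."""
    needed = reserved_min * len(tenants)
    if needed > total:
        raise ValueError("reserved minimums exceed total capacity")
    remainder = allocate_equal(total - needed, tenants)
    return {t: reserved_min + remainder[t] for t in tenants}
```

For example, `allocate_reserved_fair_share(100, 10, ["a", "b", "c"])` reserves 30 units up front and splits the remaining 70 across the three tenants.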

**Key operations:**
- `CalculateAllocationAsync` — Compute quota for a tenant based on active policies.
- `RequestQuotaAsync` — Request quota from shared pool; returns granted amount with burst usage.
- `ReleaseQuotaAsync` — Return quota to shared pool after job completion.
- `CanScheduleAsync` — Check scheduling eligibility combining quota and circuit breaker state.
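
The request/release lifecycle and the combined scheduling check can be sketched with an in-memory pool. This is a simplified model under stated assumptions: `QuotaPool`, its fields, and the partial-grant rule are hypothetical, burst accounting is left out, and the real service is asynchronous and persistent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QuotaPool:
    """Hypothetical shared pool; tracks how many units each tenant currently holds."""
    capacity: int
    in_use: Dict[str, int] = field(default_factory=dict)

    def available(self) -> int:
        return self.capacity - sum(self.in_use.values())

    def request(self, tenant: str, amount: int) -> int:
        """Grant up to `amount` units; the grant may be partial or zero."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        granted = min(amount, self.available())
        self.in_use[tenant] = self.in_use.get(tenant, 0) + granted
        return granted

    def release(self, tenant: str, amount: int) -> int:
        """Return units to the pool; never releases more than the tenant holds."""
        held = self.in_use.get(tenant, 0)
        released = min(max(amount, 0), held)
        self.in_use[tenant] = held - released
        return released


def can_schedule(pool: QuotaPool, circuit_open: bool) -> bool:
    """Eligible only when quota remains and the relevant circuit is not open."""
    return pool.available() > 0 and not circuit_open
```

A caller would pair every successful `request` with a `release` once the job completes, so that the pool does not leak capacity.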

**Quota allocation policy properties:**
- `TotalCapacity` — Pool size to allocate from (for proportional/fair strategies).
- `MinimumPerTenant` / `MaximumPerTenant` — Allocation bounds.
- `ReservedCapacity` — Guaranteed capacity for high-priority tenants.
- `AllowBurst` / `BurstMultiplier` — Allow temporary overallocation when capacity exists.
- `Priority` — Policy evaluation order (higher = first).
- `JobType` — Optional job type filter (null = applies to all).
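
One way to read these properties together: pick the highest-priority policy whose `JobType` filter matches, clamp the computed share to the per-tenant bounds, then apply burst if spare capacity exists. The sketch below assumes exactly that; the snake_case field names mirror the properties above, while `select_policy`, `effective_limit`, and the rule that burst may exceed `MaximumPerTenant` up to `base * BurstMultiplier` are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QuotaPolicy:
    name: str
    priority: int
    minimum_per_tenant: int
    maximum_per_tenant: int
    allow_burst: bool = False
    burst_multiplier: float = 1.0
    job_type: Optional[str] = None  # None means the policy applies to every job type


def select_policy(policies: List[QuotaPolicy], job_type: str) -> Optional[QuotaPolicy]:
    """Highest priority wins among policies whose job type filter matches."""
    applicable = [p for p in policies if p.job_type is None or p.job_type == job_type]
    return max(applicable, key=lambda p: p.priority, default=None)


def effective_limit(policy: QuotaPolicy, computed_share: int, spare_capacity: int) -> int:
    """Clamp the share to the policy bounds, then add burst headroom if allowed."""
    base = max(policy.minimum_per_tenant, min(computed_share, policy.maximum_per_tenant))
    if not policy.allow_burst or spare_capacity <= 0:
        return base
    burst_ceiling = int(base * policy.burst_multiplier)  # assumed burst rule
    return min(burst_ceiling, base + spare_capacity)
```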

### 3.2) Circuit breaker service

The `CircuitBreakerService` implements the circuit breaker pattern for downstream services:

**States:**
- `Closed` — Normal operation; requests pass through. Failures are tracked.
- `Open` — Circuit tripped; requests are blocked for `OpenDuration`. Prevents cascade failures.
- `HalfOpen` — After open duration, limited test requests allowed. Success → Closed; Failure → Open.

**Thresholds:**
- `FailureThreshold` (0.0–1.0) — Failure rate that triggers circuit open.
- `WindowDuration` — Sliding window for failure rate calculation.
- `MinimumSamples` — Minimum requests before circuit can trip.
- `OpenDuration` — How long circuit stays open before half-open transition.
- `HalfOpenTestCount` — Number of test requests allowed in half-open state.
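
The interaction between `FailureThreshold`, `WindowDuration`, and `MinimumSamples` can be illustrated with a small sliding-window tracker. This is a sketch, not the service's implementation: `FailureWindow` and its methods are hypothetical, timestamps are plain seconds supplied by the caller in non-decreasing order, and the trip condition uses a strict "greater than" comparison as described for circuit breakers earlier in this document.

```python
from collections import deque
from typing import Deque, Tuple


class FailureWindow:
    """Tracks request outcomes inside a sliding time window."""

    def __init__(self, window_seconds: float, failure_threshold: float, minimum_samples: int):
        self.window_seconds = window_seconds
        self.failure_threshold = failure_threshold
        self.minimum_samples = minimum_samples
        self._samples: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)

    def record(self, timestamp: float, failed: bool) -> None:
        self._samples.append((timestamp, failed))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop samples that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def failure_rate(self, now: float) -> float:
        self._evict(now)
        if not self._samples:
            return 0.0
        failures = sum(1 for _, failed in self._samples if failed)
        return failures / len(self._samples)

    def should_trip(self, now: float) -> bool:
        """Trip only with enough samples and a failure rate above the threshold."""
        self._evict(now)
        if len(self._samples) < self.minimum_samples:
            return False
        return self.failure_rate(now) > self.failure_threshold
```

`MinimumSamples` keeps a single early failure from opening the circuit: with two requests and one failure, the rate is 0.5 but the sample count is too small to act on.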

**Key operations:**
- `CheckAsync` — Verify if request is allowed; returns `CircuitBreakerCheckResult`.
- `RecordSuccessAsync` / `RecordFailureAsync` — Update circuit state after request.
- `ForceOpenAsync` / `ForceCloseAsync` — Manual operator intervention (audited).
- `ListAsync` — View all circuit breakers for a tenant with optional state filter.
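
The operations above map onto a three-state machine. The following sketch shows one plausible shape of it, with several simplifications that are assumptions rather than documented behavior: the failure rate is computed over all outcomes since the circuit last closed (no sliding window), the first success in half-open closes the circuit, and a forced-open circuit still moves to half-open after `open_duration`. Auditing and persistence are omitted.

```python
from enum import Enum


class State(Enum):
    CLOSED = "Closed"
    OPEN = "Open"
    HALF_OPEN = "HalfOpen"


class CircuitBreaker:
    def __init__(self, failure_threshold: float, minimum_samples: int,
                 open_duration: float, half_open_test_count: int):
        self.failure_threshold = failure_threshold
        self.minimum_samples = minimum_samples
        self.open_duration = open_duration
        self.half_open_test_count = half_open_test_count
        self.state = State.CLOSED
        self._successes = 0
        self._failures = 0
        self._opened_at = 0.0
        self._tests_issued = 0

    def check(self, now: float) -> bool:
        """Return True if a request may proceed at time `now` (seconds)."""
        if self.state is State.OPEN and now - self._opened_at >= self.open_duration:
            self.state = State.HALF_OPEN
            self._tests_issued = 0
        if self.state is State.CLOSED:
            return True
        if self.state is State.HALF_OPEN and self._tests_issued < self.half_open_test_count:
            self._tests_issued += 1
            return True
        return False

    def record_success(self, now: float) -> None:
        if self.state is State.HALF_OPEN:
            self._close()
        elif self.state is State.CLOSED:
            self._successes += 1

    def record_failure(self, now: float) -> None:
        if self.state is State.HALF_OPEN:
            self._open(now)
        elif self.state is State.CLOSED:
            self._failures += 1
            total = self._successes + self._failures
            if total >= self.minimum_samples and self._failures / total > self.failure_threshold:
                self._open(now)

    def force_open(self, now: float) -> None:
        self._open(now)

    def force_close(self) -> None:
        self._close()

    def _open(self, now: float) -> None:
        self.state = State.OPEN
        self._opened_at = now

    def _close(self) -> None:
        self.state = State.CLOSED
        self._successes = 0
        self._failures = 0
```

A caller would wrap each downstream request in `check`, then report the outcome through `record_success` or `record_failure` so that the state stays current.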

**Downstream services protected:**
- Scanner
- Attestor
- Policy Engine
- Registry clients
- External integrations

## 4) APIs

### 4.1) Job management
- `GET /api/jobs?status=` — list jobs with filters (tenant, jobType, status, time window).
- `GET /api/jobs/{id}` — job detail (payload digest, attempts, worker, lease history, metrics).
- `POST /api/jobs/{id}/cancel` — cancel running/pending job with audit reason.
- `POST /api/jobs/{id}/replay` — schedule replay.
- `POST /api/limits/throttle` — apply throttle (requires elevated scope).
- `GET /api/dashboard/metrics` — aggregated metrics for Console dashboards.
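
As a rough illustration of how a client might address two of these endpoints, the sketch below builds a job-list URL and a cancel request. Only the paths and the `status` query parameter come from the list above; the `tenant` and `jobType` query parameter names, the `reason` body field, and the `orch.example` host are assumptions, and no request is actually sent.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote, urlencode


def list_jobs_url(base_url: str, status: Optional[str] = None,
                  tenant: Optional[str] = None, job_type: Optional[str] = None) -> str:
    """Build the job list URL, including only the filters that were supplied."""
    params = {"status": status, "tenant": tenant, "jobType": job_type}  # names partly assumed
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/api/jobs"
    return f"{url}?{query}" if query else url


def cancel_job_request(base_url: str, job_id: str, reason: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, url, json_body) for cancelling a job with an audit reason."""
    path = f"/api/jobs/{quote(job_id, safe='')}/cancel"
    return "POST", base_url.rstrip("/") + path, {"reason": reason}  # body field name assumed
```

Calling `list_jobs_url("https://orch.example", status="failed")` yields a URL with a single `status` filter; the other filters are omitted rather than sent empty.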

### 4.2) Circuit breaker endpoints (`/api/v1/orchestrator/circuit-breakers`)
- `GET /` — List all circuit breakers for tenant (optional `?state=` filter).
- `GET /{serviceId}` — Get circuit breaker state for specific downstream service.
- `GET /{serviceId}/check` — Check if requests are allowed; returns `IsAllowed`, `State`, `FailureRate`, `TimeUntilRetry`.
- `POST /{serviceId}/success` — Record successful request to downstream service.
- `POST /{serviceId}/failure` — Record failed request (body: `failureReason`).
- `POST /{serviceId}/force-open` — Manually open circuit (body: `reason`; audited).
- `POST /{serviceId}/force-close` — Manually close circuit (audited).
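
A client consuming the `check` endpoint might parse the response as sketched below. The four field names come from the list above, but the JSON casing on the wire and the format of `TimeUntilRetry` are not specified here, so the parser accepts any casing and keeps `TimeUntilRetry` as a raw value; `CheckResult`, `check_path`, and `failure_body` are hypothetical helpers.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional

BASE_PATH = "/api/v1/orchestrator/circuit-breakers"


@dataclass(frozen=True)
class CheckResult:
    is_allowed: bool
    state: str
    failure_rate: float
    time_until_retry: Optional[Any]  # format unspecified in the source; kept as-is


def check_path(service_id: str) -> str:
    return f"{BASE_PATH}/{service_id}/check"


def failure_body(reason: str) -> Dict[str, str]:
    """Body for POST /{serviceId}/failure."""
    return {"failureReason": reason}


def parse_check_response(body: str) -> CheckResult:
    # Lowercase the keys so both PascalCase and camelCase payloads are accepted.
    data = {key.lower(): value for key, value in json.loads(body).items()}
    return CheckResult(
        is_allowed=bool(data["isallowed"]),
        state=str(data["state"]),
        failure_rate=float(data["failurerate"]),
        time_until_retry=data.get("timeuntilretry"),
    )
```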

### 4.3) Quota governance endpoints (`/api/v1/orchestrator/quota-governance`)
- `GET /policies` — List quota allocation policies (optional `?enabled=` filter).
- `GET /policies/{policyId}` — Get specific policy.
- `POST /policies` — Create new policy.
- `PUT /policies/{policyId}` — Update policy.
- `DELETE /policies/{policyId}` — Delete policy.
- `GET /allocation` — Calculate allocation for current tenant (optional `?jobType=`).
- `POST /request` — Request quota from pool (body: `jobType`, `requestedAmount`).
- `POST /release` — Release quota back to pool (body: `jobType`, `releasedAmount`).
- `GET /status` — Get tenant quota status (optional `?jobType=`).
- `GET /summary` — Get quota governance summary across all tenants (optional `?policyId=`).
- `GET /can-schedule` — Check if job can be scheduled (optional `?jobType=`).
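
The `request` and `release` endpoints are naturally used as a pair around a unit of work. The sketch below shows that pattern with an injected transport function, so it runs without a server. The request bodies use the documented field names; the `grantedAmount` response field, the `Transport` signature, and `run_with_quota` itself are assumptions.

```python
from typing import Any, Callable, Dict

BASE_PATH = "/api/v1/orchestrator/quota-governance"

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def run_with_quota(transport: Transport, job_type: str, amount: int,
                   job: Callable[[int], Any]) -> Any:
    """Request quota, run the job with whatever was granted, and always release it."""
    grant = transport("POST", f"{BASE_PATH}/request",
                      {"jobType": job_type, "requestedAmount": amount})
    granted = int(grant.get("grantedAmount", 0))  # response field name is an assumption
    if granted <= 0:
        return None  # nothing granted, so there is nothing to release
    try:
        return job(granted)
    finally:
        transport("POST", f"{BASE_PATH}/release",
                  {"jobType": job_type, "releasedAmount": granted})
```

Releasing in a `finally` block means a failing job still returns its quota to the pool.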

### 4.4) Discovery and documentation
- Event envelope draft (`docs/modules/orchestrator/event-envelope.md`) defines notifier/webhook/SSE payloads with idempotency keys, provenance, and task runner metadata for job/pack-run events.
- OpenAPI discovery: `/.well-known/openapi` exposes `/openapi/orchestrator.json` (OAS 3.1) with pagination/idempotency/error-envelope examples; legacy job detail/summary endpoints now ship `Deprecation` + `Link` headers that point to their replacements.
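
A client can use the `Deprecation` and `Link` headers to find the replacement for a legacy endpoint. The sketch below is a simplified parser under stated assumptions: the link relation name (`successor-version` here) is not given in this document, the example path in the usage note is hypothetical, and the regular expression does not handle commas or semicolons inside URLs or quoted parameter values.

```python
import re
from typing import Dict, Optional

# Matches "<target>" followed by its ";"-separated parameters.
_LINK_PATTERN = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')
_REL_PATTERN = re.compile(r'rel\s*=\s*"?([^";,]+)"?')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel value in a Link header to its target URL."""
    links: Dict[str, str] = {}
    for target, params in _LINK_PATTERN.findall(value):
        rel = _REL_PATTERN.search(params)
        if rel:
            for name in rel.group(1).split():
                links[name] = target
    return links


def replacement_for(headers: Dict[str, str], rel: str = "successor-version") -> Optional[str]:
    """Return the replacement URL if the response is marked deprecated, else None."""
    normalized = {key.lower(): value for key, value in headers.items()}
    if "deprecation" not in normalized:
        return None
    return parse_link_header(normalized.get("link", "")).get(rel)
```

Given headers such as `Deprecation: true` and `Link: </api/v2/jobs/42>; rel="successor-version"`, `replacement_for` would return the linked path, and it returns `None` for responses without a `Deprecation` header.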