partly or unimplemented features - now implemented

This commit is contained in:
master
2026-02-09 08:53:51 +02:00
parent 1bf6bbf395
commit 4bdc298ec1
674 changed files with 90194 additions and 2271 deletions

@@ -29,14 +29,93 @@
- Circuit breakers automatically pause job types when the failure rate exceeds the configured threshold; incidents are generated via Notify and the Observability stack.
- Control plane quota updates require Authority scope `orch:quota` (issued via `Orch.Admin` role). Historical rebuilds/backfills additionally require `orch:backfill` and must supply `backfill_reason` and `backfill_ticket` alongside the operator metadata. Authority persists all four fields (`quota_reason`, `quota_ticket`, `backfill_reason`, `backfill_ticket`) for audit replay.
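
The scope and audit-field rules above can be sketched as a small pre-flight check. This is only an illustration: the `ControlPlaneRequest` shape and the `validate` helper are hypothetical, and it assumes that backfills need both scopes and that `quota_reason` / `quota_ticket` are the "operator metadata" the text refers to.

```python
from dataclasses import dataclass, field

# Scope names come from the text above; everything else is hypothetical.
QUOTA_SCOPE = "orch:quota"
BACKFILL_SCOPE = "orch:backfill"


@dataclass
class ControlPlaneRequest:
    scopes: set
    metadata: dict = field(default_factory=dict)
    is_backfill: bool = False


def validate(request):
    """Return a list of problems; an empty list means the request may proceed."""
    required_scopes = [QUOTA_SCOPE]
    required_fields = ["quota_reason", "quota_ticket"]
    if request.is_backfill:
        # Assumption: backfills need the extra scope and both extra fields.
        required_scopes.append(BACKFILL_SCOPE)
        required_fields += ["backfill_reason", "backfill_ticket"]

    problems = []
    for scope in required_scopes:
        if scope not in request.scopes:
            problems.append(f"missing scope: {scope}")
    for name in required_fields:
        if not request.metadata.get(name, "").strip():
            problems.append(f"missing field: {name}")
    return problems
```

Returning every problem at once, rather than failing on the first, lets an operator fix a rejected request in one pass.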
### 3.1) Quota governance service
The `QuotaGovernanceService` provides cross-tenant quota allocation with configurable policies:
**Allocation strategies:**
- `Equal` — Divide total capacity equally among all active tenants.
- `Proportional` — Allocate based on tenant weight/priority tier.
- `Priority` — Higher priority tenants get allocation first, with preemption.
- `ReservedWithFairShare` — Reserved minimum per tenant, remainder distributed fairly.
- `Fixed` — Static allocation per tenant regardless of demand.
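
To make the difference between the strategies concrete, here is a minimal sketch of three of them using integer arithmetic. The function names and rounding behaviour (floor, with any remainder left unallocated) are assumptions for illustration, not the service's actual algorithm.

```python
def allocate_equal(total, tenants):
    """Equal: split capacity evenly across a non-empty tenant list."""
    share = total // len(tenants)
    return {tenant: share for tenant in tenants}


def allocate_proportional(total, weights):
    """Proportional: split capacity by tenant weight (floor of each share)."""
    weight_sum = sum(weights.values())
    return {tenant: total * w // weight_sum for tenant, w in weights.items()}


def allocate_reserved_fair_share(total, tenants, reserved):
    """ReservedWithFairShare: a fixed minimum each, remainder split evenly."""
    remainder = total - reserved * len(tenants)
    if remainder < 0:
        raise ValueError("reserved minimums exceed total capacity")
    extra = remainder // len(tenants)
    return {tenant: reserved + extra for tenant in tenants}
```

For example, with a capacity of 100 and weights `{"a": 3, "b": 1}`, the proportional sketch yields 75 for `a` and 25 for `b`.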
**Key operations:**
- `CalculateAllocationAsync` — Compute quota for a tenant based on active policies.
- `RequestQuotaAsync` — Request quota from shared pool; returns granted amount with burst usage.
- `ReleaseQuotaAsync` — Return quota to shared pool after job completion.
- `CanScheduleAsync` — Check scheduling eligibility combining quota and circuit breaker state.
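
The request/release cycle can be pictured with an in-memory pool. This is a sketch under stated assumptions: the `QuotaPool` class is hypothetical, partial grants are assumed (the text only says a granted amount is returned), and burst usage is taken to mean whatever is in use beyond the base allocation.

```python
class QuotaPool:
    """Hypothetical single-tenant view of the shared pool."""

    def __init__(self, allocation, allow_burst=False, burst_multiplier=1.0):
        self.allocation = allocation
        # With burst enabled, the ceiling is raised above the base allocation.
        self.ceiling = int(allocation * burst_multiplier) if allow_burst else allocation
        self.in_use = 0

    def request(self, amount):
        """Grant as much as fits under the ceiling; report the burst portion."""
        granted = max(0, min(amount, self.ceiling - self.in_use))
        self.in_use += granted
        burst_used = max(0, self.in_use - self.allocation)
        return granted, burst_used

    def release(self, amount):
        """Return quota after job completion, never dropping below zero."""
        self.in_use = max(0, self.in_use - amount)
```

With an allocation of 10 and a burst multiplier of 1.5, requests of 8 and then 5 are both granted in full, and the second one reports 3 units of burst usage.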
**Quota allocation policy properties:**
- `TotalCapacity` — Pool size to allocate from (for proportional/fair strategies).
- `MinimumPerTenant` / `MaximumPerTenant` — Allocation bounds.
- `ReservedCapacity` — Guaranteed capacity for high-priority tenants.
- `AllowBurst` / `BurstMultiplier` — Allow temporary overallocation when capacity exists.
- `Priority` — Policy evaluation order (higher = first).
- `JobType` — Optional job type filter (null = applies to all).
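
The interaction between `Priority`, `JobType`, and the per-tenant bounds might look like the following. The `Policy` dataclass mirrors the properties listed above in Python naming; the `enabled` flag and both helper functions are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    name: str
    priority: int
    job_type: Optional[str] = None  # None means the policy applies to all job types
    minimum_per_tenant: int = 0
    maximum_per_tenant: int = 10**9
    enabled: bool = True  # assumed flag


def applicable_policies(policies, job_type):
    """Keep enabled policies matching the job type, highest priority first."""
    matching = [p for p in policies if p.enabled and p.job_type in (None, job_type)]
    return sorted(matching, key=lambda p: p.priority, reverse=True)


def clamp(policy, raw_allocation):
    """Force a computed allocation into the policy's per-tenant bounds."""
    return max(policy.minimum_per_tenant, min(policy.maximum_per_tenant, raw_allocation))
```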
### 3.2) Circuit breaker service
The `CircuitBreakerService` implements the circuit breaker pattern for downstream services:
**States:**
- `Closed` — Normal operation; requests pass through. Failures are tracked.
- `Open` — Circuit tripped; requests are blocked for `OpenDuration`. Prevents cascade failures.
- `HalfOpen` — After open duration, limited test requests allowed. Success → Closed; Failure → Open.
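
The three states and their transitions form a small state machine, sketched below as a lookup table. The event names are hypothetical labels for the transitions described above, and the sketch closes the circuit on a single successful probe, which may be simpler than the real behaviour.

```python
from enum import Enum


class State(Enum):
    CLOSED = "Closed"
    OPEN = "Open"
    HALF_OPEN = "HalfOpen"


# Event names are hypothetical; the transitions follow the list above.
TRANSITIONS = {
    (State.CLOSED, "threshold_exceeded"): State.OPEN,
    (State.OPEN, "open_duration_elapsed"): State.HALF_OPEN,
    (State.HALF_OPEN, "test_succeeded"): State.CLOSED,
    (State.HALF_OPEN, "test_failed"): State.OPEN,
}


def next_state(state, event):
    """Return the new state, or the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def allows_requests(state):
    """Closed passes traffic, Open blocks it, HalfOpen admits limited probes."""
    return state is not State.OPEN
```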
**Thresholds:**
- `FailureThreshold` (0.0 to 1.0) — Failure rate that triggers circuit open.
- `WindowDuration` — Sliding window for failure rate calculation.
- `MinimumSamples` — Minimum requests before circuit can trip.
- `OpenDuration` — How long circuit stays open before half-open transition.
- `HalfOpenTestCount` — Number of test requests allowed in half-open state.
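
How `FailureThreshold`, `WindowDuration`, and `MinimumSamples` combine can be shown with a small sliding-window tracker. This is an illustrative sketch, not the service's implementation: the `FailureWindow` class is hypothetical, timestamps are plain seconds supplied by the caller, and the trip condition uses a strict "greater than" comparison.

```python
from collections import deque


class FailureWindow:
    """Hypothetical sliding window over (timestamp, succeeded) samples."""

    def __init__(self, failure_threshold, window_seconds, minimum_samples):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.minimum_samples = minimum_samples
        self.samples = deque()

    def record(self, timestamp, succeeded):
        self.samples.append((timestamp, succeeded))
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= timestamp - self.window_seconds:
            self.samples.popleft()

    def failure_rate(self):
        if not self.samples:
            return 0.0
        failures = sum(1 for _, ok in self.samples if not ok)
        return failures / len(self.samples)

    def should_trip(self):
        # Too few samples never trips, so one early failure cannot open the circuit.
        enough = len(self.samples) >= self.minimum_samples
        return enough and self.failure_rate() > self.failure_threshold
```

The `MinimumSamples` guard matters most right after startup or a quiet period, when a single failed request would otherwise read as a 100% failure rate.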
**Key operations:**
- `CheckAsync` — Verify if request is allowed; returns `CircuitBreakerCheckResult`.
- `RecordSuccessAsync` / `RecordFailureAsync` — Update circuit state after request.
- `ForceOpenAsync` / `ForceCloseAsync` — Manual operator intervention (audited).
- `ListAsync` — View all circuit breakers for a tenant with optional state filter.
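
A caller would typically wrap each downstream request in a check / record pair. The sketch below shows that calling pattern only; `InMemoryBreaker` is a hypothetical stand-in with synchronous methods loosely named after the operations above, and it opens after a fixed failure count rather than a failure rate.

```python
class InMemoryBreaker:
    """Hypothetical stand-in that opens after a fixed number of failures."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def check(self):
        return self.failures < self.max_failures

    def record_success(self):
        self.failures = 0

    def record_failure(self, reason):
        self.failures += 1


def call_with_breaker(breaker, operation):
    """Run operation only if the breaker allows it, then record the outcome."""
    if not breaker.check():
        return ("blocked", None)
    try:
        result = operation()
    except Exception as exc:
        breaker.record_failure(str(exc))
        return ("failed", None)
    breaker.record_success()
    return ("ok", result)
```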
**Downstream services protected:**
- Scanner
- Attestor
- Policy Engine
- Registry clients
- External integrations
## 4) APIs
### 4.1) Job management
- `GET /api/jobs?status=` — list jobs with filters (tenant, jobType, status, time window).
- `GET /api/jobs/{id}` — job detail (payload digest, attempts, worker, lease history, metrics).
- `POST /api/jobs/{id}/cancel` — cancel running/pending job with audit reason.
- `POST /api/jobs/{id}/replay` — schedule replay.
- `POST /api/limits/throttle` — apply throttle (requires elevated scope).
- `GET /api/dashboard/metrics` — aggregated metrics for Console dashboards.
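
As an illustration of the list filters, the helper below builds a job-list URL. Only the `status` parameter appears in the endpoint above; the other query parameter names (such as `jobType`) and the host are assumptions.

```python
from urllib.parse import urlencode


def jobs_url(base, **filters):
    """Build a GET /api/jobs URL, skipping filters that are left as None."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base}/api/jobs" + (f"?{query}" if query else "")


# Hypothetical host and filter names, for illustration only.
print(jobs_url("https://orch.example", status="failed", jobType="scan"))
```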
### 4.2) Circuit breaker endpoints (`/api/v1/orchestrator/circuit-breakers`)
- `GET /` — List all circuit breakers for tenant (optional `?state=` filter).
- `GET /{serviceId}` — Get circuit breaker state for specific downstream service.
- `GET /{serviceId}/check` — Check if requests are allowed; returns `IsAllowed`, `State`, `FailureRate`, `TimeUntilRetry`.
- `POST /{serviceId}/success` — Record successful request to downstream service.
- `POST /{serviceId}/failure` — Record failed request (body: `failureReason`).
- `POST /{serviceId}/force-open` — Manually open circuit (body: `reason`; audited).
- `POST /{serviceId}/force-close` — Manually close circuit (audited).
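
The per-service routes above map onto a small request builder. This is a sketch: the `breaker_request` helper and its action names are hypothetical, and the format of `serviceId` values (here simple strings that are percent-encoded) is an assumption.

```python
from urllib.parse import quote

BASE = "/api/v1/orchestrator/circuit-breakers"

# Action name -> (HTTP method, path suffix), following the endpoint list above.
ROUTES = {
    "state": ("GET", ""),
    "check": ("GET", "/check"),
    "success": ("POST", "/success"),
    "failure": ("POST", "/failure"),
    "force-open": ("POST", "/force-open"),
    "force-close": ("POST", "/force-close"),
}


def breaker_request(action, service_id, **body):
    """Return (method, path, body) for one circuit breaker call."""
    method, suffix = ROUTES[action]
    path = f"{BASE}/{quote(service_id, safe='')}{suffix}"
    return method, path, (body or None)
```

For instance, `breaker_request("failure", "scanner", failureReason="timeout")` produces a `POST` to `/api/v1/orchestrator/circuit-breakers/scanner/failure` with the reason in the body.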
### 4.3) Quota governance endpoints (`/api/v1/orchestrator/quota-governance`)
- `GET /policies` — List quota allocation policies (optional `?enabled=` filter).
- `GET /policies/{policyId}` — Get specific policy.
- `POST /policies` — Create new policy.
- `PUT /policies/{policyId}` — Update policy.
- `DELETE /policies/{policyId}` — Delete policy.
- `GET /allocation` — Calculate allocation for current tenant (optional `?jobType=`).
- `POST /request` — Request quota from pool (body: `jobType`, `requestedAmount`).
- `POST /release` — Release quota back to pool (body: `jobType`, `releasedAmount`).
- `GET /status` — Get tenant quota status (optional `?jobType=`).
- `GET /summary` — Get quota governance summary across all tenants (optional `?policyId=`).
- `GET /can-schedule` — Check if job can be scheduled (optional `?jobType=`).
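
A typical client flow is request, run, release. The sketch below builds the two request bodies from the field names listed above and releases quota in a `finally` block so a failing job does not leak it. The `send` callable is a hypothetical transport assumed to return the granted amount; the real response shape is not specified here.

```python
import json

BASE = "/api/v1/orchestrator/quota-governance"


def quota_request(job_type, amount):
    body = json.dumps({"jobType": job_type, "requestedAmount": amount})
    return "POST", f"{BASE}/request", body


def quota_release(job_type, amount):
    body = json.dumps({"jobType": job_type, "releasedAmount": amount})
    return "POST", f"{BASE}/release", body


def run_with_quota(send, job_type, amount, job):
    """Request quota, run the job, and always release what was granted."""
    granted = send(*quota_request(job_type, amount))
    if granted <= 0:
        return None  # nothing granted, so nothing to run or release
    try:
        return job()
    finally:
        send(*quota_release(job_type, granted))
```

Releasing the granted amount rather than the requested amount keeps the pool balanced when a request is only partially satisfied.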
### 4.4) Discovery and documentation
- Event envelope draft (`docs/modules/orchestrator/event-envelope.md`) defines notifier/webhook/SSE payloads with idempotency keys, provenance, and task runner metadata for job/pack-run events.
- OpenAPI discovery: `/.well-known/openapi` exposes `/openapi/orchestrator.json` (OAS 3.1) with pagination/idempotency/error-envelope examples; legacy job detail/summary endpoints now ship `Deprecation` + `Link` headers that point to their replacements.
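
A client can use the `Deprecation` and `Link` headers to discover a replacement endpoint. The sketch below is a minimal parser under stated assumptions: the `successor-version` relation type and the example URL are guesses, not confirmed by this document, header names are matched case-sensitively, and splitting on commas would break for URLs that themselves contain commas.

```python
import re

LINK_PART = re.compile(r'\s*<([^>]+)>\s*;\s*rel="?([^";]+)"?')


def replacement_url(headers, rel="successor-version"):
    """Return the Link target for a deprecated endpoint, or None."""
    if "Deprecation" not in headers:
        return None
    for part in headers.get("Link", "").split(","):
        match = LINK_PART.match(part)
        if match and match.group(2) == rel:
            return match.group(1)
    return None


# Hypothetical response headers, for illustration only.
headers = {
    "Deprecation": "@1767225600",
    "Link": '</api/v1/orchestrator/jobs/42>; rel="successor-version"',
}
print(replacement_url(headers))
```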