consolidation of some of the modules, localization fixes, product advisories work, qa work

This commit is contained in:
master
2026-03-05 03:54:22 +02:00
parent 7bafcc3eef
commit 8e1cb9448d
3878 changed files with 72600 additions and 46861 deletions

@@ -0,0 +1,78 @@
# SCHED-CONSOLE-27-002 · Policy Simulation Telemetry & Webhooks
> Owners: Scheduler WebService Guild, Observability Guild
> Scope: Policy simulation metrics endpoint and completion webhooks feeding Registry/Console integrations.
## 1. Metrics endpoint refresher
- `GET /api/v1/scheduler/policies/simulations/metrics` (scope: `policy:simulate`)
- Returns queue depth grouped by status plus latency percentiles derived from the most recent sample window (default 200 terminal runs).
- Surface area is unchanged from the implementation in Sprint 27 week 1; consumers should continue to rely on the contract in `samples/api/scheduler/policy-simulation-metrics.json`.
- When the backing storage is not PostgreSQL, the endpoint responds with `501 Not Implemented`.
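
A consumer may want to treat `501 Not Implemented` as "metrics unavailable on this deployment" rather than as a transient failure. The following is a minimal sketch of that handling, not the Console or Registry client. The function and exception names are hypothetical, and the body is returned as parsed JSON without assuming any field names, since the contract lives in the sample file referenced above.

```python
import json

# Path taken from the section above; the host and auth handling are left to the caller.
METRICS_PATH = "/api/v1/scheduler/policies/simulations/metrics"


class MetricsUnavailable(Exception):
    """Hypothetical error type for unexpected responses."""


def interpret_metrics_response(status, body):
    """Map an HTTP status and raw body to parsed metrics, or None.

    Returns None for 501, which the endpoint uses when the backing
    storage is not PostgreSQL. Any other non-200 status is raised so the
    caller can decide whether to retry.
    """
    if status == 501:
        return None
    if status != 200:
        raise MetricsUnavailable(f"unexpected status {status} from {METRICS_PATH}")
    # Field names are defined by the sample contract, so no keys are assumed here.
    return json.loads(body)
```

A poller built on this could stop polling once it sees `None`, because the storage backend does not change at runtime in a typical deployment (an assumption, not something the source states).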
## 2. Completion webhooks
Scheduler Worker now emits policy simulation webhooks whenever a simulation reaches a terminal state (`succeeded`, `failed`, `cancelled`). Payloads are aligned with the SSE `completed` event shape and include idempotency headers so downstream systems can safely de-duplicate.
### 2.1 Configuration
```jsonc
// scheduler-worker.appsettings.json
{
  "Scheduler": {
    "Worker": {
      "Policy": {
        "Webhook": {
          "Enabled": true,
          "Endpoint": "https://registry.internal/hooks/policy-simulation",
          "ApiKeyHeader": "X-StellaOps-Webhook-Key",
          "ApiKey": "replace-me",
          "TimeoutSeconds": 10
        }
      }
    }
  }
}
```
- `Enabled`: feature flag; disabled by default to preserve air-gap behaviour.
- `Endpoint`: absolute HTTPS endpoint; requests use `POST`.
- `ApiKeyHeader`/`ApiKey`: optional header name and shared key that Registry can use to verify the sender.
- `TimeoutSeconds`: per-request timeout (defaults to 10s).
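
The defaults and constraints listed above can be sketched as a small options reader. This is an illustrative Python sketch, not the worker's actual binding code. `WebhookOptions` and `read_webhook_options` are hypothetical names, and the validation rules (HTTPS required and a positive timeout, checked only when enabled) are assumptions drawn from the bullet descriptions.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class WebhookOptions:
    enabled: bool = False            # disabled by default (air-gap behaviour)
    endpoint: Optional[str] = None
    api_key_header: Optional[str] = None
    api_key: Optional[str] = None
    timeout_seconds: int = 10        # documented default


def read_webhook_options(settings: dict) -> WebhookOptions:
    """Read Scheduler:Worker:Policy:Webhook from a parsed settings document."""
    node = settings
    for key in ("Scheduler", "Worker", "Policy", "Webhook"):
        node = node.get(key, {})

    options = WebhookOptions(
        enabled=bool(node.get("Enabled", False)),
        endpoint=node.get("Endpoint"),
        api_key_header=node.get("ApiKeyHeader"),
        api_key=node.get("ApiKey"),
        timeout_seconds=int(node.get("TimeoutSeconds", 10)),
    )

    # Assumption: validation only matters once the feature flag is on.
    if options.enabled:
        parsed = urlparse(options.endpoint or "")
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError("Endpoint must be an absolute HTTPS URL")
        if options.timeout_seconds <= 0:
            raise ValueError("TimeoutSeconds must be positive")
    return options
```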
### 2.2 Headers
| Header                    | Purpose                                                        |
|---------------------------|----------------------------------------------------------------|
| `X-StellaOps-Tenant`      | Tenant identifier for the simulation.                          |
| `X-StellaOps-Run-Id`      | Stable run id (use as idempotency key).                        |
| `X-StellaOps-Webhook-Key` | Optional API key, sent under the configured `ApiKeyHeader` name. |
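
As a rough illustration of how the header set could be assembled on the sending side, the sketch below builds the headers from the table and adds the API key only when both its header name and value are configured. `build_webhook_headers` is a hypothetical helper, and the `Content-Type` value is an assumption rather than something stated in this document.

```python
def build_webhook_headers(tenant_id, run_id, api_key_header=None, api_key=None):
    """Return the request headers for one policy simulation webhook."""
    headers = {
        "Content-Type": "application/json",  # assumed; the payload is JSON
        "X-StellaOps-Tenant": tenant_id,
        "X-StellaOps-Run-Id": run_id,
    }
    # The API key is optional, so it is omitted unless fully configured.
    if api_key_header and api_key:
        headers[api_key_header] = api_key
    return headers
```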
### 2.3 Payload
See `samples/api/scheduler/policy-simulation-webhook.json` for a canonical example.
```jsonc
{
  "tenantId": "tenant-alpha",
  "simulation": { /* PolicyRunStatus document */ },
  "result": "failed",
  "observedAt": "2025-11-03T20:05:12Z",
  "latencySeconds": 14.287,
  "reason": "policy engine timeout"
}
```
- `result`: `succeeded`, `failed`, `cancelled`, `running`, or `queued`. Terminal webhooks are emitted only for the first three.
- `latencySeconds`: rounded to at most four decimal places; derived from `finishedAt - queuedAt` when both timestamps exist, otherwise falls back to the observer timestamp.
- `reason`: surfaced for failures (`error`) and cancellations (`cancellationReason`); omitted otherwise.
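
The field rules above can be sketched as follows. This is one possible reading, not the worker's implementation: it assumes the fallback means "use the observer timestamp in place of a missing `finishedAt`" and that latency is left out when `queuedAt` is also missing. All function names are hypothetical; `error` and `cancellationReason` are the status fields named above.

```python
from datetime import datetime, timezone

TERMINAL_RESULTS = {"succeeded", "failed", "cancelled"}


def latency_seconds(queued_at, finished_at, observed_at):
    """Latency rounded to four decimals, or None if it cannot be derived."""
    if queued_at is None:
        return None  # assumption: nothing to measure from
    # Assumed fallback: the observer timestamp stands in for finishedAt.
    end = finished_at if finished_at is not None else observed_at
    return round((end - queued_at).total_seconds(), 4)


def select_reason(result, status):
    """Pick the reason text for failures and cancellations only."""
    if result == "failed":
        return status.get("error")
    if result == "cancelled":
        return status.get("cancellationReason")
    return None


def build_payload(tenant_id, status, result, observed_at, queued_at, finished_at):
    """Assemble the webhook body; returns None for non-terminal results."""
    if result not in TERMINAL_RESULTS:
        return None
    payload = {
        "tenantId": tenant_id,
        "simulation": status,
        "result": result,
        "observedAt": observed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "latencySeconds": latency_seconds(queued_at, finished_at, observed_at),
    }
    reason = select_reason(result, status)
    if reason is not None:
        payload["reason"] = reason  # omitted otherwise
    return payload
```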
### 2.4 Delivery semantics
- Best effort with no retry from the worker; Registry should use `X-StellaOps-Run-Id` as the idempotency key.
- Failures emit WARN logs (prefix `Policy run job {JobId}`).
- Disabled configuration short-circuits without network calls (debug log only).
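
On the receiving side, idempotency keyed on `X-StellaOps-Run-Id` could look like the sketch below. It is illustrative only: `RunDeduplicator` is a hypothetical name, and the in-memory set is an assumption that a real receiver would likely replace with durable storage so that de-duplication survives restarts.

```python
class RunDeduplicator:
    """Tracks run ids that have already been processed."""

    def __init__(self):
        self._seen = set()

    def should_process(self, headers):
        """Return True the first time a run id is seen, False afterwards."""
        # HTTP header names are case-insensitive, so normalise before lookup.
        normalised = {name.lower(): value for name, value in headers.items()}
        run_id = normalised.get("x-stellaops-run-id")
        if not run_id:
            raise ValueError("missing X-StellaOps-Run-Id header")
        if run_id in self._seen:
            return False
        self._seen.add(run_id)
        return True
```

A handler would call `should_process` before applying the payload and acknowledge duplicates without reprocessing them.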
## 3. SSE compatibility
No changes were required on the streaming endpoint (`GET /api/v1/scheduler/policies/simulations/{id}/stream`); Console continues to receive `completed` events containing the same `PolicyRunStatus` payload that the webhook publishes.