# SCHED-CONSOLE-27-002 · Policy Simulation Telemetry & Webhooks

> Owners: Scheduler WebService Guild, Observability Guild
> Scope: Policy simulation metrics endpoint and completion webhooks feeding Registry/Console integrations.

## 1. Metrics endpoint refresher

- `GET /api/v1/scheduler/policies/simulations/metrics` (scope: `policy:simulate`)
- Returns queue depth grouped by status plus latency percentiles derived from the most recent sample window (default 200 terminal runs).
- Surface area is unchanged from the implementation in Sprint 27 week 1; consumers should continue to rely on the contract in `samples/api/scheduler/policy-simulation-metrics.json`.
- When backing storage is not PostgreSQL the endpoint responds `501 Not Implemented`.

## 2. Completion webhooks

Scheduler Worker now emits policy simulation webhooks whenever a simulation reaches a terminal state (`succeeded`, `failed`, `cancelled`). Payloads are aligned with the SSE `completed` event shape and include idempotency headers so downstream systems can safely de-duplicate.

### 2.1 Configuration

```jsonc
// scheduler-worker.appsettings.json
{
  "Scheduler": {
    "Worker": {
      "Policy": {
        "Webhook": {
          "Enabled": true,
          "Endpoint": "https://registry.internal/hooks/policy-simulation",
          "ApiKeyHeader": "X-StellaOps-Webhook-Key",
          "ApiKey": "replace-me",
          "TimeoutSeconds": 10
        }
      }
    }
  }
}
```

- `Enabled`: feature flag; disabled by default to preserve air-gap behaviour.
- `Endpoint`: absolute HTTPS endpoint; requests use `POST`.
- `ApiKeyHeader`/`ApiKey`: optional bearer for Registry verification.
- `TimeoutSeconds`: per-request timeout (defaults to 10s).

### 2.2 Headers

| Header                    | Purpose                                 |
|---------------------------|-----------------------------------------|
| `X-StellaOps-Tenant`      | Tenant identifier for the simulation.   |
| `X-StellaOps-Run-Id`      | Stable run id (use as idempotency key). |
| `X-StellaOps-Webhook-Key` | Optional API key as configured.         |

### 2.3 Payload

See `samples/api/scheduler/policy-simulation-webhook.json` for a canonical example.

```jsonc
{
  "tenantId": "tenant-alpha",
  "simulation": { /* PolicyRunStatus document */ },
  "result": "failed",
  "observedAt": "2025-11-03T20:05:12Z",
  "latencySeconds": 14.287,
  "reason": "policy engine timeout"
}
```

- `result`: `succeeded`, `failed`, `cancelled`, `running`, or `queued`. Terminal webhooks are emitted only for the first three.
- `latencySeconds`: bounded to four decimal places; derived from `finishedAt - queuedAt` when timestamps exist, else falls back to observer timestamp.
- `reason`: surfaced for failures (`error`) and cancellations (`cancellationReason`); omitted otherwise.

### 2.4 Delivery semantics

- Best effort with no retry from the worker; Registry should use `X-StellaOps-Run-Id` for idempotency.
- Failures emit WARN logs (prefix `Policy run job {JobId}`).
- Disabled configuration short-circuits without network calls (debug log only).

## 3. SSE compatibility

No changes were required on the streaming endpoint (`GET /api/v1/scheduler/policies/simulations/{id}/stream`); Console continues to receive `completed` events containing the same `PolicyRunStatus` payload that the webhook publishes.
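
For consumers implementing the receiving side of the webhook described in section 2, the following is a minimal sketch of how a handler might verify the optional API key and de-duplicate on `X-StellaOps-Run-Id`. It is illustrative only: the `WebhookReceiver` class, the in-memory `seen` map, and the returned status codes are assumptions made for this example and are not part of the Scheduler or Registry contract.

```python
import hmac
import json

# Terminal results for which the worker emits webhooks (per section 2.3).
TERMINAL_RESULTS = {"succeeded", "failed", "cancelled"}


class WebhookReceiver:
    """Hypothetical receiver-side handler; names and status codes are assumptions."""

    def __init__(self, expected_api_key=None, api_key_header="X-StellaOps-Webhook-Key"):
        self.expected_api_key = expected_api_key
        self.api_key_header = api_key_header.lower()
        # run id -> (tenant, result); an in-memory stand-in for a durable store.
        self.seen = {}

    def handle(self, headers, body):
        """Process one delivery and return an HTTP-style status code."""
        # Header names are case-insensitive, so normalise before lookup.
        h = {k.lower(): v for k, v in headers.items()}

        # The API key is optional; check it only when one is configured.
        if self.expected_api_key is not None:
            supplied = h.get(self.api_key_header, "")
            if not hmac.compare_digest(supplied.encode(), self.expected_api_key.encode()):
                return 401

        tenant = h.get("x-stellaops-tenant")
        run_id = h.get("x-stellaops-run-id")
        if not tenant or not run_id:
            return 400

        try:
            payload = json.loads(body)
        except ValueError:
            return 400
        if not isinstance(payload, dict) or payload.get("result") not in TERMINAL_RESULTS:
            return 422

        # The run id is the idempotency key: acknowledge repeats without reprocessing.
        if run_id in self.seen:
            return 200
        self.seen[run_id] = (tenant, payload["result"])
        return 202
```

Because the worker delivers on a best-effort basis with no retry, a receiver along these lines only guards against processing the same run twice; it cannot recover a delivery that never arrived. A consumer that needs completeness would presumably reconcile against another source, which is outside the scope of this sketch.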