SCHED-CONSOLE-27-002 · Policy Simulation Telemetry & Webhooks
Owners: Scheduler WebService Guild, Observability Guild
Scope: Policy simulation metrics endpoint and completion webhooks feeding Registry/Console integrations.
1. Metrics endpoint refresher
GET /api/v1/scheduler/policies/simulations/metrics (scope: policy:simulate)
- Returns queue depth grouped by status plus latency percentiles derived from the most recent sample window (default 200 terminal runs).
- Surface area is unchanged from the implementation in Sprint 27 week 1; consumers should continue to rely on the contract in samples/api/scheduler/policy-simulation-metrics.json.
- When backing storage is not PostgreSQL the endpoint responds 501 Not Implemented.
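To make the "most recent sample window" idea concrete, here is a minimal sketch of a bounded latency window. The class name, the method names, and the nearest-rank percentile method are assumptions for illustration; the note above does not say how the service computes its percentiles.

```python
import math
from collections import deque
from typing import Optional


class LatencyWindow:
    """Keeps only the most recent `size` terminal-run latencies (seconds)."""

    def __init__(self, size: int = 200) -> None:
        # deque(maxlen=...) silently drops the oldest sample once full.
        self._samples = deque(maxlen=size)

    def record(self, latency_seconds: float) -> None:
        self._samples.append(latency_seconds)

    def percentile(self, p: float) -> Optional[float]:
        """Nearest-rank percentile (assumed method); None for an empty window."""
        if not self._samples:
            return None
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]
```

With a window of 200, a burst of slow runs ages out after 200 newer terminal runs, so the percentiles track recent behaviour rather than the full history.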
2. Completion webhooks
Scheduler Worker now emits policy simulation webhooks whenever a simulation reaches a terminal state (succeeded, failed, cancelled). Payloads are aligned with the SSE completed event shape and include idempotency headers so downstream systems can safely de-duplicate.
2.1 Configuration
// scheduler-worker.appsettings.json
{
  "Scheduler": {
    "Worker": {
      "Policy": {
        "Webhook": {
          "Enabled": true,
          "Endpoint": "https://registry.internal/hooks/policy-simulation",
          "ApiKeyHeader": "X-StellaOps-Webhook-Key",
          "ApiKey": "replace-me",
          "TimeoutSeconds": 10
        }
      }
    }
  }
}
- Enabled: feature flag; disabled by default to preserve air-gap behaviour.
- Endpoint: absolute HTTPS endpoint; requests use POST.
- ApiKeyHeader / ApiKey: optional bearer for Registry verification.
- TimeoutSeconds: per-request timeout (defaults to 10s).
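The defaults and constraints listed above can be sketched as a small loader. This is illustrative only: `WebhookOptions` and `load_webhook_options` are hypothetical names, and validating the endpoint only when the feature is enabled is an assumption, not documented worker behaviour.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WebhookOptions:
    enabled: bool = False        # disabled by default (air-gap friendly)
    endpoint: str = ""
    api_key_header: str = ""
    api_key: str = ""
    timeout_seconds: int = 10    # documented default


def load_webhook_options(settings: dict) -> WebhookOptions:
    """Read Scheduler:Worker:Policy:Webhook from an already-parsed settings dict."""
    section = (
        settings.get("Scheduler", {})
        .get("Worker", {})
        .get("Policy", {})
        .get("Webhook", {})
    )
    options = WebhookOptions(
        enabled=bool(section.get("Enabled", False)),
        endpoint=section.get("Endpoint", ""),
        api_key_header=section.get("ApiKeyHeader", ""),
        api_key=section.get("ApiKey", ""),
        timeout_seconds=int(section.get("TimeoutSeconds", 10)),
    )
    if options.enabled:
        # Assumption: only an enabled webhook needs a valid endpoint.
        parsed = urlparse(options.endpoint)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError("Endpoint must be an absolute HTTPS URL")
    return options
```

An empty settings document yields a disabled webhook with the 10 second timeout, which matches the defaults described above.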
2.2 Headers
| Header | Purpose |
|---|---|
| X-StellaOps-Tenant | Tenant identifier for the simulation. |
| X-StellaOps-Run-Id | Stable run id (use as idempotency key). |
| X-StellaOps-Webhook-Key | Optional API key as configured. |
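On the receiving side, using X-StellaOps-Run-Id as an idempotency key could look like the sketch below. `RunIdDeduplicator` is a hypothetical helper, and the in-memory set stands in for whatever durable store a real receiver would use; neither is part of the Registry implementation.

```python
from typing import Mapping, Set


class RunIdDeduplicator:
    """Tracks run ids that have already been processed (in memory only)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def first_delivery(self, headers: Mapping[str, str]) -> bool:
        """Return True the first time a run id is seen, False for repeats."""
        # HTTP header names are case-insensitive, so normalise before lookup.
        normalized = {name.lower(): value for name, value in headers.items()}
        run_id = normalized.get("x-stellaops-run-id")
        if not run_id:
            raise ValueError("missing X-StellaOps-Run-Id header")
        if run_id in self._seen:
            return False
        self._seen.add(run_id)
        return True
```

A receiver would call `first_delivery` before acting on the payload and acknowledge repeats without reprocessing them.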
2.3 Payload
See samples/api/scheduler/policy-simulation-webhook.json for a canonical example.
{
  "tenantId": "tenant-alpha",
  "simulation": { /* PolicyRunStatus document */ },
  "result": "failed",
  "observedAt": "2025-11-03T20:05:12Z",
  "latencySeconds": 14.287,
  "reason": "policy engine timeout"
}
- result: succeeded, failed, cancelled, running, or queued. Terminal webhooks are emitted only for the first three.
- latencySeconds: bounded to four decimal places; derived from finishedAt - queuedAt when timestamps exist, else falls back to observer timestamp.
- reason: surfaced for failures (error) and cancellations (cancellationReason); omitted otherwise.
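The field rules above can be sketched as two small helpers. Several details are assumptions: the function names are hypothetical, "bounded to four decimal places" is interpreted as rounding, the fallback is read as substituting the observer timestamp for a missing finishedAt, and a missing queuedAt returns None.

```python
from datetime import datetime
from typing import Optional


def parse_utc(value: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latency_seconds(
    queued_at: Optional[datetime],
    finished_at: Optional[datetime],
    observed_at: datetime,
) -> Optional[float]:
    """finishedAt - queuedAt, with observedAt standing in for a missing finishedAt."""
    if queued_at is None:
        return None  # assumption: nothing to measure from
    end = finished_at if finished_at is not None else observed_at
    return round((end - queued_at).total_seconds(), 4)


def webhook_reason(
    result: str,
    error: Optional[str] = None,
    cancellation_reason: Optional[str] = None,
) -> Optional[str]:
    """Pick the reason field: error for failures, cancellationReason for cancellations."""
    if result == "failed":
        return error
    if result == "cancelled":
        return cancellation_reason
    return None  # omitted for every other result
```

A None return from `webhook_reason` corresponds to leaving the reason field out of the payload.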
2.4 Delivery semantics
- Best effort with no retry from the worker; Registry should use X-StellaOps-Run-Id for idempotency.
- Failures emit WARN logs (prefix Policy run job {JobId}).
- Disabled configuration short-circuits without network calls (debug log only).
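The delivery rules above amount to a single attempt with logging on failure. A minimal sketch follows. The function name, the injected `send` callable and its signature, the logger name, and the message text after the documented prefix are all hypothetical. Treating status codes of 400 and above as failures is also an assumption.

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger("scheduler.worker.policy.webhook")  # hypothetical name

# Hypothetical transport: (url, headers, body, timeout_seconds) -> HTTP status code.
Transport = Callable[[str, Mapping[str, str], bytes, float], int]


def deliver_once(
    enabled: bool,
    endpoint: str,
    headers: Mapping[str, str],
    body: bytes,
    send: Transport,
    job_id: str,
    timeout_seconds: float = 10.0,
) -> bool:
    """Attempt one POST; never retry. Returns True only on apparent success."""
    if not enabled:
        # Short-circuit: no network call, debug log only.
        logger.debug("Policy run job %s: webhook disabled, skipping", job_id)
        return False
    try:
        status = send(endpoint, headers, body, timeout_seconds)
    except Exception as exc:
        logger.warning("Policy run job %s: webhook delivery failed: %s", job_id, exc)
        return False
    if status >= 400:
        logger.warning("Policy run job %s: webhook endpoint returned %d", job_id, status)
        return False
    return True
```

Injecting the transport keeps the sketch independent of any particular HTTP client and makes the single-attempt behaviour easy to check.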
3. SSE compatibility
No changes were required on the streaming endpoint (GET /api/v1/scheduler/policies/simulations/{id}/stream); Console continues to receive completed events containing the same PolicyRunStatus payload that the webhook publishes.