# Orchestrator Golden Signals Observability ## Module Orchestrator ## Status VERIFIED ## Description Built-in golden signal metrics (latency, traffic, errors, saturation) for orchestrator job execution, with timeline event emission and job capsule provenance tracking. ## Implementation Details - **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/` - **Key Classes**: - `OrchestratorGoldenSignals` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorGoldenSignals.cs`) - golden signal metrics: latency (p50/p95/p99), traffic (requests/sec), errors (error rate), saturation (queue depth, CPU, memory) - `OrchestratorMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorMetrics.cs`) - OpenTelemetry metrics registration for orchestrator operations - `IncidentModeHooks` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Observability/IncidentModeHooks.cs`) - hooks triggered when golden signals breach thresholds, activating incident mode - `JobAttestationService` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestationService.cs`) - generates attestations for job execution with provenance data - `JobAttestation` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestation.cs`) - attestation model for a completed job - `JobCapsule` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsule.cs`) - capsule containing job execution evidence (inputs, outputs, metrics) - `JobCapsuleGenerator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsuleGenerator.cs`) - generates job capsules from execution data - `JobRedactionGuard` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobRedactionGuard.cs`) - redacts sensitive data from job capsules before attestation - `SnapshotHook` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/SnapshotHook.cs`) - hook capturing execution state snapshots at key points - `ScaleMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/ScaleMetrics.cs`) - metrics for auto-scaling decisions - `KpiEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/KpiEndpoints.cs`) - REST endpoints for KPI/metrics queries - `HealthEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/HealthEndpoints.cs`) - health check endpoints - **Interfaces**: None (uses concrete implementations) - **Source**: Feature matrix scan ## E2E Test Plan - [ ] Execute a job and verify `OrchestratorGoldenSignals` records latency, traffic, and error metrics - [ ] Verify golden signal latency: execute 10 jobs with varying durations and verify p50/p95/p99 percentiles are computed correctly - [ ] Trigger an error threshold breach and verify `IncidentModeHooks` activates incident mode - [ ] Generate a `JobCapsule` via `JobCapsuleGenerator` and verify it contains job inputs, outputs, and execution metrics - [ ] Verify redaction: include sensitive data in job inputs and verify `JobRedactionGuard` removes it from the capsule - [ ] Generate a `JobAttestation` via `JobAttestationService` and verify it contains the capsule hash and provenance data - [ ] Query KPI metrics via `KpiEndpoints` and verify golden signal data is returned - [ ] Verify `HealthEndpoints` report healthy when golden signals are within thresholds ## Verification - Verified on 2026-02-13 via `run-002`. - Tier 0: Source files confirmed present on disk. - Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed. - Tier 2d: `docs/qa/feature-checks/runs/jobengine/orchestrator-golden-signals-observability/run-002/tier2-integration-check.json`