Orchestrator Golden Signals Observability
Module
Orchestrator
Status
VERIFIED
Description
Built-in golden signal metrics (latency, traffic, errors, saturation) for orchestrator job execution, with timeline event emission and job capsule provenance tracking.
Implementation Details
- Modules: src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/, src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/, src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/
- Key Classes:
  - OrchestratorGoldenSignals (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorGoldenSignals.cs) - golden signal metrics: latency (p50/p95/p99), traffic (requests/sec), errors (error rate), saturation (queue depth, CPU, memory)
  - OrchestratorMetrics (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorMetrics.cs) - OpenTelemetry metrics registration for orchestrator operations
  - IncidentModeHooks (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Observability/IncidentModeHooks.cs) - hooks triggered when golden signals breach thresholds, activating incident mode
  - JobAttestationService (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestationService.cs) - generates attestations for job execution with provenance data
  - JobAttestation (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestation.cs) - attestation model for a completed job
  - JobCapsule (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsule.cs) - capsule containing job execution evidence (inputs, outputs, metrics)
  - JobCapsuleGenerator (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsuleGenerator.cs) - generates job capsules from execution data
  - JobRedactionGuard (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobRedactionGuard.cs) - redacts sensitive data from job capsules before attestation
  - SnapshotHook (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/SnapshotHook.cs) - hook capturing execution state snapshots at key points
  - ScaleMetrics (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/ScaleMetrics.cs) - metrics for auto-scaling decisions
  - KpiEndpoints (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/KpiEndpoints.cs) - REST endpoints for KPI/metrics queries
  - HealthEndpoints (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/HealthEndpoints.cs) - health check endpoints
- Interfaces: None (uses concrete implementations)
- Source: Feature matrix scan
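The evidence classes listed above describe a flow of capsule, redaction, then attestation. The following is a minimal, language-neutral sketch of that flow in Python, not the C# implementation: the sensitive key names, the `[REDACTED]` marker, the record layouts, the `builder` field, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration, since this document does not describe how JobRedactionGuard or JobAttestationService actually work.

```python
import hashlib
import json

# Hypothetical set of keys treated as sensitive; the real
# JobRedactionGuard rules are not described in this document.
SENSITIVE_KEYS = {"password", "token", "api_key"}


def redact(value):
    """Return a copy of value with sensitive dict entries masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def build_capsule(job_id, inputs, outputs, metrics):
    """Assemble a capsule-like record (inputs, outputs, metrics), redacted."""
    return {
        "job_id": job_id,
        "inputs": redact(inputs),
        "outputs": redact(outputs),
        "metrics": metrics,
    }


def capsule_hash(capsule):
    """SHA-256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    canonical = json.dumps(capsule, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_attestation(capsule, builder):
    """Attestation-like record carrying the capsule hash and provenance."""
    return {
        "capsule_sha256": capsule_hash(capsule),
        "provenance": {"job_id": capsule["job_id"], "builder": builder},
    }
```

Redacting before hashing means the attestation commits only to the redacted capsule, which matches the ordering stated for JobRedactionGuard ("before attestation").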
E2E Test Plan
- Execute a job and verify OrchestratorGoldenSignals records latency, traffic, and error metrics
- Verify golden signal latency: execute 10 jobs with varying durations and verify p50/p95/p99 percentiles are computed correctly
- Trigger an error threshold breach and verify IncidentModeHooks activates incident mode
- Generate a JobCapsule via JobCapsuleGenerator and verify it contains job inputs, outputs, and execution metrics
- Verify redaction: include sensitive data in job inputs and verify JobRedactionGuard removes it from the capsule
- Generate a JobAttestation via JobAttestationService and verify it contains the capsule hash and provenance data
- Query KPI metrics via KpiEndpoints and verify golden signal data is returned
- Verify HealthEndpoints report healthy when golden signals are within thresholds
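For the latency step in the plan above, the expected p50/p95/p99 values for 10 jobs can be worked out by hand before running the test. The sketch below uses the nearest-rank percentile definition, which is an assumption: this document does not state which percentile method OrchestratorGoldenSignals uses, and a histogram-based implementation would return bucket-dependent values instead. The function names are hypothetical.

```python
def nearest_rank_percentile(samples, percent):
    """Nearest-rank percentile: the value at rank ceil(percent/100 * n)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of percent * n / 100, clamped to at least rank 1.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Compute the three latency percentiles named in the test plan."""
    return {
        "p50": nearest_rank_percentile(durations_ms, 50),
        "p95": nearest_rank_percentile(durations_ms, 95),
        "p99": nearest_rank_percentile(durations_ms, 99),
    }


# Ten jobs with varying durations (milliseconds), in arbitrary order.
durations = [700, 100, 1000, 300, 500, 900, 200, 800, 400, 600]
summary = latency_summary(durations)
```

With only 10 samples, nearest-rank gives rank 5 for p50 and rank 10 for both p95 and p99, so `summary` is `{"p50": 500, "p95": 1000, "p99": 1000}`. A test with so few samples cannot distinguish p95 from p99 under this definition; a larger sample is needed if the two must differ.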
Verification
- Verified on 2026-02-13 via run-002.
- Tier 0: Source files confirmed present on disk.
- Tier 1: dotnet build passed (0 errors); 1292/1292 tests passed.
- Tier 2d: docs/qa/feature-checks/runs/jobengine/orchestrator-golden-signals-observability/run-002/tier2-integration-check.json