Orchestrator Golden Signals Observability

Module

Orchestrator

Status

VERIFIED

Description

Built-in golden signal metrics (latency, traffic, errors, saturation) for orchestrator job execution, with timeline event emission and job capsule provenance tracking.
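As an illustration of what the four signals measure, the following is a minimal Python sketch, not the C# code in OrchestratorGoldenSignals.cs. The names `JobRecord`, `nearest_rank`, and `golden_signals` are hypothetical, the nearest-rank percentile method is an assumption, and saturation is reduced here to queue depth over an assumed queue capacity (the real class also tracks CPU and memory).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical record of one finished job inside an observation window."""
    duration_ms: float
    succeeded: bool


def nearest_rank(sorted_values, p):
    """Nearest-rank percentile for an integer p in 1..100 over a non-empty sorted list."""
    rank = max(1, (p * len(sorted_values) + 99) // 100)  # integer ceil(p * n / 100)
    return sorted_values[rank - 1]


def golden_signals(records, window_seconds, queue_depth, queue_capacity):
    """Summarise latency, traffic, errors, and saturation for one window."""
    if not records:
        raise ValueError("at least one job record is required")
    durations = sorted(r.duration_ms for r in records)
    failures = sum(1 for r in records if not r.succeeded)
    return {
        "latency_ms": {f"p{p}": nearest_rank(durations, p) for p in (50, 95, 99)},
        "traffic_per_sec": len(records) / window_seconds,
        "error_rate": failures / len(records),
        "saturation": queue_depth / queue_capacity,  # assumption: queue depth only
    }


# Ten jobs lasting 100 ms .. 1000 ms, one of which failed, observed over 5 seconds.
records = [JobRecord(duration_ms=100.0 * i, succeeded=(i != 7)) for i in range(1, 11)]
snapshot = golden_signals(records, window_seconds=5.0, queue_depth=20, queue_capacity=80)
print(snapshot)
```

With only ten samples, nearest-rank p95 and p99 both resolve to the slowest job, which is worth keeping in mind when reading percentile figures from small job batches.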

Implementation Details

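  • Modules: src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/, src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Observability/, src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/, src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/, src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/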
  • Key Classes:
    • OrchestratorGoldenSignals (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorGoldenSignals.cs) - golden signal metrics: latency (p50/p95/p99), traffic (requests/sec), errors (error rate), saturation (queue depth, CPU, memory)
    • OrchestratorMetrics (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorMetrics.cs) - OpenTelemetry metrics registration for orchestrator operations
    • IncidentModeHooks (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Observability/IncidentModeHooks.cs) - hooks triggered when golden signals breach thresholds, activating incident mode
    • JobAttestationService (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestationService.cs) - generates attestations for job execution with provenance data
    • JobAttestation (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestation.cs) - attestation model for a completed job
    • JobCapsule (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsule.cs) - capsule containing job execution evidence (inputs, outputs, metrics)
    • JobCapsuleGenerator (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsuleGenerator.cs) - generates job capsules from execution data
    • JobRedactionGuard (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobRedactionGuard.cs) - redacts sensitive data from job capsules before attestation
    • SnapshotHook (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/SnapshotHook.cs) - hook capturing execution state snapshots at key points
    • ScaleMetrics (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/ScaleMetrics.cs) - metrics for auto-scaling decisions
    • KpiEndpoints (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/KpiEndpoints.cs) - REST endpoints for KPI/metrics queries
    • HealthEndpoints (src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/HealthEndpoints.cs) - health check endpoints
  • Interfaces: None (uses concrete implementations)
  • Source: Feature matrix scan
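The evidence flow described by the class list (capsule generation, redaction before attestation, then an attestation carrying provenance) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the API of JobCapsuleGenerator, JobRedactionGuard, or JobAttestationService: the function names, the set of sensitive keys, the `[REDACTED]` placeholder, the canonical-JSON encoding, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json

# Assumption: keys treated as sensitive. The real redaction rules are not documented here.
SENSITIVE_KEYS = {"password", "token", "api_key"}


def redact(value):
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def build_capsule(job_id, inputs, outputs, metrics):
    """Bundle job evidence (inputs, outputs, metrics) and redact it."""
    return redact({"job_id": job_id, "inputs": inputs, "outputs": outputs, "metrics": metrics})


def capsule_hash(capsule):
    """Hash a canonical JSON encoding so equal capsules yield equal digests."""
    canonical = json.dumps(capsule, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


def build_attestation(capsule, producer):
    """Attach the capsule hash and minimal provenance data to an attestation record."""
    return {
        "subject": capsule["job_id"],
        "capsule_hash": capsule_hash(capsule),
        "provenance": {"producer": producer},
    }


capsule = build_capsule(
    "job-1",
    inputs={"repo": "example", "token": "s3cret"},
    outputs={"status": "ok"},
    metrics={"duration_ms": 12},
)
attestation = build_attestation(capsule, producer="sketch")
print(json.dumps(attestation, indent=2))
```

Redacting before hashing means the attested digest covers only the sanitized capsule, so the attestation can be verified later without access to the original secrets.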

E2E Test Plan

  • Execute a job and verify OrchestratorGoldenSignals records latency, traffic, and error metrics
  • Verify golden signal latency: execute 10 jobs with varying durations and verify that the reported p50/p95/p99 latencies match percentiles computed independently from the recorded durations
  • Trigger an error threshold breach and verify IncidentModeHooks activates incident mode
  • Generate a JobCapsule via JobCapsuleGenerator and verify it contains job inputs, outputs, and execution metrics
  • Verify redaction: include sensitive data in job inputs and verify JobRedactionGuard removes it from the capsule
  • Generate a JobAttestation via JobAttestationService and verify it contains the capsule hash and provenance data
  • Query KPI metrics via KpiEndpoints and verify golden signal data is returned
  • Verify HealthEndpoints report healthy when golden signals are within thresholds
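The threshold-breach and health-check steps in the plan above could be exercised against logic shaped roughly like the sketch below. It is a Python illustration only: `IncidentMonitor`, its fields, and the status strings are hypothetical, and the assumption that incident mode clears as soon as the error rate falls back under the threshold is not confirmed by IncidentModeHooks or HealthEndpoints.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentMonitor:
    """Hypothetical stand-in for threshold checks that drive incident mode."""
    error_rate_threshold: float
    hooks: list = field(default_factory=list)  # callables invoked on activation
    incident_active: bool = False

    def observe(self, error_rate):
        """Record a new error rate and fire hooks when a breach begins."""
        breached = error_rate > self.error_rate_threshold
        if breached and not self.incident_active:
            for hook in self.hooks:
                hook(error_rate)
        # Assumption: incident mode ends once the signal is back within threshold.
        self.incident_active = breached
        return breached

    def health(self):
        """Report health based on the most recent observation."""
        return "unhealthy" if self.incident_active else "healthy"


activations = []
monitor = IncidentMonitor(error_rate_threshold=0.05, hooks=[activations.append])

monitor.observe(0.01)   # within threshold, no hook call
monitor.observe(0.20)   # breach begins, hook fires once
monitor.observe(0.30)   # still breached, hook does not fire again
print(monitor.health(), activations)
```

Firing hooks only on the transition into a breach keeps a sustained error spike from re-triggering incident mode on every sample.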

Verification

  • Verified on 2026-02-13 via run-002.
  • Tier 0: Source files confirmed present on disk.
  • Tier 1: dotnet build passed (0 errors); 1292/1292 tests passed.
  • Tier 2d: docs/qa/feature-checks/runs/jobengine/orchestrator-golden-signals-observability/run-002/tier2-integration-check.json