45 lines
3.9 KiB
Markdown
45 lines
3.9 KiB
Markdown
# Orchestrator Golden Signals Observability
|
|
|
|
## Module
|
|
Orchestrator
|
|
|
|
## Status
|
|
VERIFIED
|
|
|
|
## Description
|
|
Built-in golden signal metrics (latency, traffic, errors, saturation) for orchestrator job execution, with timeline event emission and job capsule provenance tracking.
|
|
|
|
## Implementation Details
|
|
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/`
|
|
- **Key Classes**:
|
|
- `OrchestratorGoldenSignals` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorGoldenSignals.cs`) - golden signal metrics: latency (p50/p95/p99), traffic (requests/sec), errors (error rate), saturation (queue depth, CPU, memory)
|
|
- `OrchestratorMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorMetrics.cs`) - OpenTelemetry metrics registration for orchestrator operations
|
|
- `IncidentModeHooks` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Observability/IncidentModeHooks.cs`) - hooks triggered when golden signals breach thresholds, activating incident mode
|
|
- `JobAttestationService` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestationService.cs`) - generates attestations for job execution with provenance data
|
|
- `JobAttestation` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestation.cs`) - attestation model for a completed job
|
|
- `JobCapsule` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsule.cs`) - capsule containing job execution evidence (inputs, outputs, metrics)
|
|
- `JobCapsuleGenerator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsuleGenerator.cs`) - generates job capsules from execution data
|
|
- `JobRedactionGuard` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobRedactionGuard.cs`) - redacts sensitive data from job capsules before attestation
|
|
- `SnapshotHook` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/SnapshotHook.cs`) - hook capturing execution state snapshots at key points
|
|
- `ScaleMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/ScaleMetrics.cs`) - metrics for auto-scaling decisions
|
|
- `KpiEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/KpiEndpoints.cs`) - REST endpoints for KPI/metrics queries
|
|
- `HealthEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/HealthEndpoints.cs`) - health check endpoints
|
|
- **Interfaces**: None (uses concrete implementations)
|
|
- **Source**: Feature matrix scan
|
|
|
|
## E2E Test Plan
|
|
- [ ] Execute a job and verify `OrchestratorGoldenSignals` records latency, traffic, and error metrics
|
|
- [ ] Verify golden signal latency: execute 10 jobs with varying durations and verify p50/p95/p99 percentiles are computed correctly
|
|
- [ ] Trigger an error threshold breach and verify `IncidentModeHooks` activates incident mode
|
|
- [ ] Generate a `JobCapsule` via `JobCapsuleGenerator` and verify it contains job inputs, outputs, and execution metrics
|
|
- [ ] Verify redaction: include sensitive data in job inputs and verify `JobRedactionGuard` removes it from the capsule
|
|
- [ ] Generate a `JobAttestation` via `JobAttestationService` and verify it contains the capsule hash and provenance data
|
|
- [ ] Query KPI metrics via `KpiEndpoints` and verify golden signal data is returned
|
|
- [ ] Verify `HealthEndpoints` report healthy when golden signals are within thresholds
|
|
|
|
## Verification
|
|
- Verified on 2026-02-13 via `run-002`.
|
|
- Tier 0: Source files confirmed present on disk.
|
|
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
|
|
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/orchestrator-golden-signals-observability/run-002/tier2-integration-check.json`
|