Intelligent Rollback System (Predictive + Metric-Driven)

Module

ReleaseOrchestrator

Status

IMPLEMENTED

Description

A predictive rollback engine that forecasts the health trajectory of a deployment from Prometheus/Datadog/CloudWatch metrics. It detects anomalies (Z-score, isolation forest), plans partial, component-level rollbacks, and makes automated rollback decisions by comparing current health against a stored baseline.
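
To illustrate the Z-score half of the anomaly detection described above, here is a minimal Python sketch. It is not the AnomalyDetector implementation: the function names, the sample latency values, and the threshold of 3.0 are all assumptions made for this example, and the isolation forest path is not shown.

```python
from statistics import mean, stdev


def z_score(value, baseline):
    """Return how many sample standard deviations `value` lies from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two points
    if sigma == 0:
        # A flat baseline: any deviation at all is infinitely far away.
        if value == mu:
            return 0.0
        return float("inf") if value > mu else float("-inf")
    return (value - mu) / sigma


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose absolute Z-score exceeds the (hypothetical) threshold."""
    return abs(z_score(value, baseline)) > threshold


# Hypothetical baseline: request latency in milliseconds before the deployment.
baseline_latency_ms = [100, 102, 98, 101, 99, 100, 103, 97]

print(z_score(110, baseline_latency_ms))       # 5.0 (mean 100, sample stdev 2)
print(is_anomalous(110, baseline_latency_ms))  # True
print(is_anomalous(101, baseline_latency_ms))  # False
```

A Z-score only makes sense relative to a reference distribution, which is why a baseline such as the one kept by BaselineManager is a prerequisite for this kind of check.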

Implementation Details

  • Modules:
    • src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.Deployment/Rollback/
    • src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.Deployment/Rollback/Intelligence/
    • src/ReleaseOrchestrator/__Apps/StellaOps.ReleaseOrchestrator.WebApi/Controllers/
  • Key Classes:
    • PredictiveEngine (src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.Deployment/Rollback/PredictiveEngine.cs) - forecasts deployment health trajectory from metric streams
    • RollbackDecider (src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.Deployment/Rollback/Intelligence/RollbackDecider.cs) - automated rollback decision-making based on health analysis
    • AnomalyDetector (src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.Deployment/Rollback/Intelligence/AnomalyDetector.cs) - detects anomalies using Z-score and isolation forest algorithms
    • BaselineManager (src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.Deployment/Rollback/Intelligence/BaselineManager.cs) - manages metric baselines for comparison
    • RollbackIntelligenceController (src/ReleaseOrchestrator/__Apps/StellaOps.ReleaseOrchestrator.WebApi/Controllers/RollbackIntelligenceController.cs) - REST API for rollback intelligence operations
  • Interfaces: IPredictiveEngine
  • Source: SPRINT_20260117_033
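
This document does not state which forecasting model PredictiveEngine uses. As one possible reading of "forecasts deployment health trajectory from metric streams", the Python sketch below fits a least-squares line to equally spaced samples and extrapolates it. The function names, the error-rate values, and the limit are hypothetical and are not taken from IPredictiveEngine.

```python
def fit_line(samples):
    """Least-squares slope and intercept for equally spaced samples (x = 0, 1, 2, ...)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(samples, steps_ahead):
    """Extrapolate the fitted line for the next `steps_ahead` sampling intervals."""
    slope, intercept = fit_line(samples)
    last_x = len(samples) - 1
    return [intercept + slope * (last_x + k) for k in range(1, steps_ahead + 1)]


def steps_until_breach(samples, limit, horizon):
    """Return the first forecast step (1-based) that exceeds `limit`, or None."""
    for step, value in enumerate(forecast(samples, horizon), start=1):
        if value > limit:
            return step
    return None


# Hypothetical error rate (percent) sampled at a fixed interval after a deployment.
error_rate = [1.0, 1.5, 2.0, 2.5, 3.0]

print(forecast(error_rate, 3))                 # [3.5, 4.0, 4.5]
print(steps_until_breach(error_rate, 4.0, 5))  # 3
```

A linear trend is the simplest choice and ignores seasonality and noise; it is used here only to show how a forecast can flag a breach before the metric actually crosses the limit.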

E2E Test Plan

  • Submit deployment metrics to PredictiveEngine and verify health trajectory forecast output
  • Establish a baseline via BaselineManager and verify it stores baseline metric profiles
  • Inject anomalous metrics and verify AnomalyDetector detects them with Z-score/isolation forest
  • Verify RollbackDecider triggers automatic rollback when anomaly thresholds are exceeded
  • Verify partial component-level rollback: only affected components are rolled back
  • Call the RollbackIntelligenceController API and verify rollback recommendations are returned
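
The partial-rollback check in the plan above can be sketched as a small Python example. This is an illustration only, assuming each component carries a single anomaly score and that a fixed threshold decides whether it is affected; `ComponentHealth`, `plan_partial_rollback`, the component names, and the scores are all hypothetical and do not reflect the actual RollbackDecider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentHealth:
    name: str
    anomaly_score: float  # hypothetical: e.g. absolute Z-score against the baseline


def plan_partial_rollback(components, threshold=3.0):
    """Select only the components whose anomaly score exceeds the threshold."""
    return sorted(c.name for c in components if c.anomaly_score > threshold)


# Hypothetical post-deployment observations for three components.
observed = [
    ComponentHealth("api", 5.2),
    ComponentHealth("worker", 0.4),
    ComponentHealth("web", 3.1),
]

plan = plan_partial_rollback(observed)
print(plan)  # ['api', 'web']

# The assertions an E2E test would make: affected components are rolled back,
# healthy ones are left alone.
assert "worker" not in plan
assert set(plan) == {"api", "web"}
```

A real test would drive the deployed services and the API instead of in-memory objects, but the expected outcome is the same: the rollback plan contains exactly the affected components.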