Agent Cluster Manager with HA Topologies

Module

ReleaseOrchestrator

Status

IMPLEMENTED

Description

Clusters release orchestrator agents for high availability (HA). The cluster manager supports three topologies (ActivePassive, ActiveActive, and Sharded) and provides leader election, health monitoring, and automatic failover.
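To make the difference between the three topologies concrete, the following Python sketch shows one plausible way task eligibility could vary by topology. It is not the StellaOps implementation (which lives in the C# classes listed under Implementation Details); the names `Topology` and `eligible_nodes`, and the hash-based routing used for Sharded, are assumptions made for illustration only.

```python
import hashlib
from enum import Enum


class Topology(Enum):
    # Names mirror the topologies listed in the description.
    ACTIVE_PASSIVE = "ActivePassive"
    ACTIVE_ACTIVE = "ActiveActive"
    SHARDED = "Sharded"


def eligible_nodes(topology, healthy_nodes, leader=None, task_key=""):
    """Return the nodes allowed to accept a task (hypothetical routing rules)."""
    nodes = sorted(healthy_nodes)
    if not nodes:
        return []
    if topology is Topology.ACTIVE_PASSIVE:
        # Assumption: only the elected leader accepts tasks.
        return [leader] if leader in nodes else []
    if topology is Topology.ACTIVE_ACTIVE:
        # Every healthy node accepts tasks.
        return nodes
    # Assumption for Sharded: a stable hash of the task key picks one owner.
    digest = hashlib.sha256(task_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[index]]
```

Under these assumptions, `eligible_nodes(Topology.ACTIVE_ACTIVE, {"a", "b"})` returns both nodes, while the ActivePassive call returns only the leader. Note that this simple modulo scheme reassigns many keys when membership changes; a real sharded cluster would more likely use consistent hashing or explicit shard assignment.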

Implementation Details

  • Modules: src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/
  • Key Classes:
    • AgentClusterManager (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/AgentClusterManager.cs) - manages agent clusters with configurable HA topologies
    • LeaderElection (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/LeaderElection.cs) - leader election for the ActivePassive topology
    • FailoverManager (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/FailoverManager.cs) - automatic failover when the leader becomes unhealthy
    • HealthMonitor (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/HealthMonitor.cs) - monitors cluster member health
    • StateSync (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/StateSync.cs) - state synchronization between cluster members
  • Source: SPRINT_20260117_034
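
The interplay between health monitoring, leader election, and failover described above can be sketched as a small in-memory model. This is a simplified illustration in Python, not the code in the listed C# files: the `ClusterSketch` class, the heartbeat-timeout health rule, and the "lowest node ID wins" election rule are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterSketch:
    """Hypothetical model: heartbeats drive health, health drives leadership."""

    heartbeat_timeout: float = 5.0
    last_seen: Dict[str, float] = field(default_factory=dict)
    leader: Optional[str] = None

    def heartbeat(self, node: str, now: float) -> None:
        # Health monitoring: record the latest heartbeat time for a node.
        self.last_seen[node] = now

    def healthy(self, now: float) -> List[str]:
        # A node is healthy if its last heartbeat is within the timeout.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen <= self.heartbeat_timeout
        )

    def tick(self, now: float) -> Optional[str]:
        # Failover: keep a healthy leader; otherwise elect the lowest
        # healthy node ID (an assumed, deterministic election rule).
        healthy = self.healthy(now)
        if self.leader not in healthy:
            self.leader = healthy[0] if healthy else None
        return self.leader
```

In this model, a leader that stops sending heartbeats is replaced on the next `tick` once the timeout elapses, and a former leader that rejoins does not take leadership back while the current leader stays healthy. A production implementation would also need to guard against split-brain, for example with leases or a quorum, which this sketch omits.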

E2E Test Plan

  • Configure a 3-node ActivePassive cluster and verify leader election produces a single leader
  • Verify failover: stop the leader node and confirm a new leader is elected within the timeout
  • Verify ActiveActive topology: configure two active nodes and confirm both accept tasks
  • Verify health monitoring: confirm an unhealthy node is detected and removed from the active set
  • Verify state synchronization: cluster state converges after a node rejoins
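
For the state synchronization check, "converges" can be made testable by asserting that every node ends up with the same state regardless of the order in which updates are exchanged. The source does not say which mechanism StateSync uses, so the Python sketch below assumes a simple per-key versioned merge (highest version wins); the `merge` function and the `State` shape are hypothetical.

```python
from typing import Dict, Tuple

# Assumed state shape: key -> (version, value).
State = Dict[str, Tuple[int, str]]


def merge(local: State, remote: State) -> State:
    """Merge two replicas; the entry with the higher version wins per key."""
    merged = dict(local)
    for key, entry in remote.items():
        # Comparing (version, value) tuples breaks version ties
        # deterministically, so merge order does not matter.
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged


# A node that was offline holds stale entries; after exchanging state in
# both directions, both replicas should be identical.
stayed = {"task-1": (2, "done"), "task-2": (1, "queued")}
rejoined = {"task-1": (1, "running"), "task-3": (1, "queued")}
assert merge(stayed, rejoined) == merge(rejoined, stayed)
```

An E2E test against the real cluster could follow the same pattern: let a node rejoin, wait for synchronization, then compare the state reported by each member.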