# Agent Cluster Manager with HA Topologies

## Module
ReleaseOrchestrator

## Status
VERIFIED

## Description
Agent clustering with support for multiple HA topologies (ActivePassive, ActiveActive, Sharded), leader election, health monitoring, and automatic failover for release orchestrator agents.

## Implementation Details
- **Modules**: `src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/`
- **Key Classes**:
  - `AgentClusterManager` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/AgentClusterManager.cs`) - manages agent clusters with configurable HA topologies
  - `LeaderElection` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/LeaderElection.cs`) - leader election for ActivePassive topology
  - `FailoverManager` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/FailoverManager.cs`) - automatic failover when leader becomes unhealthy
  - `HealthMonitor` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/HealthMonitor.cs`) - monitors cluster member health
  - `StateSync` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/StateSync.cs`) - state synchronization between cluster members
- **Source**: SPRINT_20260117_034

## E2E Test Plan
- [ ] Configure a 3-node ActivePassive cluster and verify leader election produces a single leader
- [ ] Verify failover: stop the leader node and confirm a new leader is elected within the timeout
- [ ] Verify ActiveActive topology: configure two active nodes and confirm both accept tasks
- [ ] Verify health monitoring: unhealthy node is detected and removed from the active set
- [ ] Verify state synchronization: cluster state converges after a node rejoins

## Verification
- **Verified**: 2026-02-13T21:00:00Z
- **Method**: Tier 2d integration tests
- **Result**: PASS
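
The ActivePassive behavior exercised by the test plan (a single leader, failover when the leader becomes unhealthy, removal of unhealthy nodes from the active set) can be illustrated with a minimal, language-neutral sketch. It is written in Python for brevity and is not the C# implementation in `StellaOps.Agent.Core`. All names here (`Member`, `ActivePassiveCluster`, `report_health`, `active_set`) are hypothetical, and the election rule (lowest node ID among healthy members, with no preemption when the old leader recovers) is an assumption chosen to keep the example deterministic; the actual `LeaderElection` and `FailoverManager` classes may work differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Member:
    """One cluster member and its last reported health (hypothetical model)."""
    node_id: str
    healthy: bool = True


@dataclass
class ActivePassiveCluster:
    """Toy ActivePassive cluster: at most one leader, chosen from healthy members."""
    members: dict[str, Member] = field(default_factory=dict)
    leader_id: Optional[str] = None

    def join(self, node_id: str) -> None:
        # A joining node starts out healthy; elect a leader if none exists yet.
        self.members[node_id] = Member(node_id)
        self._ensure_leader()

    def report_health(self, node_id: str, healthy: bool) -> None:
        # Stand-in for a health monitor callback; may trigger failover.
        self.members[node_id].healthy = healthy
        self._ensure_leader()

    def active_set(self) -> list[str]:
        # Unhealthy members are excluded from the active set.
        return sorted(m.node_id for m in self.members.values() if m.healthy)

    def _ensure_leader(self) -> None:
        # Keep the current leader while it is healthy (no preemption).
        leader = self.members.get(self.leader_id) if self.leader_id else None
        if leader is not None and leader.healthy:
            return
        # Assumed election rule: lowest node ID among healthy members.
        candidates = self.active_set()
        self.leader_id = candidates[0] if candidates else None


cluster = ActivePassiveCluster()
for node in ("agent-a", "agent-b", "agent-c"):
    cluster.join(node)

cluster.report_health("agent-a", False)  # leader fails, failover runs
print(cluster.leader_id, cluster.active_set())
```

In this sketch a recovered node rejoins the active set as a passive member rather than reclaiming leadership, which avoids a second leadership change after every transient failure. Whether the real cluster manager preempts on recovery is not stated in this record.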