# Agent Cluster Manager with HA Topologies

## Module
ReleaseOrchestrator

## Status
VERIFIED

## Description
Agent clustering with support for multiple HA topologies (ActivePassive, ActiveActive, Sharded), leader election, health monitoring, and automatic failover for release orchestrator agents.

## Implementation Details
- **Modules**: `src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/`
- **Key Classes**:
  - `AgentClusterManager` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/AgentClusterManager.cs`) - manages agent clusters with configurable HA topologies
  - `LeaderElection` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/LeaderElection.cs`) - leader election for ActivePassive topology
  - `FailoverManager` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/FailoverManager.cs`) - automatic failover when leader becomes unhealthy
  - `HealthMonitor` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/HealthMonitor.cs`) - monitors cluster member health
  - `StateSync` (`src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/StateSync.cs`) - state synchronization between cluster members
- **Source**: SPRINT_20260117_034

## E2E Test Plan
- [ ] Configure a 3-node ActivePassive cluster and verify leader election produces a single leader
- [ ] Verify failover: stop the leader node and confirm a new leader is elected within the timeout
- [ ] Verify ActiveActive topology: configure two active nodes and confirm both accept tasks
- [ ] Verify health monitoring: unhealthy node is detected and removed from the active set
- [ ] Verify state synchronization: cluster state converges after a node rejoins


## Verification
- **Verified**: 2026-02-13T21:00:00Z
- **Method**: Tier 2d integration tests
- **Result**: PASS