Agent Cluster Manager with HA Topologies
Module
ReleaseOrchestrator
Status
IMPLEMENTED
Description
Agent clustering with support for multiple HA topologies (ActivePassive, ActiveActive, Sharded), leader election, health monitoring, and automatic failover for release orchestrator agents.
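The three topologies differ mainly in which healthy members may take work. The Python sketch below models that difference as a single function. It is not the C# AgentClusterManager API. The Topology enum, the eligible_members function and the hash-based shard assignment are all hypothetical, chosen only to contrast the three modes.

```python
from enum import Enum
from hashlib import sha256
from typing import List, Optional


class Topology(Enum):
    # Names mirror the topologies listed above; the enum itself is hypothetical.
    ACTIVE_PASSIVE = "ActivePassive"
    ACTIVE_ACTIVE = "ActiveActive"
    SHARDED = "Sharded"


def eligible_members(topology: Topology, healthy_members: List[str],
                     leader: Optional[str] = None,
                     task_key: Optional[str] = None) -> List[str]:
    """Return the members allowed to accept a task under the given topology."""
    members = sorted(healthy_members)  # same order on every node
    if not members:
        return []
    if topology is Topology.ACTIVE_PASSIVE:
        # Only the elected leader works; the others stand by.
        return [leader] if leader in members else []
    if topology is Topology.ACTIVE_ACTIVE:
        # Every healthy member accepts tasks.
        return members
    if topology is Topology.SHARDED:
        # Assumed scheme: a stable hash of the task key picks exactly one owner.
        if task_key is None:
            raise ValueError("Sharded topology needs a task_key")
        digest = sha256(task_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(members)
        return [members[index]]
    raise ValueError(f"unknown topology: {topology}")
```

Plain modulo hashing reassigns most keys whenever membership changes. A consistent-hashing ring is the usual alternative if shard ownership should stay stable across failovers.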
Implementation Details
- Modules: src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/
- Key Classes:
  - AgentClusterManager (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/AgentClusterManager.cs) - manages agent clusters with configurable HA topologies
  - LeaderElection (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/LeaderElection.cs) - leader election for the ActivePassive topology
  - FailoverManager (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/FailoverManager.cs) - automatic failover when the leader becomes unhealthy
  - HealthMonitor (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/HealthMonitor.cs) - monitors the health of cluster members
  - StateSync (src/ReleaseOrchestrator/__Agents/StellaOps.Agent.Core/Resilience/StateSync.cs) - synchronizes state between cluster members
- Source: SPRINT_20260117_034
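The following is a minimal in-memory sketch of how health monitoring, leader election and failover can fit together. It assumes heartbeat-based health checks and a "lowest healthy ID wins" election rule. ClusterSketch and its methods are hypothetical and are not taken from HealthMonitor, LeaderElection or FailoverManager. Times are passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterSketch:
    """Hypothetical model of heartbeat health checks, election and failover."""

    heartbeat_timeout: float = 5.0  # seconds of silence before a member counts as unhealthy
    last_heartbeat: Dict[str, float] = field(default_factory=dict)  # member ID -> last heartbeat time
    leader: Optional[str] = None
    term: int = 0  # incremented on every leadership change

    def heartbeat(self, member_id: str, now: float) -> None:
        self.last_heartbeat[member_id] = now

    def healthy_members(self, now: float) -> List[str]:
        # Healthy = heard from within the timeout window; sorted for determinism.
        return sorted(
            m for m, t in self.last_heartbeat.items() if now - t <= self.heartbeat_timeout
        )

    def elect(self, now: float) -> Optional[str]:
        # Assumed rule: the healthy member with the lowest ID becomes leader.
        healthy = self.healthy_members(now)
        new_leader = healthy[0] if healthy else None
        if new_leader != self.leader:
            self.leader = new_leader
            self.term += 1
        return self.leader

    def check_failover(self, now: float) -> Optional[str]:
        # Keep a healthy leader (avoids flapping); re-elect only if it is missing or stale.
        if self.leader is None or self.leader not in self.healthy_members(now):
            return self.elect(now)
        return self.leader
```

For example, with a 5 s timeout, a leader last heard from at t=0 is kept at t=4 and replaced at t=6 by the next-lowest healthy ID. Here a single process makes every decision, so it trivially yields one leader. Real nodes can disagree about who is healthy. In that setting, a shared lease or a consensus protocol is typically needed to prevent two simultaneous leaders (split-brain).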
E2E Test Plan
- Configure a 3-node ActivePassive cluster and verify leader election produces a single leader
- Verify failover: stop the leader node and confirm a new leader is elected within the timeout
- Verify ActiveActive topology: configure two active nodes and confirm both accept tasks
- Verify health monitoring: an unhealthy node is detected and removed from the active set
- Verify state synchronization: cluster state converges after a node rejoins
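The state-convergence check can be illustrated with a version-based merge. The sketch assumes each state entry carries a monotonically increasing version and a writer ID. merge_state and converge are hypothetical helpers, not the StateSync API. The point is that the merge is order-independent: a rejoining node and its peers end up with the same state regardless of which side merges first.

```python
from functools import reduce
from typing import Dict, Tuple

# key -> (version, writer ID, value); this layout is an assumption for the sketch
Entry = Tuple[int, str, str]
State = Dict[str, Entry]


def merge_state(local: State, remote: State) -> State:
    """Merge two replicas: the higher version wins, ties go to the larger writer ID.

    Assumes a (version, writer) pair identifies exactly one write, so the
    result does not depend on which replica is 'local'.
    """
    merged = dict(local)
    for key, entry in remote.items():
        current = merged.get(key)
        if current is None or entry[:2] > current[:2]:
            merged[key] = entry
    return merged


def converge(*replicas: State) -> State:
    # Fold every replica into one view; any order gives the same result.
    return reduce(merge_state, replicas, {})
```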