---
checkId: check.agent.version.consistency
plugin: stellaops.doctor.agent
severity: warn
tags: [agent, version, maintenance]
---
# Agent Version Consistency

## What It Checks

Groups all non-revoked, non-inactive agents by their reported `Version` field and evaluates version skew:

1. **Single version** across all agents: **Pass** -- all agents are consistent.
2. **Two versions** with skew affecting less than half the fleet: **Pass** (minor skew acceptable).
3. **Significant skew** (more than 2 distinct versions, or outdated agents exceed half the fleet): **Warn** with evidence listing the version distribution and up to 10 outdated agent names.
4. **No active agents**: **Skip**.

The "majority version" is the version running on the most agents. All other versions are considered outdated.

Evidence collected: `MajorityVersion`, `VersionDistribution` (e.g., "1.5.0: 8, 1.4.2: 2"), `OutdatedAgents` (list of names with their versions).

## Why It Matters

Version skew across the agent fleet can cause subtle compatibility issues: newer agents may support task types that older agents reject, protocol changes may cause heartbeat or dispatch failures, and mixed versions make incident triage harder because behavior differs across agents. Keeping the fleet consistent reduces operational surprises.

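
The decision rules under "What It Checks" can be sketched in Python as follows. This is a minimal illustration, not the plugin's actual implementation: `evaluate_version_skew`, `SkewResult`, and the input shape (a mapping of agent name to reported version) are hypothetical. Three behaviors are assumptions, since the rules do not specify them: the result for an exact 50/50 split, the tie-breaking for the majority version, and the collection of evidence on a pass.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_OUTDATED_LISTED = 10  # the check reports up to 10 outdated agent names


@dataclass
class SkewResult:
    """Hypothetical result type mirroring the evidence fields named in this document."""
    status: str  # "pass", "warn", or "skip"
    majority_version: Optional[str] = None
    version_distribution: str = ""
    outdated_agents: List[str] = field(default_factory=list)


def evaluate_version_skew(agents: Dict[str, str]) -> SkewResult:
    """Classify version skew.

    `agents` maps agent name -> reported version and is assumed to already
    exclude revoked and inactive agents.
    """
    if not agents:
        return SkewResult(status="skip")  # no active agents

    counts = Counter(agents.values())
    # most_common() orders by count, descending; equal counts keep
    # first-seen order (an assumption: tie-breaking is unspecified).
    ranked = counts.most_common()
    majority_version, majority_count = ranked[0]

    distribution = ", ".join(f"{version}: {n}" for version, n in ranked)
    outdated = [
        f"{name} ({version})"
        for name, version in agents.items()
        if version != majority_version
    ][:MAX_OUTDATED_LISTED]

    outdated_count = len(agents) - majority_count
    # Warn on more than 2 distinct versions, or when outdated agents exceed
    # half the fleet. An exact half is treated as a pass here (assumption).
    significant = len(counts) > 2 or outdated_count * 2 > len(agents)

    return SkewResult(
        status="warn" if significant else "pass",
        majority_version=majority_version,
        version_distribution=distribution,
        outdated_agents=outdated,
    )
```

For the distribution used as an example above (eight agents on 1.5.0, two on 1.4.2), this sketch yields a pass: there are only two distinct versions and 2 of 10 agents are outdated.
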
## Common Causes

- Auto-update is disabled on some agents
- Some agents failed to update (download failure, permission issue, disk full)
- Phased rollout in progress (expected, temporary skew)
- Agents on isolated networks that cannot reach the update server

## How to Fix

### Docker Compose

```bash
# Check agent image versions
docker compose -f devops/compose/docker-compose.stella-ops.yml ps agent --format json | \
  jq '.[] | {name: .Name, image: .Image}'

# Pull latest image and recreate
docker compose -f devops/compose/docker-compose.stella-ops.yml pull agent
docker compose -f devops/compose/docker-compose.stella-ops.yml up -d agent
```

### Bare Metal / systemd

```bash
# Update outdated agents to target version
stella agent update --version --agent-id

# Enable auto-update
stella agent config --agent-id --set auto_update.enabled=true

# Batch update all agents
stella agent update --version --all
```

### Kubernetes / Helm

```bash
# Check running image versions across pods
kubectl get pods -l app.kubernetes.io/component=agent -n stellaops \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'

# Update image tag in Helm values and rollout
helm upgrade stellaops stellaops/stellaops --set agent.image.tag=

# Monitor rollout
kubectl rollout status deployment/stellaops-agent -n stellaops
```

## Verification

```
stella doctor run --check check.agent.version.consistency
```

## Related Checks

- `check.agent.heartbeat.freshness` -- version mismatch can cause heartbeat protocol failures
- `check.agent.capacity` -- outdated agents may be unable to accept newer task types