git.stella-ops.org/docs/guides/agent-operations-quickstart.md
2026-01-17 21:32:08 +02:00

# Agent Operations Quick Start
This guide covers deploying, configuring, and maintaining Stella Ops agents at scale.
## Zero-Touch Bootstrap
Deploy agents with a single command using bootstrap tokens.
### Generate Bootstrap Token
```bash
# Generate token and get install command
stella agent bootstrap --name prod-agent-01 --env production
# Output includes platform-specific one-liners:
# Linux: curl -fsSL https://... | STELLA_TOKEN="..." bash
# Windows: $env:STELLA_TOKEN='...'; iwr -useb https://... | iex
# Docker: docker run -d -e STELLA_TOKEN="..." stellaops/agent:latest
```
### Custom Capabilities
```bash
stella agent bootstrap \
  --name prod-agent-01 \
  --env production \
  --capabilities docker,compose,helm \
  --output install-token.txt
```
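When bootstrapping several agents at once, the command above can be wrapped in a loop. The following is a minimal sketch, not part of the Stella Ops CLI: `bootstrap_fleet` and the one-file-per-agent layout are hypothetical, and it assumes `--output` writes the token to the given path as in the example above.

```bash
# Sketch: generate one bootstrap token file per agent name.
# bootstrap_fleet is a hypothetical helper, not a stella command.
# Usage: bootstrap_fleet <env> <output-dir> <agent-name>...
bootstrap_fleet() {
  local env="$1" out_dir="$2"
  shift 2
  mkdir -p "$out_dir"
  local name
  for name in "$@"; do
    # Flags are the ones shown in this guide; stop at the first failure.
    if ! stella agent bootstrap \
        --name "$name" \
        --env "$env" \
        --capabilities docker,compose,helm \
        --output "$out_dir/$name.txt"; then
      echo "bootstrap failed for $name" >&2
      return 1
    fi
  done
}
```

For example, `bootstrap_fleet production ./tokens prod-agent-01 prod-agent-02` would leave `./tokens/prod-agent-01.txt` and `./tokens/prod-agent-02.txt`. Bootstrap tokens are credentials, so the output directory should be readable only by the operator.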
## Configuration Management
### View Current Configuration
```bash
# Show current config in YAML format
stella agent config
# Show as JSON
stella agent config --format json
```
### Detect Configuration Drift
```bash
# Check for drift between current and desired state
stella agent config --diff
```
### Apply New Configuration
```yaml
# agent-config.yaml
identity:
  agentId: agent-abc123
  agentName: prod-agent-01
  environment: production
connection:
  orchestratorUrl: https://orchestrator.example.com
  heartbeatInterval: 30s
capabilities:
  docker: true
  scripts: true
  compose: true
resources:
  maxConcurrentTasks: 10
  workDirectory: /var/lib/stella-agent
security:
  certificate:
    source: AutoProvision
```
```bash
# Validate without applying
stella agent apply -f agent-config.yaml --dry-run
# Apply configuration
stella agent apply -f agent-config.yaml
```
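The validate-then-apply sequence can be combined so that a configuration is never applied when validation fails. This is a sketch under one assumption the guide does not confirm: that `stella agent apply --dry-run` exits non-zero on an invalid file. `safe_apply` is a hypothetical helper name.

```bash
# Sketch: apply a config only if the dry-run validation succeeds.
# Assumption: --dry-run exits non-zero when validation fails.
# safe_apply is a hypothetical helper, not a stella command.
safe_apply() {
  local config="$1"
  if [ ! -f "$config" ]; then
    echo "config file not found: $config" >&2
    return 2
  fi
  if ! stella agent apply -f "$config" --dry-run; then
    echo "validation failed; not applying $config" >&2
    return 1
  fi
  stella agent apply -f "$config"
}
```

Calling `safe_apply agent-config.yaml` runs both steps in order and stops after the first if it fails.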
## Agent Health Diagnostics (Doctor)
### Run Local Diagnostics
```bash
# Run all health checks
stella agent doctor
# Filter by category
stella agent doctor --category security
stella agent doctor --category network
stella agent doctor --category runtime
stella agent doctor --category resources
stella agent doctor --category configuration
```
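To see which categories have problems without reading the full report, the per-category runs above can be looped. The sketch below assumes that `stella agent doctor` exits non-zero when a check in the selected category fails; the guide does not document exit codes, so verify this before relying on it. `failing_categories` is a hypothetical helper name.

```bash
# Sketch: run each doctor category and list the ones that report problems.
# Assumption: `stella agent doctor` exits non-zero when a check fails.
# failing_categories is a hypothetical helper, not a stella command.
failing_categories() {
  local category failed=""
  for category in security network runtime resources configuration; do
    if ! stella agent doctor --category "$category" >/dev/null 2>&1; then
      failed="$failed $category"
    fi
  done
  # Print failing categories space-separated (empty line if all passed).
  echo "${failed# }"
  # Return success only when no category failed.
  [ -z "$failed" ]
}
```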
### Apply Automated Fixes
```bash
# Run diagnostics and apply fixes
stella agent doctor --fix
```
### Output Formats
```bash
# Table output (default)
stella agent doctor
# JSON output for scripting
stella agent doctor --format json
# YAML output
stella agent doctor --format yaml
```
## Certificate Management
### Check Certificate Status
```bash
stella agent cert-status
```
### Renew Certificate
```bash
# Renew if nearing expiry
stella agent renew-cert
# Force renewal
stella agent renew-cert --force
```
## Agent Updates
### Check for Updates
```bash
stella agent update --check
```
### Apply Updates
```bash
# Update to latest
stella agent update
# Update to specific version
stella agent update --version 1.3.0
# Force update outside maintenance window
stella agent update --force
```
### Rollback
```bash
# Rollback to previous version
stella agent rollback
```
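Update, health check, and rollback can be chained so that a bad update is reverted automatically. This is a minimal sketch, assuming `stella agent update` and `stella agent doctor` exit non-zero on failure (not confirmed by this guide). `update_with_rollback` is a hypothetical helper name.

```bash
# Sketch: update, verify health, and roll back if the checks fail.
# Assumption: update and doctor exit non-zero on failure.
# update_with_rollback is a hypothetical helper, not a stella command.
# Usage: update_with_rollback [version]
update_with_rollback() {
  local version="${1:-}"
  if [ -n "$version" ]; then
    stella agent update --version "$version" || return 1
  else
    stella agent update || return 1
  fi
  # Gate the update on the local diagnostics.
  if ! stella agent doctor; then
    echo "post-update health check failed; rolling back" >&2
    stella agent rollback
    return 1
  fi
}
```

For example, `update_with_rollback 1.3.0` updates to that version and reverts to the previous one if the diagnostics fail.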
## Health Check Categories
| Category | Checks |
|----------|--------|
| Security | Certificate expiry, certificate validity |
| Network | Orchestrator connectivity, DNS resolution |
| Runtime | Docker daemon, task queue depth |
| Resources | Disk space, memory usage, CPU usage |
| Configuration | Configuration drift |
## Troubleshooting
### Common Issues
**Certificate Expired**
```bash
stella agent renew-cert --force
```
**Docker Not Accessible**
```bash
# Check Docker socket
ls -la /var/run/docker.sock
# Add agent to docker group
sudo usermod -aG docker stella-agent
sudo systemctl restart stella-agent
```
**Disk Space Low**
```bash
# Clean up Docker resources
# WARNING: removes ALL unused images, stopped containers, networks, and unused volumes
docker system prune -af --volumes
# Check agent work directory
du -sh /var/lib/stella-agent
```
**Connection Issues**
```bash
# Check DNS
nslookup orchestrator.example.com
# Check port
telnet orchestrator.example.com 443
# Check firewall
sudo iptables -L -n | grep 443
```
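`telnet` is often missing on minimal hosts. As an alternative, the port check can be done with bash's built-in `/dev/tcp` redirection. This is a sketch, not a Stella Ops feature: `check_port` is a hypothetical helper, and it requires `bash` and the coreutils `timeout` command.

```bash
# Sketch: test TCP reachability without telnet, using bash's /dev/tcp.
# check_port is a hypothetical helper; requires bash and coreutils timeout.
# Usage: check_port <host> <port>
check_port() {
  local host="$1" port="$2"
  # Host and port are passed as $0 and $1 to the inner shell, which
  # tries to open a TCP connection and gives up after 5 seconds.
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port unreachable" >&2
    return 1
  fi
}
```

For example, `check_port orchestrator.example.com 443` reports whether a TCP connection can be opened. It does not verify TLS or the orchestrator itself.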
## Fleet Monitoring
The orchestrator Doctor plugin monitors all agents:
- **Heartbeat Freshness**: Alerts on stale heartbeats
- **Certificate Expiry**: Warns before fleet certificates expire
- **Version Consistency**: Detects version skew across agents
- **Capacity**: Monitors task queue and agent load
- **Failed Task Rate**: Alerts on high failure rates
Run the fleet health checks with:
```bash
stella doctor run --plugin agent-health
```
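To illustrate what the version consistency check looks for, the sketch below flags skew in a list of agent versions. It is not how the plugin is implemented: the `agent-name version` input format and the `detect_version_skew` helper are hypothetical, since this guide does not show how agent versions are collected.

```bash
# Sketch: detect version skew from "agent-name version" lines on stdin.
# The input format and helper name are hypothetical.
detect_version_skew() {
  local versions count
  # Collect the distinct versions (second column).
  versions="$(awk 'NF >= 2 { print $2 }' | sort -u)"
  # Count non-empty lines; grep -c exits 1 on zero matches, hence || true.
  count="$(printf '%s\n' "$versions" | grep -c . || true)"
  if [ "$count" -gt 1 ]; then
    # $versions is left unquoted on purpose to join the lines with spaces.
    echo "version skew detected:" $versions
    return 1
  fi
  echo "all agents on ${versions:-unknown}"
}
```

Fed the lines `prod-agent-01 1.3.0` and `prod-agent-02 1.2.0`, the function reports skew and returns non-zero. With identical versions it reports the single version in use.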