add release orchestrator docs and sprints gaps fills

2026-01-11 01:05:17 +02:00
parent d58c093887
commit a62974a8c2
37 changed files with 6061 additions and 0 deletions


@@ -0,0 +1,274 @@
# Agent APIs
> API endpoints for agent registration, lifecycle management, and task coordination.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Agent Security](../security/agent-security.md)
## Overview
The Agent API provides endpoints for registering deployment agents, managing their lifecycle, and coordinating task execution. Agents use mTLS for secure communication after initial registration.
---
## Registration Endpoints
### Register Agent
**Endpoint:** `POST /api/v1/agents/register`
Registers a new agent with the orchestrator. Requires a one-time registration token.
**Headers:**
```
X-Agent-Token: {registration-token}
```
**Request:**
```json
{
"name": "agent-prod-01",
"version": "1.0.0",
"capabilities": ["docker", "compose"],
"labels": {
"datacenter": "us-east-1",
"role": "deployment"
}
}
```
**Response:** `201 Created`
```json
{
"agentId": "uuid",
"token": "jwt-token-for-subsequent-requests",
"config": {
"heartbeatInterval": 30,
"taskPollInterval": 5,
"logLevel": "info"
},
"certificate": {
"cert": "-----BEGIN CERTIFICATE-----...",
"key": "-----BEGIN PRIVATE KEY-----...",
"ca": "-----BEGIN CERTIFICATE-----...",
"expiresAt": "2026-01-11T14:23:45Z"
}
}
```
**Notes:**
- The registration token is single-use and expires after 24 hours
- After registration, the agent must use mTLS for all subsequent requests
- The issued certificate is short-lived (24 hours) and must be renewed via heartbeat
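
As a rough illustration of the registration call described above, the following sketch builds (but does not send) the request with Python's standard library. The base URL, the `build_register_request` helper, and the `Content-Type` header are assumptions made for this example; only the path, the `X-Agent-Token` header, and the body fields come from this document.

```python
import json
import urllib.request

# Hypothetical host: this document does not name the orchestrator address.
BASE_URL = "https://orchestrator.example.internal"


def build_register_request(
    registration_token: str,
    name: str,
    version: str,
    capabilities: list[str],
    labels: dict[str, str],
) -> urllib.request.Request:
    """Build the POST /api/v1/agents/register request without sending it."""
    body = json.dumps(
        {
            "name": name,
            "version": version,
            "capabilities": capabilities,
            "labels": labels,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/agents/register",
        data=body,
        method="POST",
        headers={
            # One-time registration token, as described above.
            "X-Agent-Token": registration_token,
            # Assumed: the document shows JSON bodies but names no content type.
            "Content-Type": "application/json",
        },
    )
```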
---
## Lifecycle Endpoints
### List Agents
**Endpoint:** `GET /api/v1/agents`
**Query Parameters:**
- `status` (string): Filter by status (`online`, `offline`, `degraded`)
- `capability` (string): Filter by capability (`docker`, `compose`, `ssh`, `winrm`, `ecs`, `nomad`)
**Response:** `200 OK`
```json
[
{
"id": "uuid",
"name": "agent-prod-01",
"version": "1.0.0",
"status": "online",
"capabilities": ["docker", "compose"],
"lastHeartbeat": "2026-01-10T14:23:45Z",
"resourceUsage": {
"cpu": 15.5,
"memory": 45.2
}
}
]
```
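
A client might assemble the filtered list URL along these lines. This is a minimal sketch: the `list_agents_path` helper is hypothetical, and rejecting unknown filter values on the client side is a choice made for the example, not documented server behavior. The allowed values are the ones listed under Query Parameters.

```python
from typing import Optional
from urllib.parse import urlencode

# Values taken from the Query Parameters list above.
VALID_STATUSES = {"online", "offline", "degraded"}
VALID_CAPABILITIES = {"docker", "compose", "ssh", "winrm", "ecs", "nomad"}


def list_agents_path(status: Optional[str] = None, capability: Optional[str] = None) -> str:
    """Return the GET /api/v1/agents path with optional filters applied."""
    params = {}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status filter: {status}")
        params["status"] = status
    if capability is not None:
        if capability not in VALID_CAPABILITIES:
            raise ValueError(f"unknown capability filter: {capability}")
        params["capability"] = capability
    query = urlencode(params)
    return "/api/v1/agents" + (f"?{query}" if query else "")
```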
### Get Agent
**Endpoint:** `GET /api/v1/agents/{id}`
**Response:** `200 OK` - Full agent details including assigned targets
### Update Agent
**Endpoint:** `PUT /api/v1/agents/{id}`
**Request:**
```json
{
"labels": {
"datacenter": "us-west-2"
},
"capabilities": ["docker", "compose", "ssh"]
}
```
**Response:** `200 OK` - Updated agent
### Delete Agent
**Endpoint:** `DELETE /api/v1/agents/{id}`
Revokes agent credentials and removes registration.
**Response:** `200 OK`
```json
{ "deleted": true }
```
---
## Heartbeat Endpoints
### Send Heartbeat
**Endpoint:** `POST /api/v1/agents/{id}/heartbeat`
Agents must send heartbeats at the configured interval to maintain online status and receive pending tasks.
**Request:**
```json
{
"status": "healthy",
"resourceUsage": {
"cpu": 15.5,
"memory": 45.2,
"disk": 60.0
},
"capabilities": ["docker", "compose"],
"runningTasks": 2
}
```
**Response:** `200 OK`
```json
{
"tasks": [
{
"taskId": "uuid",
"taskType": "docker.pull",
"payload": {
"image": "myapp",
"tag": "v2.3.1",
"digest": "sha256:abc123..."
},
"credentials": {
"registry.username": "user",
"registry.password": "token"
},
"timeout": 300
}
],
"certificateRenewal": {
"cert": "-----BEGIN CERTIFICATE-----...",
"expiresAt": "2026-01-11T14:23:45Z"
}
}
```
**Notes:**
- `certificateRenewal` is included when the current certificate is within 1 hour of expiration
- The `tasks` array contains pending work for the agent
- An agent that misses heartbeats for 3 intervals is marked `offline`
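
The two timing rules in the notes above can be sketched as plain date arithmetic. The function names are hypothetical, and two details are assumptions: that `heartbeatInterval` is expressed in seconds, and that the agent goes offline only once strictly more than three intervals have elapsed. The document states neither.

```python
from datetime import datetime, timedelta

RENEWAL_WINDOW = timedelta(hours=1)  # from the notes: renewal within 1 hour of expiry
MISSED_INTERVALS_FOR_OFFLINE = 3     # from the notes: 3 missed intervals


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as 2026-01-10T14:23:45Z into aware datetimes."""
    # Rewrite the trailing Z so fromisoformat also accepts it before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_offline(last_heartbeat: datetime, now: datetime, heartbeat_interval: int) -> bool:
    """True once more than three intervals have passed without a heartbeat.

    Assumes heartbeat_interval is in seconds and the comparison is strict.
    """
    allowed_gap = timedelta(seconds=heartbeat_interval * MISSED_INTERVALS_FOR_OFFLINE)
    return now - last_heartbeat > allowed_gap


def renewal_due(expires_at: datetime, now: datetime) -> bool:
    """True when the certificate is within one hour of expiration."""
    return expires_at - now <= RENEWAL_WINDOW
```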
---
## Task Endpoints
### Complete Task
**Endpoint:** `POST /api/v1/agents/{id}/tasks/{taskId}/complete`
Reports task completion status back to the orchestrator.
**Request:**
```json
{
"success": true,
"result": {
"imageId": "sha256:abc123...",
"containerId": "container-uuid"
},
"logs": [
{ "timestamp": "2026-01-10T14:23:45Z", "level": "info", "message": "Pulling image..." },
{ "timestamp": "2026-01-10T14:23:50Z", "level": "info", "message": "Image pulled successfully" }
]
}
```
**Response:** `200 OK`
```json
{ "acknowledged": true }
```
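
An agent could assemble the completion body shown above roughly as follows. `log_entry` and `build_completion_body` are hypothetical helper names; the field names and the timestamp format mirror the request example in this section.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def log_entry(level: str, message: str, when: Optional[datetime] = None) -> dict:
    """Create one log record in the shape used by the request example."""
    when = when or datetime.now(timezone.utc)
    return {
        # Normalize to UTC and format like 2026-01-10T14:23:45Z.
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": level,
        "message": message,
    }


def build_completion_body(success: bool, result: dict, logs: list) -> str:
    """Serialize the body for POST /api/v1/agents/{id}/tasks/{taskId}/complete."""
    return json.dumps({"success": success, "result": result, "logs": logs})
```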
### Get Pending Tasks
**Endpoint:** `GET /api/v1/agents/{id}/tasks`
An alternative to the heartbeat for polling pending tasks.
**Response:** `200 OK`
```json
{
"tasks": [
{
"taskId": "uuid",
"taskType": "docker.run",
"priority": 10,
"createdAt": "2026-01-10T14:20:00Z"
}
]
}
```
---
## WebSocket Endpoints
### Task Stream
**Endpoint:** `WS /api/v1/agents/{id}/task-stream`
Real-time task assignment stream for agents.
**Messages (Server to Agent):**
```json
{ "type": "task_assigned", "task": { "taskId": "uuid", "taskType": "docker.pull", ... } }
{ "type": "task_cancelled", "taskId": "uuid" }
```
**Messages (Agent to Server):**
```json
{ "type": "task_progress", "taskId": "uuid", "progress": 50, "message": "Pulling layer 3/5" }
{ "type": "task_log", "taskId": "uuid", "level": "info", "message": "..." }
```
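
The message shapes above suggest a small dispatcher on the agent side. The sketch below is transport-agnostic: it handles only the JSON text and leaves the WebSocket connection out. The function names and the in-memory `active_tasks` dictionary are assumptions made for illustration.

```python
import json


def handle_server_message(raw: str, active_tasks: dict) -> str:
    """Apply one server-to-agent message and return its type.

    active_tasks maps taskId to the task payload (a hypothetical local store).
    """
    msg = json.loads(raw)
    kind = msg.get("type", "")
    if kind == "task_assigned":
        task = msg["task"]
        active_tasks[task["taskId"]] = task
    elif kind == "task_cancelled":
        # Ignore cancellations for tasks this agent does not hold.
        active_tasks.pop(msg["taskId"], None)
    # Unknown types fall through untouched so the caller can log them.
    return kind


def progress_message(task_id: str, progress: int, message: str) -> str:
    """Serialize an agent-to-server task_progress message."""
    return json.dumps(
        {"type": "task_progress", "taskId": task_id, "progress": progress, "message": message}
    )
```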
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `401` | Invalid or expired registration token |
| `403` | Agent not authorized for this operation |
| `404` | Agent not found |
| `409` | Agent name already registered |
| `503` | Agent offline or unreachable |
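
A client might turn the status codes in the table above into exceptions along these lines. `AgentApiError` and `raise_for_status` are hypothetical names, and treating `503` as the only retryable code is an assumption of this sketch, not something the document specifies.

```python
# Descriptions copied from the error table above.
ERROR_DESCRIPTIONS = {
    401: "Invalid or expired registration token",
    403: "Agent not authorized for this operation",
    404: "Agent not found",
    409: "Agent name already registered",
    503: "Agent offline or unreachable",
}

# Assumption: only 503 is worth retrying; the document defines no retry policy.
RETRYABLE_STATUSES = {503}


class AgentApiError(Exception):
    """Raised for any status code listed in the error table."""

    def __init__(self, status: int, description: str) -> None:
        super().__init__(f"{status}: {description}")
        self.status = status
        self.retryable = status in RETRYABLE_STATUSES


def describe_error(status: int) -> str:
    """Return the documented description, or a generic fallback."""
    return ERROR_DESCRIPTIONS.get(status, "Unexpected error")


def raise_for_status(status: int) -> None:
    """Raise AgentApiError for documented error codes; otherwise do nothing."""
    if status in ERROR_DESCRIPTIONS:
        raise AgentApiError(status, ERROR_DESCRIPTIONS[status])
```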
---
## See Also
- [Environments API](environments.md)
- [Agents Module](../modules/agents.md)
- [Agent Security](../security/agent-security.md)
- [WebSocket APIs](websockets.md)