release orchestrator pivot, architecture and planning
docs/modules/release-orchestrator/security/agent-security.md

# Agent Security Model

## Overview

Agents are trusted components that execute deployment tasks on targets. Their security model ensures:
- Strong identity through mTLS certificates
- Minimal privilege through scoped task credentials
- Audit trail through signed task receipts
- Isolation through process sandboxing

## Agent Registration Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           AGENT REGISTRATION FLOW                           │
│                                                                             │
│  1. Admin generates registration token (one-time use)                       │
│     POST /api/v1/admin/agent-tokens                                         │
│     Response: { token: "reg_xxx", expiresAt: "..." }                        │
│                                                                             │
│  2. Agent starts with registration token                                    │
│     ./stella-agent --register --token=reg_xxx                               │
│                                                                             │
│  3. Agent requests mTLS certificate                                         │
│     POST /api/v1/agents/register                                            │
│     Headers: X-Registration-Token: reg_xxx                                  │
│     Body: { name, version, capabilities, csr }                              │
│     Response: { agentId, certificate, caCertificate }                       │
│                                                                             │
│  4. Agent establishes mTLS connection                                       │
│     Uses issued certificate for all subsequent requests                     │
│                                                                             │
│  5. Agent requests short-lived JWT for task execution                       │
│     POST /api/v1/agents/token (over mTLS)                                   │
│     Response: { token, expiresIn: 3600 }                                    │
│                                                                             │
│  6. Agent refreshes token before expiration                                 │
│     Token refresh only over mTLS connection                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
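
Steps 3 and 6 of the flow can be sketched as two small helpers. This is a minimal illustration, not the agent's actual client code: the `RegistrationRequest` shape and both function names are hypothetical, and refreshing at 80% of the token lifetime is an assumption (the flow only says "before expiration").

```typescript
// Hypothetical request shape mirroring step 3; not the real client API.
interface RegistrationRequest {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body: {
    name: string;
    version: string;
    capabilities: Record<string, unknown>;
    csr: string;
  };
}

// Builds (but does not send) the registration call from step 3.
function buildRegistrationRequest(
  registrationToken: string,
  agent: RegistrationRequest["body"],
): RegistrationRequest {
  return {
    method: "POST",
    path: "/api/v1/agents/register",
    headers: { "X-Registration-Token": registrationToken },
    body: { ...agent },
  };
}

// Step 6: schedule the JWT refresh before expiry.
// Refreshing at 80% of the lifetime is an assumption, not a documented value.
function refreshDelayMs(expiresInSeconds: number, refreshAtPercent = 80): number {
  return Math.floor((expiresInSeconds * 1000 * refreshAtPercent) / 100);
}
```

With the `expiresIn: 3600` value shown in step 5, `refreshDelayMs(3600)` schedules the refresh 48 minutes after issuance.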

## mTLS Communication

All agent-to-core communication uses mutual TLS:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                        AGENT COMMUNICATION SECURITY                         │
│                                                                             │
│  ┌──────────────┐                           ┌──────────────┐                │
│  │    AGENT     │                           │ STELLA CORE  │                │
│  └──────┬───────┘                           └──────┬───────┘                │
│         │                                          │                        │
│         │  mTLS (mutual TLS)                       │                        │
│         │  - Agent cert signed by Stella CA        │                        │
│         │  - Server cert verified by Agent         │                        │
│         │  - TLS 1.3 only                          │                        │
│         │  - Perfect forward secrecy               │                        │
│         │◄────────────────────────────────────────►│                        │
│         │                                          │                        │
│         │  Encrypted payload                       │                        │
│         │  - Task payloads encrypted with          │                        │
│         │    agent-specific key                    │                        │
│         │  - Logs encrypted in transit             │                        │
│         │◄────────────────────────────────────────►│                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### TLS Requirements

| Requirement | Value |
|-------------|-------|
| Protocol | TLS 1.3 only |
| Cipher Suites | TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256 |
| Key Exchange | ECDHE with P-384 or X25519 |
| Certificate Key | RSA 4096-bit or ECDSA P-384 |
| Certificate Validity | 90 days (auto-renewed) |
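
As an illustration of how the table might translate into configuration, the sketch below assumes the core is a Node.js service and builds an options object in the shape accepted by `tls.createServer()`. The document does not state the server runtime, so treat this as one possible mapping; the helper name `buildServerTlsOptions` is hypothetical.

```typescript
// Hypothetical helper: returns options shaped for Node's tls.createServer().
// The PEM strings are supplied by the caller; nothing is read from disk here.
function buildServerTlsOptions(pem: { key: string; cert: string; ca: string }) {
  return {
    key: pem.key,
    cert: pem.cert,
    ca: pem.ca, // CA used to verify agent client certificates
    requestCert: true, // ask every client for a certificate (mutual TLS)
    rejectUnauthorized: true, // refuse clients without a valid certificate
    minVersion: "TLSv1.3" as const, // TLS 1.3 only
    maxVersion: "TLSv1.3" as const,
    // Colon-separated TLS 1.3 suite names from the table above.
    ciphers: "TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256",
    // Key-exchange groups from the table above.
    ecdhCurve: "P-384:X25519",
  };
}
```

Pinning both `minVersion` and `maxVersion` is what enforces the "TLS 1.3 only" row; the cipher and curve lists then narrow the negotiation further.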

## Certificate Management

### Certificate Structure

```typescript
interface AgentCertificate {
  subject: {
    CN: string;           // Agent name
    O: string;            // "Stella Ops"
    OU: string;           // Tenant ID
  };
  serialNumber: string;
  issuer: string;         // Stella CA
  validFrom: DateTime;
  validTo: DateTime;
  extensions: {
    keyUsage: ["digitalSignature", "keyEncipherment"];
    extendedKeyUsage: ["clientAuth"];
    subjectAltName: string[];  // Agent ID as URI
  };
}
```
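
A server-side check against this structure could look like the sketch below. It is illustrative only: `AgentCertificateView` is a simplified, hypothetical view of the fields above (with `Date` standing in for `DateTime`), and the exact URI format of the agent ID in the SAN is not specified in this document.

```typescript
// Simplified, hypothetical view of the certificate fields listed above.
interface AgentCertificateView {
  subject: { CN: string; O: string; OU: string };
  validFrom: Date;
  validTo: Date;
  extendedKeyUsage: string[];
  subjectAltName: string[];
}

// Returns a list of problems; an empty list means all checks passed.
function certificateProblems(
  cert: AgentCertificateView,
  expected: { tenantId: string; agentUri: string },
  now: Date,
): string[] {
  const problems: string[] = [];
  if (cert.subject.OU !== expected.tenantId) {
    problems.push("tenant mismatch");
  }
  if (!cert.extendedKeyUsage.includes("clientAuth")) {
    problems.push("missing clientAuth");
  }
  if (!cert.subjectAltName.includes(expected.agentUri)) {
    problems.push("agent ID not in SAN");
  }
  const t = now.getTime();
  if (t < cert.validFrom.getTime() || t > cert.validTo.getTime()) {
    problems.push("outside validity period");
  }
  return problems;
}
```

Returning every problem rather than failing on the first makes the result easier to record in an audit event.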

### Certificate Renewal

Agents automatically renew certificates before expiration:
1. Agent detects certificate expiring within 30 days
2. Agent generates new CSR with same identity
3. Agent submits renewal request over existing mTLS connection
4. Authority issues new certificate
5. Agent transitions to new certificate seamlessly
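
The detection in step 1 reduces to a date comparison. A minimal sketch, assuming the agent knows its certificate's `validTo` timestamp; the function and constant names are hypothetical:

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const RENEWAL_WINDOW_DAYS = 30; // from step 1 above

// True once the certificate is within the renewal window (or already expired).
function needsRenewal(
  validTo: Date,
  now: Date,
  windowDays: number = RENEWAL_WINDOW_DAYS,
): boolean {
  const remainingMs = validTo.getTime() - now.getTime();
  return remainingMs <= windowDays * MS_PER_DAY;
}
```

With the 90-day validity from the TLS requirements, renewal would start roughly 60 days after issuance.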

## Secrets Management

Secrets are NEVER stored in the Stella database. Only vault references are stored.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                      SECRETS FLOW (NEVER STORED IN DB)                      │
│                                                                             │
│  ┌──────────────┐         ┌──────────────┐         ┌──────────────┐         │
│  │    VAULT     │         │ STELLA CORE  │         │    AGENT     │         │
│  │   (Source)   │         │   (Broker)   │         │  (Consumer)  │         │
│  └──────┬───────┘         └──────┬───────┘         └──────┬───────┘         │
│         │                        │                        │                 │
│         │                        │  Task requires secret  │                 │
│         │                        │                        │                 │
│         │  Fetch with service    │                        │                 │
│         │  account token         │                        │                 │
│         │◄───────────────────────│                        │                 │
│         │                        │                        │                 │
│         │  Return secret         │                        │                 │
│         │  (wrapped, short TTL)  │                        │                 │
│         │───────────────────────►│                        │                 │
│         │                        │                        │                 │
│         │                        │  Embed in task payload │                 │
│         │                        │  (encrypted)           │                 │
│         │                        │───────────────────────►│                 │
│         │                        │                        │                 │
│         │                        │                        │ Decrypt         │
│         │                        │                        │ Use for task    │
│         │                        │                        │ Discard         │
│                                                                             │
│  Rules:                                                                     │
│  - Secrets NEVER stored in Stella database                                  │
│  - Only Vault references stored                                             │
│  - Secrets fetched at execution time only                                   │
│  - Secrets not logged (masked in logs)                                      │
│  - Secrets not persisted in agent memory beyond task scope                  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
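
The agent-side "use, then discard" and "masked in logs" rules can be sketched as follows. Both helpers are hypothetical, and zeroing a buffer in a garbage-collected runtime is best effort only: any copies made while the secret is in use are not covered.

```typescript
// Runs `use` with the secret bytes, then overwrites them even if `use` throws.
function withSecret<T>(secret: Uint8Array, use: (s: Uint8Array) => T): T {
  try {
    return use(secret);
  } finally {
    secret.fill(0); // best-effort discard once the task scope ends
  }
}

// Replaces every occurrence of each known secret value before a line is logged.
function maskSecrets(line: string, secrets: string[]): string {
  let masked = line;
  for (const s of secrets) {
    if (s.length > 0) {
      masked = masked.split(s).join("***");
    }
  }
  return masked;
}
```

Keeping the secret as bytes rather than a string is a deliberate choice here: strings are immutable in JavaScript and cannot be overwritten after use.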

## Task Security

### Task Assignment

```typescript
interface AgentTask {
  id: UUID;
  type: TaskType;
  targetId: UUID;
  payload: TaskPayload;
  credentials: EncryptedCredentials;  // Encrypted with agent's public key
  timeout: number;
  priority: TaskPriority;
  idempotencyKey: string;
  assignedAt: DateTime;
  expiresAt: DateTime;
}
```
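
The `expiresAt` and `idempotencyKey` fields suggest two checks an agent could apply before running a task. The sketch below is one possible reading, not documented agent behavior; `TaskGate` and `TaskEnvelope` are hypothetical names, and an in-memory set stands in for whatever deduplication store the agent actually uses.

```typescript
// Only the fields needed for the acceptance decision.
interface TaskEnvelope {
  id: string;
  idempotencyKey: string;
  expiresAt: Date;
}

type Decision = "accepted" | "expired" | "duplicate";

class TaskGate {
  // Idempotency keys already accepted by this agent (in-memory assumption).
  private readonly seen = new Set<string>();

  accept(task: TaskEnvelope, now: Date): Decision {
    if (now.getTime() >= task.expiresAt.getTime()) {
      return "expired"; // never start work on a stale assignment
    }
    if (this.seen.has(task.idempotencyKey)) {
      return "duplicate"; // a redelivered task must not run twice
    }
    this.seen.add(task.idempotencyKey);
    return "accepted";
  }
}
```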

### Credential Scoping

Task credentials are:
- Scoped to specific target only
- Valid only for task duration
- Encrypted with agent's public key
- Logged when accessed (without values)
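
A minimal sketch of the scoping and logging rules, assuming a decrypted credential carries its target and validity window. The `ScopedCredential` shape and the log line format are hypothetical; encryption with the agent's public key is out of scope here.

```typescript
// Hypothetical metadata attached to a task credential; the value itself is omitted.
interface ScopedCredential {
  targetId: string;
  validFrom: Date;
  validUntil: Date;
}

// Checks target and time scope, and records the access without any secret value.
function authorizeCredentialUse(
  cred: ScopedCredential,
  targetId: string,
  now: Date,
  accessLog: string[],
): boolean {
  const t = now.getTime();
  const inWindow = t >= cred.validFrom.getTime() && t <= cred.validUntil.getTime();
  const allowed = cred.targetId === targetId && inWindow;
  accessLog.push(`${now.toISOString()} target=${targetId} allowed=${allowed}`);
  return allowed;
}
```

Denied attempts are logged as well, so the audit trail shows misuse and not only successful access.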

### Task Execution Isolation

Agents execute tasks with isolation:
```typescript
interface TaskExecutionContext {
  // Process isolation
  workingDirectory: string;      // Unique per task
  processUser: string;           // Non-root user
  networkNamespace: string;      // If network isolation enabled

  // Resource limits
  memoryLimit: number;           // Bytes
  cpuLimit: number;              // Millicores
  diskLimit: number;             // Bytes
  networkEgress: string[];       // Allowed destinations

  // Cleanup
  cleanupOnComplete: boolean;
  cleanupTimeout: number;
}
```
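
To make the fields concrete, the sketch below builds a context for one task and checks an egress destination against the allow-list. Every default value (user name, limits, timeout and its unit) is an assumption chosen for illustration, and exact-match egress checking is a simplification.

```typescript
interface TaskExecutionContext {
  workingDirectory: string;
  processUser: string;
  networkNamespace: string;
  memoryLimit: number; // bytes
  cpuLimit: number; // millicores
  diskLimit: number; // bytes
  networkEgress: string[];
  cleanupOnComplete: boolean;
  cleanupTimeout: number;
}

// Hypothetical factory; all defaults below are illustrative assumptions.
function createContext(
  taskId: string,
  baseDir: string,
  allowedEgress: string[],
): TaskExecutionContext {
  return {
    workingDirectory: `${baseDir}/task-${taskId}`, // unique per task
    processUser: "stella-task", // assumed non-root account name
    networkNamespace: `ns-${taskId}`,
    memoryLimit: 512 * 1024 * 1024,
    cpuLimit: 500,
    diskLimit: 1024 * 1024 * 1024,
    networkEgress: [...allowedEgress],
    cleanupOnComplete: true,
    cleanupTimeout: 30, // unit assumed to be seconds
  };
}

// Exact-match allow-list check; real matching rules are not specified here.
function isEgressAllowed(ctx: TaskExecutionContext, destination: string): boolean {
  return ctx.networkEgress.includes(destination);
}
```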

## Agent Capabilities

Agents declare capabilities that determine what tasks they can execute:

```typescript
interface AgentCapabilities {
  docker?: DockerCapability;
  compose?: ComposeCapability;
  ssh?: SshCapability;
  winrm?: WinrmCapability;
  ecs?: EcsCapability;
  nomad?: NomadCapability;
}

interface DockerCapability {
  version: string;
  apiVersion: string;
  runtimes: string[];
  registryAuth: boolean;
}

interface ComposeCapability {
  version: string;
  fileFormats: string[];
}
```
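
Task routing against declared capabilities can be sketched as a simple lookup. The mapping from a task to the capabilities it requires is not defined in this document, so the `required` list below is a hypothetical input.

```typescript
type CapabilityName = "docker" | "compose" | "ssh" | "winrm" | "ecs" | "nomad";

// Presence of a key means the agent declared that capability.
type DeclaredCapabilities = Partial<Record<CapabilityName, object>>;

// Lists the required capabilities the agent did not declare.
function missingCapabilities(
  declared: DeclaredCapabilities,
  required: CapabilityName[],
): CapabilityName[] {
  return required.filter((name) => declared[name] === undefined);
}

// An agent is eligible only when nothing required is missing.
function canExecute(declared: DeclaredCapabilities, required: CapabilityName[]): boolean {
  return missingCapabilities(declared, required).length === 0;
}
```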

## Heartbeat Protocol

```typescript
interface AgentHeartbeat {
  agentId: UUID;
  timestamp: DateTime;
  status: "healthy" | "degraded";
  resourceUsage: {
    cpuPercent: number;
    memoryPercent: number;
    diskPercent: number;
    networkRxBytes: number;
    networkTxBytes: number;
  };
  activeTaskCount: number;
  completedTasks: number;
  failedTasks: number;
  errors: string[];
  signature: string;  // HMAC of heartbeat data
}
```

### Heartbeat Validation

1. Verify signature matches expected HMAC
2. Check timestamp is within acceptable skew (30s)
3. Update agent status based on heartbeat content
4. Trigger alerts if heartbeat missing for >90s
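
Steps 1, 2 and 4 can be sketched with Node's `node:crypto` module. The document does not specify the HMAC algorithm, the key source, or how heartbeat fields are serialized before signing; the sketch assumes HMAC-SHA256, a shared key string, and a `|`-joined subset of fields, all of which are illustrative choices.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Reduced heartbeat with only the fields this sketch signs.
interface SignedHeartbeat {
  agentId: string;
  timestamp: string; // ISO 8601
  status: "healthy" | "degraded";
  signature: string; // hex-encoded HMAC
}

// Assumed canonical form: fields joined with "|", HMAC-SHA256, hex output.
function signHeartbeat(fields: Omit<SignedHeartbeat, "signature">, key: string): string {
  const payload = `${fields.agentId}|${fields.timestamp}|${fields.status}`;
  return createHmac("sha256", key).update(payload).digest("hex");
}

function validateHeartbeat(
  hb: SignedHeartbeat,
  key: string,
  now: Date,
  maxSkewMs = 30_000,
): "ok" | "bad-signature" | "stale" {
  const { signature, ...fields } = hb;
  const expected = Buffer.from(signHeartbeat(fields, key), "hex");
  const actual = Buffer.from(signature, "hex");
  // Step 1: constant-time comparison; lengths must match before comparing.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    return "bad-signature";
  }
  // Step 2: reject timestamps outside the allowed clock skew.
  const skewMs = Math.abs(now.getTime() - Date.parse(hb.timestamp));
  return skewMs <= maxSkewMs ? "ok" : "stale";
}

// Step 4: an agent is considered missing after more than 90s of silence.
function isHeartbeatMissing(lastSeen: Date, now: Date, thresholdMs = 90_000): boolean {
  return now.getTime() - lastSeen.getTime() > thresholdMs;
}
```

The signature is checked before the timestamp so that an unauthenticated message cannot influence any time-based logic.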

## Agent Revocation

When an agent is compromised or decommissioned:

1. Certificate added to CRL (Certificate Revocation List)
2. All pending tasks for agent cancelled
3. Agent removed from target assignments
4. Audit event logged
5. New agent can be registered with same name (new identity)
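
Steps 1 to 4 can be sketched against an in-memory model of the core's state. `CoreState` and `revokeAgent` are hypothetical; a real implementation would persist these changes and publish the CRL, which this sketch does not attempt.

```typescript
// Hypothetical in-memory stand-in for the core's revocation-related state.
interface CoreState {
  crl: Set<string>; // revoked certificate serial numbers
  pendingTasks: Map<string, string>; // taskId -> agentId
  targetAssignments: Map<string, string>; // targetId -> agentId
  auditLog: string[];
}

function revokeAgent(
  state: CoreState,
  agentId: string,
  certSerial: string,
  reason: string,
): void {
  // 1. Add the certificate to the CRL.
  state.crl.add(certSerial);
  // 2. Cancel all pending tasks owned by the agent.
  state.pendingTasks.forEach((owner, taskId) => {
    if (owner === agentId) state.pendingTasks.delete(taskId);
  });
  // 3. Remove the agent from target assignments.
  state.targetAssignments.forEach((owner, targetId) => {
    if (owner === agentId) state.targetAssignments.delete(targetId);
  });
  // 4. Record the audit event.
  state.auditLog.push(`agent ${agentId} revoked (serial ${certSerial}): ${reason}`);
}
```

Adding the serial to the CRL first means the agent loses its ability to authenticate before any cleanup begins.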

## Security Checklist

| Control | Implementation |
|---------|----------------|
| Identity | mTLS certificates signed by internal CA |
| Authentication | Certificate-based + short-lived JWT |
| Authorization | Task-scoped credentials |
| Encryption | TLS 1.3 for transport, envelope encryption for secrets |
| Isolation | Process sandboxing, resource limits |
| Audit | All task assignments and completions logged |
| Revocation | CRL for compromised agents |
| Secret handling | Vault integration, no persistence |

## References

- [Security Overview](overview.md)
- [Authentication & Authorization](auth.md)
- [Threat Model](threat-model.md)