release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions
--- a/docs/modules/release-orchestrator/security/agent-security.md
+++ b/docs/modules/release-orchestrator/security/agent-security.md
@@ -0,0 +1,286 @@
+# Agent Security Model
+
+## Overview
+
+Agents are trusted components that execute deployment tasks on targets. Their security model ensures:
+- Strong identity through mTLS certificates
+- Minimal privilege through scoped task credentials
+- Audit trail through signed task receipts
+- Isolation through process sandboxing
+
+## Agent Registration Flow
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    AGENT REGISTRATION FLOW                                  │
+│                                                                             │
+│  1. Admin generates registration token (one-time use)                       │
+│     POST /api/v1/admin/agent-tokens                                         │
+│     Response: { token: "reg_xxx", expiresAt: "..." }                        │
+│                                                                             │
+│  2. Agent starts with registration token                                    │
+│     ./stella-agent --register --token=reg_xxx                               │
+│                                                                             │
+│  3. Agent requests mTLS certificate                                         │
+│     POST /api/v1/agents/register                                            │
+│     Headers: X-Registration-Token: reg_xxx                                  │
+│     Body: { name, version, capabilities, csr }                              │
+│     Response: { agentId, certificate, caCertificate }                       │
+│                                                                             │
+│  4. Agent establishes mTLS connection                                       │
+│     Uses issued certificate for all subsequent requests                     │
+│                                                                             │
+│  5. Agent requests short-lived JWT for task execution                       │
+│     POST /api/v1/agents/token (over mTLS)                                   │
+│     Response: { token, expiresIn: 3600 }                                    │
+│                                                                             │
+│  6. Agent refreshes token before expiration                                 │
+│     Token refresh only over mTLS connection                                 │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
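+
+As a rough illustration of step 6, an agent could schedule its refresh at a fixed fraction of the token lifetime. This is only a sketch: `refreshDelayMs` and the default fraction of 0.8 are hypothetical and not defined by this document; only the `expiresIn` field comes from the flow above.
+
+```typescript
+// Hypothetical helper: how long to wait before refreshing the task JWT.
+// `expiresInSeconds` is the `expiresIn` value returned by POST /api/v1/agents/token.
+// The default refresh fraction is an illustrative assumption, not a documented value.
+function refreshDelayMs(expiresInSeconds: number, refreshFraction = 0.8): number {
+  if (expiresInSeconds <= 0) {
+    throw new RangeError("expiresInSeconds must be positive");
+  }
+  if (refreshFraction <= 0 || refreshFraction >= 1) {
+    throw new RangeError("refreshFraction must be strictly between 0 and 1");
+  }
+  // Work in milliseconds so the result can be passed straight to setTimeout.
+  return Math.floor(expiresInSeconds * 1000 * refreshFraction);
+}
+```
+
+With `expiresIn: 3600` and a fraction of 0.5, the agent would refresh after 30 minutes, well before the token lapses.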
+
+## mTLS Communication
+
+All agent-to-core communication uses mutual TLS:
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    AGENT COMMUNICATION SECURITY                             │
+│                                                                             │
+│  ┌──────────────┐                          ┌──────────────┐                │
+│  │    AGENT     │                          │  STELLA CORE │                │
+│  └──────┬───────┘                          └──────┬───────┘                │
+│         │                                         │                         │
+│         │  mTLS (mutual TLS)                      │                         │
+│         │  - Agent cert signed by Stella CA       │                         │
+│         │  - Server cert verified by Agent        │                         │
+│         │  - TLS 1.3 only                         │                         │
+│         │  - Perfect forward secrecy              │                         │
+│         │◄───────────────────────────────────────►│                         │
+│         │                                         │                         │
+│         │  Encrypted payload                      │                         │
+│         │  - Task payloads encrypted with         │                         │
+│         │    agent-specific key                   │                         │
+│         │  - Logs encrypted in transit            │                         │
+│         │◄───────────────────────────────────────►│                         │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### TLS Requirements
+
+| Requirement | Value |
+|-------------|-------|
+| Protocol | TLS 1.3 only |
+| Cipher Suites | TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256 |
+| Key Exchange | ECDHE with P-384 or X25519 |
+| Certificate Key | RSA 4096-bit or ECDSA P-384 |
+| Certificate Validity | 90 days (auto-renewed) |
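+
+The table can be expressed as a TLS configuration object. The sketch below assumes a Node.js-based endpoint, which this document does not state; the property names mirror options accepted by the Node.js `tls` module, while `AgentTlsOptions` and `agentTlsOptions` are illustrative names.
+
+```typescript
+// Local shape mirroring the Node.js `tls` option names used here, so the
+// sketch type-checks without extra imports.
+interface AgentTlsOptions {
+  minVersion: "TLSv1.3";
+  maxVersion: "TLSv1.3";
+  ciphers: string;             // colon-separated cipher suite names
+  ecdhCurve: string;           // colon-separated key exchange groups
+  requestCert: boolean;        // server asks the agent for a client certificate
+  rejectUnauthorized: boolean; // drop peers whose certificate fails verification
+}
+
+const agentTlsOptions: AgentTlsOptions = {
+  minVersion: "TLSv1.3",
+  maxVersion: "TLSv1.3",
+  ciphers: ["TLS_AES_256_GCM_SHA384", "TLS_CHACHA20_POLY1305_SHA256"].join(":"),
+  ecdhCurve: ["P-384", "X25519"].join(":"),
+  requestCert: true,
+  rejectUnauthorized: true,
+};
+```
+
+Pinning both `minVersion` and `maxVersion` is one way to express "TLS 1.3 only"; certificate, key, and CA material would be supplied alongside these options.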
+
+## Certificate Management
+
+### Certificate Structure
+
+```typescript
+interface AgentCertificate {
+  subject: {
+    CN: string;        // Agent name
+    O: string;         // "Stella Ops"
+    OU: string;        // Tenant ID
+  };
+  serialNumber: string;
+  issuer: string;      // Stella CA
+  validFrom: DateTime;
+  validTo: DateTime;
+  extensions: {
+    keyUsage: ["digitalSignature", "keyEncipherment"];
+    extendedKeyUsage: ["clientAuth"];
+    subjectAltName: string[];  // Agent ID as URI
+  };
+}
+```
+
+### Certificate Renewal
+
+Agents automatically renew certificates before expiration:
+1. Agent detects certificate expiring within 30 days
+2. Agent generates new CSR with same identity
+3. Agent submits renewal request over existing mTLS connection
+4. Authority issues new certificate
+5. Agent transitions to new certificate seamlessly
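+
+The check in step 1 can be sketched as a small predicate. The 30-day window comes from the list above; `needsRenewal` and the constant names are hypothetical.
+
+```typescript
+const RENEWAL_WINDOW_DAYS = 30; // renewal window from step 1
+const MS_PER_DAY = 24 * 60 * 60 * 1000;
+
+// Returns true once the certificate is within the renewal window
+// (or has already expired).
+function needsRenewal(validTo: Date, now: Date = new Date()): boolean {
+  const remainingMs = validTo.getTime() - now.getTime();
+  return remainingMs <= RENEWAL_WINDOW_DAYS * MS_PER_DAY;
+}
+```
+
+With the 90-day validity from the TLS requirements, a freshly issued certificate would first report `true` roughly 60 days after issuance.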
+
+## Secrets Management
+
+Secrets are NEVER stored in the Stella database. Only vault references are stored.
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    SECRETS FLOW (NEVER STORED IN DB)                        │
+│                                                                             │
+│  ┌──────────────┐        ┌──────────────┐        ┌──────────────┐          │
+│  │    VAULT     │        │  STELLA CORE │        │    AGENT     │          │
+│  │  (Source)    │        │  (Broker)    │        │  (Consumer)  │          │
+│  └──────┬───────┘        └──────┬───────┘        └──────┬───────┘          │
+│         │                       │                       │                   │
+│         │                       │ Task requires secret  │                   │
+│         │                       │                       │                   │
+│         │ Fetch with service    │                       │                   │
+│         │ account token         │                       │                   │
+│         │◄───────────────────────                       │                   │
+│         │                       │                       │                   │
+│         │ Return secret         │                       │                   │
+│         │ (wrapped, short TTL)  │                       │                   │
+│         │───────────────────────►                       │                   │
+│         │                       │                       │                   │
+│         │                       │ Embed in task payload │                   │
+│         │                       │ (encrypted)           │                   │
+│         │                       │───────────────────────►                   │
+│         │                       │                       │                   │
+│         │                       │                       │ Decrypt           │
+│         │                       │                       │ Use for task      │
+│         │                       │                       │ Discard           │
+│                                                                             │
+│  Rules:                                                                     │
+│  - Secrets NEVER stored in Stella database                                  │
+│  - Only Vault references stored                                             │
+│  - Secrets fetched at execution time only                                   │
+│  - Secrets not logged (masked in logs)                                      │
+│  - Secrets not persisted in agent memory beyond task scope                  │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
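+
+The "masked in logs" rule could be enforced with a filter applied to every log line before it leaves the agent. The following is a minimal sketch under that assumption; `maskSecrets` and the `***` placeholder are illustrative and not part of any documented API.
+
+```typescript
+// Replaces every occurrence of each known secret value with a placeholder.
+function maskSecrets(line: string, secretValues: readonly string[]): string {
+  // Mask longer values first so a secret that contains another secret
+  // is not left partially visible.
+  const ordered = [...secretValues].sort((a, b) => b.length - a.length);
+  let masked = line;
+  for (const value of ordered) {
+    if (value.length === 0) continue; // splitting on "" would mangle the line
+    masked = masked.split(value).join("***");
+  }
+  return masked;
+}
+```
+
+The list of secret values would be held only for the lifetime of the task and dropped with the rest of the task state.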
+
+## Task Security
+
+### Task Assignment
+
+```typescript
+interface AgentTask {
+  id: UUID;
+  type: TaskType;
+  targetId: UUID;
+  payload: TaskPayload;
+  credentials: EncryptedCredentials;  // Encrypted with agent's public key
+  timeout: number;
+  priority: TaskPriority;
+  idempotencyKey: string;
+  assignedAt: DateTime;
+  expiresAt: DateTime;
+}
+```
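+
+This document does not spell out how `expiresAt` and `idempotencyKey` are enforced. One plausible reading, sketched below, is that an agent refuses expired tasks and executes a given idempotency key at most once; `TaskEnvelope` and `shouldExecute` are hypothetical names.
+
+```typescript
+// Subset of AgentTask that the check below needs.
+interface TaskEnvelope {
+  idempotencyKey: string;
+  expiresAt: Date;
+}
+
+// Returns true only for tasks that are still valid and not yet seen.
+// `seenKeys` is mutated so that a redelivered task is rejected.
+function shouldExecute(task: TaskEnvelope, seenKeys: Set<string>, now: Date): boolean {
+  if (now.getTime() >= task.expiresAt.getTime()) {
+    return false; // task expired before the agent picked it up
+  }
+  if (seenKeys.has(task.idempotencyKey)) {
+    return false; // duplicate delivery of an already accepted task
+  }
+  seenKeys.add(task.idempotencyKey);
+  return true;
+}
+```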
+
+### Credential Scoping
+
+Task credentials are:
+- Scoped to a single target
+- Valid only for the duration of the task
+- Encrypted with the agent's public key
+- Logged on access, with values omitted
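+
+The first two properties amount to a scope check and a validity-window check. The sketch below assumes the decrypted credential carries its target and validity window; `ScopedCredential` and `isCredentialUsable` are hypothetical names, not types defined elsewhere in this document.
+
+```typescript
+interface ScopedCredential {
+  targetId: string; // the only target this credential may be used against
+  notBefore: Date;  // assumed to match the task's assignedAt
+  notAfter: Date;   // assumed to match the task's expiresAt
+}
+
+// A credential is usable only against its own target and only inside its window.
+function isCredentialUsable(cred: ScopedCredential, targetId: string, now: Date): boolean {
+  const inScope = cred.targetId === targetId;
+  const t = now.getTime();
+  const inWindow = t >= cred.notBefore.getTime() && t <= cred.notAfter.getTime();
+  return inScope && inWindow;
+}
+```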
+
+### Task Execution Isolation
+
+Agents execute each task in an isolated context:
+
+```typescript
+interface TaskExecutionContext {
+  // Process isolation
+  workingDirectory: string;      // Unique per task
+  processUser: string;           // Non-root user
+  networkNamespace: string;      // If network isolation enabled
+
+  // Resource limits
+  memoryLimit: number;           // Bytes
+  cpuLimit: number;              // Millicores
+  diskLimit: number;             // Bytes
+  networkEgress: string[];       // Allowed destinations
+
+  // Cleanup
+  cleanupOnComplete: boolean;
+  cleanupTimeout: number;
+}
+```
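+
+Before starting a task, an agent could verify that the context actually satisfies these constraints. The following is a sketch only: `IsolationSettings` and `isolationViolations` are hypothetical, and embedding the task ID in the path is just one assumed way of making the working directory unique per task.
+
+```typescript
+// Subset of TaskExecutionContext that the checks below need.
+interface IsolationSettings {
+  workingDirectory: string;
+  processUser: string;
+  memoryLimit: number; // bytes
+  cpuLimit: number;    // millicores
+  diskLimit: number;   // bytes
+}
+
+// Returns human-readable violations; an empty array means the context passes.
+function isolationViolations(ctx: IsolationSettings, taskId: string): string[] {
+  const violations: string[] = [];
+  if (ctx.processUser === "root") {
+    violations.push("processUser must be a non-root user");
+  }
+  // Assumption: uniqueness is achieved by a path segment equal to the task ID.
+  if (!ctx.workingDirectory.split("/").includes(taskId)) {
+    violations.push("workingDirectory must be unique to the task");
+  }
+  if (ctx.memoryLimit <= 0 || ctx.cpuLimit <= 0 || ctx.diskLimit <= 0) {
+    violations.push("resource limits must be positive");
+  }
+  return violations;
+}
+```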
+
+## Agent Capabilities
+
+Agents declare capabilities that determine what tasks they can execute:
+
+```typescript
+interface AgentCapabilities {
+  docker?: DockerCapability;
+  compose?: ComposeCapability;
+  ssh?: SshCapability;
+  winrm?: WinrmCapability;
+  ecs?: EcsCapability;
+  nomad?: NomadCapability;
+}
+
+interface DockerCapability {
+  version: string;
+  apiVersion: string;
+  runtimes: string[];
+  registryAuth: boolean;
+}
+
+interface ComposeCapability {
+  version: string;
+  fileFormats: string[];
+}
+```
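+
+Scheduling against declared capabilities can be sketched as a simple containment check. How task types map to required capabilities is not specified in this document, so the `required` list below is an assumption, as are the names `CapabilityName`, `DeclaredCapabilities`, and `canExecute`.
+
+```typescript
+type CapabilityName = "docker" | "compose" | "ssh" | "winrm" | "ecs" | "nomad";
+
+// Simplified stand-in for AgentCapabilities: only presence matters here.
+type DeclaredCapabilities = Partial<Record<CapabilityName, object>>;
+
+// An agent qualifies only if it declares every capability the task requires.
+function canExecute(declared: DeclaredCapabilities, required: readonly CapabilityName[]): boolean {
+  return required.every((name) => declared[name] !== undefined);
+}
+```
+
+For example, an agent that declares only `docker` would qualify for a task requiring `["docker"]` but not for one requiring `["docker", "compose"]`.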
+
+## Heartbeat Protocol
+
+```typescript
+interface AgentHeartbeat {
+  agentId: UUID;
+  timestamp: DateTime;
+  status: "healthy" | "degraded";
+  resourceUsage: {
+    cpuPercent: number;
+    memoryPercent: number;
+    diskPercent: number;
+    networkRxBytes: number;
+    networkTxBytes: number;
+  };
+  activeTaskCount: number;
+  completedTasks: number;
+  failedTasks: number;
+  errors: string[];
+  signature: string;  // HMAC of heartbeat data
+}
+```
+
+### Heartbeat Validation
+
+1. Verify that the signature matches the expected HMAC
+2. Check that the timestamp is within the acceptable clock skew (30 s)
+3. Update the agent status based on the heartbeat content
+4. Trigger an alert if no heartbeat has arrived for more than 90 s
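+
+Steps 1 and 2 can be sketched with Node.js `crypto`. This document does not specify the HMAC algorithm, the key, the signature encoding, or how the heartbeat is serialized before signing, so HMAC-SHA256, a shared string key, hex encoding, and a pre-serialized `canonicalBody` are all assumptions here; `signHeartbeat` and `isHeartbeatValid` are hypothetical names.
+
+```typescript
+import { createHmac, timingSafeEqual } from "node:crypto";
+
+const MAX_SKEW_MS = 30_000; // acceptable clock skew from step 2
+
+// Assumption: HMAC-SHA256 over a canonical serialization, hex-encoded.
+function signHeartbeat(canonicalBody: string, key: string): string {
+  return createHmac("sha256", key).update(canonicalBody).digest("hex");
+}
+
+function isHeartbeatValid(
+  canonicalBody: string,
+  signatureHex: string,
+  timestamp: Date,
+  key: string,
+  now: Date,
+): boolean {
+  const expected = Buffer.from(signHeartbeat(canonicalBody, key), "hex");
+  const actual = Buffer.from(signatureHex, "hex");
+  // timingSafeEqual throws on length mismatch, so compare lengths first.
+  const signatureOk = expected.length === actual.length && timingSafeEqual(expected, actual);
+  const skewOk = Math.abs(now.getTime() - timestamp.getTime()) <= MAX_SKEW_MS;
+  return signatureOk && skewOk;
+}
+```
+
+A constant-time comparison is used so that the check does not leak how many leading bytes of a forged signature were correct.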
+
+## Agent Revocation
+
+When an agent is compromised or decommissioned:
+
+1. The agent's certificate is added to the CRL (Certificate Revocation List)
+2. All pending tasks for the agent are cancelled
+3. The agent is removed from its target assignments
+4. An audit event is logged
+5. A new agent can be registered under the same name, with a new identity
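+
+The first four steps can be pictured as one operation over orchestrator state. The in-memory model below is purely illustrative: `OrchestratorState` and `revokeAgent` are hypothetical, and a real implementation would persist these changes and publish the updated CRL.
+
+```typescript
+interface OrchestratorState {
+  crl: Set<string>;                       // revoked certificate serial numbers
+  pendingTasks: Map<string, string>;      // taskId -> agentId
+  targetAssignments: Map<string, string>; // targetId -> agentId
+  auditLog: string[];
+}
+
+function revokeAgent(state: OrchestratorState, agentId: string, certSerial: string): void {
+  state.crl.add(certSerial);                             // step 1
+  state.pendingTasks.forEach((owner, taskId) => {        // step 2
+    if (owner === agentId) state.pendingTasks.delete(taskId);
+  });
+  state.targetAssignments.forEach((owner, targetId) => { // step 3
+    if (owner === agentId) state.targetAssignments.delete(targetId);
+  });
+  state.auditLog.push(`agent ${agentId} revoked (serial ${certSerial})`); // step 4
+}
+```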
+
+## Security Checklist
+
+| Control | Implementation |
+|---------|----------------|
+| Identity | mTLS certificates signed by internal CA |
+| Authentication | Certificate-based + short-lived JWT |
+| Authorization | Task-scoped credentials |
+| Encryption | TLS 1.3 for transport, envelope encryption for secrets |
+| Isolation | Process sandboxing, resource limits |
+| Audit | All task assignments and completions logged |
+| Revocation | CRL for compromised agents |
+| Secret handling | Vault integration, no persistence |
+
+## References
+
+- [Security Overview](overview.md)
+- [Authentication & Authorization](auth.md)
+- [Threat Model](threat-model.md)