add release orchestrator docs and sprints gaps fills

This commit is contained in:
2026-01-11 01:05:17 +02:00
parent d58c093887
commit a62974a8c2
37 changed files with 6061 additions and 0 deletions

View File

@@ -0,0 +1,266 @@
# A/B Release Models
> Two models for A/B releases: target-group based and router-based traffic splitting.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Traffic Router](routers.md)
**Sprint:** [110_001 A/B Release Manager](../../../../implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md)
## Overview
Stella Ops supports two distinct models for A/B releases:
1. **Target-Group A/B:** Scale different target groups to shift workload
2. **Router-Based A/B:** Use traffic routers to split requests between variations
Each model has different use cases, trade-offs, and implementation requirements.
---
## Model 1: Target-Group A/B
Target-group A/B splits traffic by scaling different groups of targets. Suitable for worker services, background processors, and scenarios where sticky sessions are not required.
### Configuration
```typescript
interface TargetGroupABConfig {
type: "target-group";
// Group definitions
groupA: {
targetGroupId: UUID;
labels?: Record<string, string>;
};
groupB: {
targetGroupId: UUID;
labels?: Record<string, string>;
};
// Rollout by scaling groups
rolloutStrategy: {
type: "scale-groups";
stages: ScaleStage[];
};
}
interface ScaleStage {
name: string;
groupAPercentage: number; // Percentage of group A targets active
groupBPercentage: number; // Percentage of group B targets active
duration?: number; // Auto-advance after duration (seconds)
healthThreshold?: number; // Required health % to advance
requireApproval?: boolean;
}
```
### Example: Worker Service Canary
```typescript
const workerCanaryConfig: TargetGroupABConfig = {
type: "target-group",
groupA: { labels: { "worker-group": "A" } },
groupB: { labels: { "worker-group": "B" } },
rolloutStrategy: {
type: "scale-groups",
stages: [
// Stage 1: 100% A, 10% B (canary)
{ name: "canary", groupAPercentage: 100, groupBPercentage: 10,
duration: 300, healthThreshold: 95 },
// Stage 2: 100% A, 50% B
{ name: "expand", groupAPercentage: 100, groupBPercentage: 50,
duration: 600, healthThreshold: 95 },
// Stage 3: 50% A, 100% B
{ name: "shift", groupAPercentage: 50, groupBPercentage: 100,
duration: 600, healthThreshold: 95 },
// Stage 4: 0% A, 100% B (complete)
{ name: "complete", groupAPercentage: 0, groupBPercentage: 100,
requireApproval: true },
],
},
};
```
### Use Cases
- Background job processors
- Worker services without external traffic
- Infrastructure-level splitting
- Static traffic distribution
- Hardware-based variants
---
## Model 2: Router-Based A/B
Router-based A/B uses traffic routers (Nginx, HAProxy, ALB) to split incoming requests between variations. Suitable for APIs, web services, and scenarios requiring sticky sessions.
### Configuration
```typescript
interface RouterBasedABConfig {
type: "router-based";
// Router integration
routerIntegrationId: UUID;
// Upstream configuration
upstreamName: string;
variationA: {
targets: string[];
serviceName?: string;
};
variationB: {
targets: string[];
serviceName?: string;
};
// Traffic split configuration
trafficSplit: TrafficSplitConfig;
// Rollout strategy
rolloutStrategy: RouterRolloutStrategy;
}
interface TrafficSplitConfig {
type: "weight" | "header" | "cookie" | "tenant" | "composite";
// Weight-based (percentage)
weights?: { A: number; B: number };
// Header-based
headerName?: string;
headerValueA?: string;
headerValueB?: string;
// Cookie-based
cookieName?: string;
cookieValueA?: string;
cookieValueB?: string;
// Tenant-based (by host/path)
tenantRules?: TenantRule[];
}
```
### Rollout Strategy
```typescript
interface RouterRolloutStrategy {
type: "manual" | "time-based" | "health-based" | "composite";
stages: RouterRolloutStage[];
}
interface RouterRolloutStage {
name: string;
trafficPercentageB: number; // % of traffic to variation B
// Advancement criteria
duration?: number; // Auto-advance after duration
healthThreshold?: number; // Required health %
errorRateThreshold?: number; // Max error rate %
latencyThreshold?: number; // Max p99 latency ms
requireApproval?: boolean;
// Optional: specific routing rules for this stage
routingOverrides?: TrafficSplitConfig;
}
```
### Example: API Canary with Health-Based Advancement
```typescript
const apiCanaryConfig: RouterBasedABConfig = {
type: "router-based",
routerIntegrationId: "nginx-prod",
upstreamName: "api-backend",
variationA: { serviceName: "api-v1" },
variationB: { serviceName: "api-v2" },
trafficSplit: { type: "weight", weights: { A: 100, B: 0 } },
rolloutStrategy: {
type: "health-based",
stages: [
{ name: "canary-10", trafficPercentageB: 10,
duration: 300, healthThreshold: 99, errorRateThreshold: 1 },
{ name: "canary-25", trafficPercentageB: 25,
duration: 600, healthThreshold: 99, errorRateThreshold: 1 },
{ name: "canary-50", trafficPercentageB: 50,
duration: 900, healthThreshold: 99, errorRateThreshold: 1 },
{ name: "promote", trafficPercentageB: 100,
requireApproval: true },
],
},
};
```
### Use Cases
- API services with external traffic
- Web applications with user sessions
- Dynamic traffic distribution
- User-based variants (A/B testing)
- Feature flags and gradual rollouts
---
## Routing Strategies
### Weight-Based Routing
Splits traffic by percentage across variations.
```yaml
trafficSplit:
type: weight
weights:
A: 90
B: 10
```
### Header-Based Routing
Routes based on request header values.
```yaml
trafficSplit:
type: header
headerName: X-Feature-Flag
headerValueA: "control"
headerValueB: "experiment"
```
### Cookie-Based Routing
Routes based on cookie values for sticky sessions.
```yaml
trafficSplit:
type: cookie
cookieName: ab_variation
cookieValueA: "A"
cookieValueB: "B"
```
---
## Comparison Matrix
| Aspect | Target-Group A/B | Router-Based A/B |
|--------|------------------|------------------|
| **Traffic Control** | By scaling targets | By routing rules |
| **Sticky Sessions** | Not supported | Supported |
| **Granularity** | Target-level | Request-level |
| **External Traffic** | Not required | Required |
| **Infrastructure** | Target groups | Traffic router |
| **Use Case** | Workers, batch jobs | APIs, web apps |
| **Rollback Speed** | Slower (scaling) | Immediate (routing) |
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [Canary Controller](canary.md)
- [Router Plugins](routers.md)
- [Deployment Strategies](../deployment/strategies.md)

View File

@@ -0,0 +1,270 @@
# Canary Controller
> Automated canary deployment controller with health-based stage advancement and automatic rollback.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Deployment Strategies](../deployment/strategies.md)
**Sprint:** [110_003 Canary Controller](../../../../implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md)
## Overview
The Canary Controller automates progressive rollout of new versions by gradually shifting traffic, monitoring health metrics, and automatically rolling back if issues are detected.
---
## Canary State Machine
### States
```
CREATED -> DEPLOYING -> EVALUATING -> PROMOTING/ROLLING_BACK -> COMPLETED
```
| State | Description |
|-------|-------------|
| `CREATED` | Canary release defined, not started |
| `DEPLOYING` | Deploying variation B to targets |
| `EVALUATING` | Monitoring health metrics at current stage |
| `PROMOTING` | Advancing to next stage |
| `ROLLING_BACK` | Reverting to variation A |
| `COMPLETED` | Final state (promoted or rolled back) |
---
## Implementation
### Canary Controller Class
```typescript
class CanaryController {
async executeRollout(abRelease: ABRelease): Promise<void> {
const strategy = abRelease.rolloutStrategy;
for (let i = 0; i < strategy.stages.length; i++) {
const stage = strategy.stages[i];
const stageRecord = await this.startStage(abRelease, stage, i);
try {
// 1. Apply traffic configuration for this stage
await this.applyStageTraffic(abRelease, stage);
this.emit("canary.stage_started", { abRelease, stage, stageNumber: i });
// 2. Wait for stage completion based on criteria
const result = await this.waitForStageCompletion(abRelease, stage);
if (!result.success) {
// Health check failed - rollback
this.log(`Stage ${stage.name} failed health check: ${result.reason}`);
await this.rollback(abRelease, result.reason);
return;
}
// 3. Check if approval required
if (stage.requireApproval) {
this.log(`Stage ${stage.name} requires approval`);
await this.pauseForApproval(abRelease, stage);
// Wait for approval
const approval = await this.waitForApproval(abRelease, stage);
if (!approval.approved) {
await this.rollback(abRelease, "Approval denied");
return;
}
}
await this.completeStage(stageRecord, "succeeded");
this.emit("canary.stage_completed", { abRelease, stage, stageNumber: i });
} catch (error) {
await this.completeStage(stageRecord, "failed", error.message);
await this.rollback(abRelease, error.message);
return;
}
}
// Rollout complete
await this.completeRollout(abRelease);
this.emit("canary.promoted", { abRelease });
}
}
```
### Stage Completion Logic
```typescript
private async waitForStageCompletion(
abRelease: ABRelease,
stage: RolloutStage
): Promise<StageCompletionResult> {
const startTime = Date.now();
const checkInterval = 30000; // 30 seconds
while (true) {
// Check health metrics
const health = await this.checkHealth(abRelease, stage);
if (!health.healthy) {
return {
success: false,
reason: `Health check failed: ${health.reason}`
};
}
// Check error rate (if threshold configured)
if (stage.errorRateThreshold !== undefined) {
const errorRate = await this.getErrorRate(abRelease);
if (errorRate > stage.errorRateThreshold) {
return {
success: false,
reason: `Error rate ${errorRate}% exceeds threshold ${stage.errorRateThreshold}%`
};
}
}
// Check latency (if threshold configured)
if (stage.latencyThreshold !== undefined) {
const latency = await this.getP99Latency(abRelease);
if (latency > stage.latencyThreshold) {
return {
success: false,
reason: `P99 latency ${latency}ms exceeds threshold ${stage.latencyThreshold}ms`
};
}
}
// Check duration (auto-advance)
if (stage.duration !== undefined) {
const elapsed = (Date.now() - startTime) / 1000;
if (elapsed >= stage.duration) {
return { success: true };
}
}
// Wait before next check
await sleep(checkInterval);
}
}
```
### Traffic Application
```typescript
private async applyStageTraffic(abRelease: ABRelease, stage: RolloutStage): Promise<void> {
if (abRelease.config.type === "router-based") {
const router = await this.getRouterConnector(abRelease.config.routerIntegrationId);
await router.shiftTraffic(
abRelease.config.variationA.serviceName,
abRelease.config.variationB.serviceName,
stage.trafficPercentageB
);
} else if (abRelease.config.type === "target-group") {
// Scale target groups
await this.scaleTargetGroup(
abRelease.config.groupA,
stage.groupAPercentage
);
await this.scaleTargetGroup(
abRelease.config.groupB,
stage.groupBPercentage
);
}
}
```
### Rollback
```typescript
async rollback(abRelease: ABRelease, reason: string): Promise<void> {
this.log(`Rolling back A/B release: ${reason}`);
this.emit("canary.rollback_started", { abRelease, reason });
if (abRelease.config.type === "router-based") {
// Shift all traffic back to A
const router = await this.getRouterConnector(abRelease.config.routerIntegrationId);
await router.shiftTraffic(
abRelease.config.variationB.serviceName,
abRelease.config.variationA.serviceName,
100
);
} else if (abRelease.config.type === "target-group") {
// Scale B to 0, A to 100
await this.scaleTargetGroup(abRelease.config.groupA, 100);
await this.scaleTargetGroup(abRelease.config.groupB, 0);
}
abRelease.status = "rolled_back";
await this.save(abRelease);
this.emit("canary.rolled_back", { abRelease, reason });
}
```
---
## Configuration
### Canary Stages
```yaml
rolloutStrategy:
type: health-based
stages:
- name: canary-5
trafficPercentageB: 5
duration: 300 # 5 minutes
healthThreshold: 99
errorRateThreshold: 0.5
- name: canary-25
trafficPercentageB: 25
duration: 600 # 10 minutes
healthThreshold: 99
errorRateThreshold: 1.0
- name: canary-50
trafficPercentageB: 50
duration: 900 # 15 minutes
healthThreshold: 99
errorRateThreshold: 1.0
- name: promote
trafficPercentageB: 100
requireApproval: true
```
### Health Metrics
| Metric | Description | Typical Threshold |
|--------|-------------|-------------------|
| Success Rate | % of successful requests | > 99% |
| Error Rate | % of failed requests | < 1% |
| P99 Latency | 99th percentile response time | < 500ms |
| Health Check | Container/service health | Healthy |
---
## Events
The canary controller emits events for observability:
| Event | Description |
|-------|-------------|
| `canary.stage_started` | Stage execution began |
| `canary.stage_completed` | Stage completed successfully |
| `canary.rollback_started` | Rollback initiated |
| `canary.rolled_back` | Rollback completed |
| `canary.promoted` | Full promotion completed |
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [A/B Release Models](ab-releases.md)
- [Router Plugins](routers.md)
- [Metrics](../operations/metrics.md)

View File

@@ -0,0 +1,348 @@
# Router Plugins
> Traffic router plugins for progressive delivery (Nginx, AWS ALB, and custom implementations).
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Plugin System](../modules/plugin-system.md)
**Sprint:** [110_004 Router Plugins](../../../../implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md)
## Overview
Router plugins enable traffic shifting for progressive delivery. The orchestrator ships with an Nginx router plugin for v1, with HAProxy, Traefik, and AWS ALB available as additional plugins.
---
## Router Plugin Interface
All router plugins implement the `TrafficRouterPlugin` interface:
```typescript
interface TrafficRouterPlugin {
// Configuration
configureRoute(config: RouteConfig): Promise<void>;
// Traffic operations
shiftTraffic(from: string, to: string, percentage: number): Promise<void>;
getTrafficDistribution(): Promise<TrafficDistribution>;
// Health
validateConfig(): Promise<ValidationResult>;
reload(): Promise<void>;
}
interface RouteConfig {
upstream: string;
serverName: string;
variations: Variation[];
splitType: "weight" | "header" | "cookie";
headerName?: string;
headerValueB?: string;
stickySession?: boolean;
stickyDuration?: number;
}
interface Variation {
name: string;
targets: string[];
weight: number;
}
interface TrafficDistribution {
variations: {
name: string;
percentage: number;
targets: string[];
}[];
}
```
---
## Nginx Router Plugin (v1 Built-in)
The Nginx plugin generates and manages Nginx configuration for traffic splitting.
### Implementation
```typescript
class NginxRouterPlugin implements TrafficRouterPlugin {
async configureRoute(config: RouteConfig): Promise<void> {
const upstreamConfig = this.generateUpstreamConfig(config);
const serverConfig = this.generateServerConfig(config);
// Write configuration files
await this.writeConfig(
`/etc/nginx/conf.d/upstream-${config.upstream}.conf`,
upstreamConfig
);
await this.writeConfig(
`/etc/nginx/conf.d/server-${config.upstream}.conf`,
serverConfig
);
// Validate configuration
const validation = await this.validateConfig();
if (!validation.valid) {
throw new Error(`Nginx config validation failed: ${validation.error}`);
}
// Reload nginx
await this.reload();
}
}
```
### Upstream Configuration
```typescript
private generateUpstreamConfig(config: RouteConfig): string {
const lines: string[] = [];
for (const variation of config.variations) {
lines.push(`upstream ${config.upstream}_${variation.name} {`);
for (const target of variation.targets) {
lines.push(` server ${target};`);
}
lines.push(`}`);
lines.push(``);
}
// Combined upstream with weights (for percentage-based routing)
if (config.splitType === "weight") {
lines.push(`upstream ${config.upstream} {`);
for (const variation of config.variations) {
const weight = variation.weight;
for (const target of variation.targets) {
lines.push(` server ${target} weight=${weight};`);
}
}
lines.push(`}`);
}
return lines.join("\n");
}
```
### Server Configuration
```typescript
private generateServerConfig(config: RouteConfig): string {
if (config.splitType === "header" || config.splitType === "cookie") {
// Split block based on header/cookie
return `
map $http_${config.headerName || "x-variation"} $${config.upstream}_backend {
default ${config.upstream}_A;
"${config.headerValueB || "B"}" ${config.upstream}_B;
}
server {
listen 80;
server_name ${config.serverName};
location / {
proxy_pass http://$${config.upstream}_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
`;
} else {
// Weight-based (default)
return `
server {
listen 80;
server_name ${config.serverName};
location / {
proxy_pass http://${config.upstream};
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
`;
}
}
```
### Traffic Shifting
```typescript
async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
const config = await this.getCurrentConfig();
// Update weights
for (const variation of config.variations) {
if (variation.name === to) {
variation.weight = percentage;
} else {
variation.weight = 100 - percentage;
}
}
await this.configureRoute(config);
}
async getTrafficDistribution(): Promise<TrafficDistribution> {
// Parse current nginx config to get weights
const config = await this.parseCurrentConfig();
return {
variations: config.variations.map(v => ({
name: v.name,
percentage: v.weight,
targets: v.targets,
})),
};
}
```
---
## AWS ALB Router Plugin
The AWS ALB plugin manages weighted target groups for traffic splitting.
### Implementation
```typescript
class AWSALBRouterPlugin implements TrafficRouterPlugin {
private alb: AWS.ELBv2;
async configureRoute(config: RouteConfig): Promise<void> {
const listenerArn = config.listenerArn;
// Create/update target groups for each variation
const targetGroupArns: Record<string, string> = {};
for (const variation of config.variations) {
const tgArn = await this.ensureTargetGroup(
`${config.upstream}-${variation.name}`,
variation.targets
);
targetGroupArns[variation.name] = tgArn;
}
// Update listener rule with weighted target groups
await this.alb.modifyRule({
RuleArn: config.ruleArn,
Actions: [{
Type: "forward",
ForwardConfig: {
TargetGroups: config.variations.map(v => ({
TargetGroupArn: targetGroupArns[v.name],
Weight: v.weight,
})),
TargetGroupStickinessConfig: {
Enabled: config.stickySession || false,
DurationSeconds: config.stickyDuration || 3600,
},
},
}],
}).promise();
}
async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
const rule = await this.getRule();
const forwardConfig = rule.Actions[0].ForwardConfig;
// Update weights
for (const tg of forwardConfig.TargetGroups) {
if (tg.TargetGroupArn.includes(`-${to}`)) {
tg.Weight = percentage;
} else {
tg.Weight = 100 - percentage;
}
}
await this.alb.modifyRule({
RuleArn: rule.RuleArn,
Actions: rule.Actions,
}).promise();
}
async getTrafficDistribution(): Promise<TrafficDistribution> {
const rule = await this.getRule();
const forwardConfig = rule.Actions[0].ForwardConfig;
const variations = [];
for (const tg of forwardConfig.TargetGroups) {
const targets = await this.getTargetGroupTargets(tg.TargetGroupArn);
const name = tg.TargetGroupArn.split("-").pop();
variations.push({
name,
percentage: tg.Weight,
targets: targets.map(t => t.Id),
});
}
return { variations };
}
}
```
---
## Router Plugin Catalog
| Plugin | Status | Description |
|--------|--------|-------------|
| Nginx | v1 Built-in | Configuration-based weight/header routing |
| HAProxy | Plugin | Runtime API for traffic management |
| Traefik | Plugin | Dynamic configuration via API |
| AWS ALB | Plugin | Weighted target groups |
| Envoy | Planned | xDS API integration |
---
## Creating Custom Router Plugins
To create a custom router plugin:
1. **Implement Interface:** Create a class implementing `TrafficRouterPlugin`
2. **Register Plugin:** Add to plugin registry with capabilities
3. **Configuration Schema:** Define JSON Schema for plugin config
4. **Health Checks:** Implement connection testing
5. **Rollback Support:** Handle traffic reversion on failures
### Example Plugin Manifest
```yaml
plugin:
name: my-router
version: 1.0.0
type: router
capabilities:
- traffic-routing
- weight-based
- header-based
config:
type: object
properties:
endpoint:
type: string
description: Router API endpoint
auth:
type: object
properties:
type:
enum: [basic, token]
credentialRef:
type: string
```
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [Plugin System](../modules/plugin-system.md)
- [Canary Controller](canary.md)
- [A/B Release Models](ab-releases.md)