A/B Release Models

Two models for A/B releases: target-group-based and router-based traffic splitting.

Status: Planned (not yet implemented)
Source: Architecture Advisory Section 11.2
Related Modules: Progressive Delivery Module, Traffic Router
Sprint: 110_001 A/B Release Manager

Overview

Stella Ops supports two distinct models for A/B releases:

  1. Target-Group A/B: Scale different target groups to shift workload
  2. Router-Based A/B: Use traffic routers to split requests between variations

Each model has different use cases, trade-offs, and implementation requirements.


Model 1: Target-Group A/B

Target-group A/B shifts workload between variations by scaling the number of active targets in each group, rather than by routing individual requests. It suits worker services, background processors, and other scenarios that do not require sticky sessions.

Configuration

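type UUID = string;   // assumed alias; UUID is not defined in this document

interface TargetGroupABConfig {
  type: "target-group";

  // Group definitions. targetGroupId is optional because the example below
  // selects groups by labels alone (assumed: at least one of the two is set).
  groupA: {
    targetGroupId?: UUID;
    labels?: Record<string, string>;
  };
  groupB: {
    targetGroupId?: UUID;
    labels?: Record<string, string>;
  };

  // Rollout by scaling groups
  rolloutStrategy: {
    type: "scale-groups";
    stages: ScaleStage[];
  };
}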

interface ScaleStage {
  name: string;
  groupAPercentage: number;   // Percentage of group A targets active
  groupBPercentage: number;   // Percentage of group B targets active
  duration?: number;          // Auto-advance after duration (seconds)
  healthThreshold?: number;   // Required health % to advance
  requireApproval?: boolean;
}

Example: Worker Service Canary

const workerCanaryConfig: TargetGroupABConfig = {
  type: "target-group",
  groupA: { labels: { "worker-group": "A" } },
  groupB: { labels: { "worker-group": "B" } },
  rolloutStrategy: {
    type: "scale-groups",
    stages: [
      // Stage 1: 100% A, 10% B (canary)
      { name: "canary", groupAPercentage: 100, groupBPercentage: 10,
        duration: 300, healthThreshold: 95 },
      // Stage 2: 100% A, 50% B
      { name: "expand", groupAPercentage: 100, groupBPercentage: 50,
        duration: 600, healthThreshold: 95 },
      // Stage 3: 50% A, 100% B
      { name: "shift", groupAPercentage: 50, groupBPercentage: 100,
        duration: 600, healthThreshold: 95 },
      // Stage 4: 0% A, 100% B (complete)
      { name: "complete", groupAPercentage: 0, groupBPercentage: 100,
        requireApproval: true },
    ],
  },
};
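Because the stage percentages refer to targets rather than requests, the share of work each group actually handles depends on how many targets are active on each side. The sketch below computes active target counts per stage for the worker example. The group sizes (20 targets each), the helper names activeTargets and groupBShare, and rounding up with Math.ceil are assumptions for illustration. The work-share estimate further assumes every active target processes work at the same rate, as workers consuming a shared queue would.

```typescript
// Percentages from a ScaleStage (subset of fields, for a self-contained example).
interface StagePercentages {
  name: string;
  groupAPercentage: number;
  groupBPercentage: number;
}

// Hypothetical helper: active targets in a group of `groupSize` at `percentage`.
// Rounds up (assumption) so a small non-zero percentage never yields zero targets.
function activeTargets(groupSize: number, percentage: number): number {
  if (percentage < 0 || percentage > 100) {
    throw new RangeError(`percentage out of range: ${percentage}`);
  }
  return Math.ceil((groupSize * percentage) / 100);
}

// Approximate fraction of work handled by group B, assuming every active
// target pulls work at the same rate.
function groupBShare(activeA: number, activeB: number): number {
  const total = activeA + activeB;
  return total === 0 ? 0 : activeB / total;
}

const stages: StagePercentages[] = [
  { name: "canary", groupAPercentage: 100, groupBPercentage: 10 },
  { name: "expand", groupAPercentage: 100, groupBPercentage: 50 },
  { name: "shift", groupAPercentage: 50, groupBPercentage: 100 },
  { name: "complete", groupAPercentage: 0, groupBPercentage: 100 },
];

const sizeA = 20; // hypothetical group sizes
const sizeB = 20;
for (const s of stages) {
  const a = activeTargets(sizeA, s.groupAPercentage);
  const b = activeTargets(sizeB, s.groupBPercentage);
  console.log(`${s.name}: A=${a} B=${b} B-share=${(groupBShare(a, b) * 100).toFixed(1)}%`);
}
// canary: A=20 B=2 B-share=9.1%
// expand: A=20 B=10 B-share=33.3%
// shift: A=10 B=20 B-share=66.7%
// complete: A=0 B=20 B-share=100.0%
```

With these numbers the canary stage runs 2 of 20 B targets alongside all 20 A targets, so B handles roughly 9% of the work rather than 10%. Group A staying at full size during the early stages is what keeps B's effective share below its target percentage.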

Use Cases

  • Background job processors
  • Worker services without external traffic
  • Infrastructure-level splitting
  • Static traffic distribution
  • Hardware-based variants

Model 2: Router-Based A/B

Router-based A/B uses traffic routers (Nginx, HAProxy, AWS ALB) to split incoming requests between variations. It suits APIs, web services, and scenarios that require sticky sessions.

Configuration

interface RouterBasedABConfig {
  type: "router-based";

  // Router integration
  routerIntegrationId: UUID;

  // Upstream configuration
  upstreamName: string;
  variationA: {
    targets?: string[];     // explicit targets; optional when serviceName is set (assumed)
    serviceName?: string;
  };
  variationB: {
    targets?: string[];
    serviceName?: string;
  };

  // Traffic split configuration
  trafficSplit: TrafficSplitConfig;

  // Rollout strategy
  rolloutStrategy: RouterRolloutStrategy;
}

interface TrafficSplitConfig {
  type: "weight" | "header" | "cookie" | "tenant" | "composite";

  // Weight-based (percentage)
  weights?: { A: number; B: number };

  // Header-based
  headerName?: string;
  headerValueA?: string;
  headerValueB?: string;

  // Cookie-based
  cookieName?: string;
  cookieValueA?: string;
  cookieValueB?: string;

  // Tenant-based (by host/path)
  tenantRules?: TenantRule[];
}
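Which fields of TrafficSplitConfig matter depends on its type. A small validator makes those dependencies explicit. This is a sketch, not the planned implementation: validateTrafficSplit and checkPair are hypothetical names, requiring weights to sum to 100 is an assumption drawn from the examples in this document, TenantRule is typed as unknown because its shape is not given here, and composite splits are not checked because their semantics are not described.

```typescript
// Local mirror of TrafficSplitConfig so the example is self-contained.
type SplitType = "weight" | "header" | "cookie" | "tenant" | "composite";

interface TrafficSplitConfig {
  type: SplitType;
  weights?: { A: number; B: number };
  headerName?: string;
  headerValueA?: string;
  headerValueB?: string;
  cookieName?: string;
  cookieValueA?: string;
  cookieValueB?: string;
  tenantRules?: unknown[]; // TenantRule shape not specified here
}

// Checks that a name / valueA / valueB triple is complete and unambiguous.
function checkPair(
  errors: string[],
  prefix: "header" | "cookie",
  name?: string,
  valueA?: string,
  valueB?: string,
): void {
  if (!name) errors.push(`${prefix}Name is required`);
  if (valueA === undefined || valueB === undefined) {
    errors.push(`${prefix}ValueA and ${prefix}ValueB are required`);
  } else if (valueA === valueB) {
    errors.push(`${prefix}ValueA and ${prefix}ValueB must differ`);
  }
}

// Hypothetical validator: returns a list of problems (empty means valid).
function validateTrafficSplit(c: TrafficSplitConfig): string[] {
  const errors: string[] = [];
  switch (c.type) {
    case "weight":
      if (!c.weights) {
        errors.push("weights is required");
      } else {
        const { A, B } = c.weights;
        if (A < 0 || B < 0) errors.push("weights must be non-negative");
        if (A + B !== 100) errors.push(`weights must sum to 100 (got ${A + B})`);
      }
      break;
    case "header":
      checkPair(errors, "header", c.headerName, c.headerValueA, c.headerValueB);
      break;
    case "cookie":
      checkPair(errors, "cookie", c.cookieName, c.cookieValueA, c.cookieValueB);
      break;
    case "tenant":
      if (!c.tenantRules || c.tenantRules.length === 0) {
        errors.push("tenantRules must contain at least one rule");
      }
      break;
    case "composite":
      // Composite semantics are not described in this document; not checked.
      break;
  }
  return errors;
}

validateTrafficSplit({ type: "weight", weights: { A: 90, B: 10 } });
// returns []
validateTrafficSplit({ type: "header", headerName: "X-Feature-Flag" });
// returns ["headerValueA and headerValueB are required"]
```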

Rollout Strategy

interface RouterRolloutStrategy {
  type: "manual" | "time-based" | "health-based" | "composite";
  stages: RouterRolloutStage[];
}

interface RouterRolloutStage {
  name: string;
  trafficPercentageB: number;     // % of traffic to variation B

  // Advancement criteria
  duration?: number;              // Auto-advance after duration (seconds)
  healthThreshold?: number;       // Required health % to advance
  errorRateThreshold?: number;    // Max error rate %
  latencyThreshold?: number;      // Max p99 latency (ms)
  requireApproval?: boolean;

  // Optional: specific routing rules for this stage
  routingOverrides?: TrafficSplitConfig;
}
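The advancement criteria on RouterRolloutStage can be read as a gate evaluated against live metrics for variation B. The sketch below assumes a stage advances only when every criterion it declares is satisfied, and treats duration as seconds. evaluateStage and StageMetrics are hypothetical names. How the strategy type (manual, time-based, health-based, composite) changes which criteria apply is not specified here, so the gate simply checks whatever the stage sets.

```typescript
// Mirrors RouterRolloutStage's advancement fields for a self-contained example.
interface RouterRolloutStage {
  name: string;
  trafficPercentageB: number;
  duration?: number;            // assumed to be seconds
  healthThreshold?: number;     // min health %
  errorRateThreshold?: number;  // max error rate %
  latencyThreshold?: number;    // max p99 latency, ms
  requireApproval?: boolean;
}

// Hypothetical snapshot of variation B's metrics during the current stage.
interface StageMetrics {
  elapsedSeconds: number;
  healthPercent: number;
  errorRatePercent: number;
  p99LatencyMs: number;
  approved: boolean;
}

type StageDecision = { kind: "advance" } | { kind: "wait"; reasons: string[] };

// Assumed rule: advance only when every declared criterion holds;
// criteria left undefined are skipped.
function evaluateStage(stage: RouterRolloutStage, m: StageMetrics): StageDecision {
  const reasons: string[] = [];
  if (stage.duration !== undefined && m.elapsedSeconds < stage.duration) {
    reasons.push(`elapsed ${m.elapsedSeconds}s < ${stage.duration}s`);
  }
  if (stage.healthThreshold !== undefined && m.healthPercent < stage.healthThreshold) {
    reasons.push(`health ${m.healthPercent}% < ${stage.healthThreshold}%`);
  }
  if (stage.errorRateThreshold !== undefined && m.errorRatePercent > stage.errorRateThreshold) {
    reasons.push(`error rate ${m.errorRatePercent}% > ${stage.errorRateThreshold}%`);
  }
  if (stage.latencyThreshold !== undefined && m.p99LatencyMs > stage.latencyThreshold) {
    reasons.push(`p99 ${m.p99LatencyMs}ms > ${stage.latencyThreshold}ms`);
  }
  if (stage.requireApproval && !m.approved) {
    reasons.push("awaiting approval");
  }
  return reasons.length === 0 ? { kind: "advance" } : { kind: "wait", reasons };
}

// The "canary-10" stage from the API example.
const canary10: RouterRolloutStage = {
  name: "canary-10", trafficPercentageB: 10,
  duration: 300, healthThreshold: 99, errorRateThreshold: 1,
};
const decision = evaluateStage(canary10, {
  elapsedSeconds: 320, healthPercent: 99.6, errorRatePercent: 0.4,
  p99LatencyMs: 180, approved: false,
});
// decision.kind === "advance"
```

A breached threshold only yields wait in this sketch; whether a breach should instead trigger an automatic rollback is not modeled.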

Example: API Canary with Health-Based Advancement

const apiCanaryConfig: RouterBasedABConfig = {
  type: "router-based",
  routerIntegrationId: "nginx-prod",
  upstreamName: "api-backend",
  variationA: { serviceName: "api-v1" },
  variationB: { serviceName: "api-v2" },
  trafficSplit: { type: "weight", weights: { A: 100, B: 0 } },
  rolloutStrategy: {
    type: "health-based",
    stages: [
      { name: "canary-10", trafficPercentageB: 10,
        duration: 300, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "canary-25", trafficPercentageB: 25,
        duration: 600, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "canary-50", trafficPercentageB: 50,
        duration: 900, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "promote", trafficPercentageB: 100,
        requireApproval: true },
    ],
  },
};
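The example starts with trafficSplit weights of A: 100, B: 0 and lets each stage raise trafficPercentageB. Assuming each stage is applied as a weight pair with the remainder of traffic staying on A, the progression for apiCanaryConfig can be sketched as follows. weightsForStage is a hypothetical helper; how a particular router integration expresses these weights is outside this sketch.

```typescript
interface StageLike {
  name: string;
  trafficPercentageB: number;
}

// Hypothetical helper: weight pair for one stage, assuming the remainder
// (100 - trafficPercentageB) stays on variation A.
function weightsForStage(stage: StageLike): { A: number; B: number } {
  const b = stage.trafficPercentageB;
  if (b < 0 || b > 100) {
    throw new RangeError(`${stage.name}: trafficPercentageB must be 0-100, got ${b}`);
  }
  return { A: 100 - b, B: b };
}

// Stages from apiCanaryConfig.
const apiStages: StageLike[] = [
  { name: "canary-10", trafficPercentageB: 10 },
  { name: "canary-25", trafficPercentageB: 25 },
  { name: "canary-50", trafficPercentageB: 50 },
  { name: "promote", trafficPercentageB: 100 },
];

for (const stage of apiStages) {
  const w = weightsForStage(stage);
  console.log(`${stage.name}: A=${w.A} B=${w.B}`);
}
// canary-10: A=90 B=10
// canary-25: A=75 B=25
// canary-50: A=50 B=50
// promote: A=0 B=100
```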

Use Cases

  • API services with external traffic
  • Web applications with user sessions
  • Dynamic traffic distribution
  • User-based variants (A/B testing)
  • Feature flags and gradual rollouts

Routing Strategies

Weight-Based Routing

Splits traffic by percentage across variations.

trafficSplit:
  type: weight
  weights:
    A: 90
    B: 10
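Weight-based routing chooses a variation independently for each request. A minimal sketch of that choice follows, assuming a uniform random source. pickByWeight is a hypothetical name, and rand is injectable so the choice can be tested deterministically.

```typescript
type Variation = "A" | "B";

// Hypothetical per-request weighted choice. `rand` must return a value in
// [0, 1), like Math.random.
function pickByWeight(
  weights: { A: number; B: number },
  rand: () => number = Math.random,
): Variation {
  const total = weights.A + weights.B;
  if (weights.A < 0 || weights.B < 0 || total <= 0) {
    throw new RangeError("weights must be non-negative with a positive sum");
  }
  return rand() * total < weights.A ? "A" : "B";
}

// Rough check: with A=90 and B=10, close to 10% of requests should pick B.
const n = 100000;
let toB = 0;
for (let i = 0; i < n; i++) {
  if (pickByWeight({ A: 90, B: 10 }) === "B") toB++;
}
console.log(`share routed to B: ${((toB / n) * 100).toFixed(1)}%`);
```

Because each request is decided independently, the same client can land on A for one request and on B for the next. Cookie-based routing addresses that when sessions must be sticky.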

Header-Based Routing

Routes based on request header values.

trafficSplit:
  type: header
  headerName: X-Feature-Flag
  headerValueA: "control"
  headerValueB: "experiment"
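Header-based routing maps a known header value to a variation, for example letting testers opt in by sending X-Feature-Flag: experiment. In the sketch below, pickByHeader is hypothetical. It matches header names case-insensitively, as HTTP header names are, and because this document does not say what happens when the header is missing or carries an unknown value, it falls back to a caller-supplied default.

```typescript
type Variation = "A" | "B";

interface HeaderSplit {
  headerName: string;
  headerValueA: string;
  headerValueB: string;
}

// Hypothetical matcher for header-based routing. Header names are compared
// case-insensitively; a missing or unrecognized value returns `fallback`.
function pickByHeader(
  cfg: HeaderSplit,
  headers: Record<string, string | undefined>,
  fallback: Variation,
): Variation {
  const wanted = cfg.headerName.toLowerCase();
  let value: string | undefined;
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) {
      value = headers[key];
      break;
    }
  }
  if (value === cfg.headerValueA) return "A";
  if (value === cfg.headerValueB) return "B";
  return fallback;
}

const flagSplit: HeaderSplit = {
  headerName: "X-Feature-Flag",
  headerValueA: "control",
  headerValueB: "experiment",
};
console.log(pickByHeader(flagSplit, { "x-feature-flag": "experiment" }, "A")); // B
console.log(pickByHeader(flagSplit, {}, "A")); // A (fallback)
```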

Cookie-Based Routing

Routes based on cookie values, keeping each client on the same variation across requests (sticky sessions).

trafficSplit:
  type: cookie
  cookieName: ab_variation
  cookieValueA: "A"
  cookieValueB: "B"
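Cookie-based routing achieves stickiness by honoring a variation cookie when present and, on a client's first request, assigning a variation and setting the cookie so later requests return to it. The sketch assumes first-visit assignment is delegated to some other policy (weighted selection, for instance), passed in as assign. pickByCookie and parseCookies are hypothetical, the Path=/ attribute is an assumption, and the cookie parser is deliberately simplified (no quoting or percent-decoding).

```typescript
type Variation = "A" | "B";

interface CookieSplit {
  cookieName: string;
  cookieValueA: string;
  cookieValueB: string;
}

interface CookieDecision {
  variation: Variation;
  setCookie?: string; // Set-Cookie value to send when a new assignment is made
}

// Simplified Cookie header parser ("a=1; b=2"): no quoting or decoding.
function parseCookies(header: string | undefined): Record<string, string> {
  const out: Record<string, string> = Object.create(null);
  if (!header) return out;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue;
    const key = part.slice(0, eq).trim();
    if (key) out[key] = part.slice(eq + 1).trim();
  }
  return out;
}

// Hypothetical sticky assignment: reuse a valid existing cookie, otherwise
// ask `assign` for a variation and pin the client to it with a cookie.
function pickByCookie(
  cfg: CookieSplit,
  cookieHeader: string | undefined,
  assign: () => Variation,
): CookieDecision {
  const current = parseCookies(cookieHeader)[cfg.cookieName];
  if (current === cfg.cookieValueA) return { variation: "A" };
  if (current === cfg.cookieValueB) return { variation: "B" };
  const variation = assign();
  const value = variation === "A" ? cfg.cookieValueA : cfg.cookieValueB;
  return { variation, setCookie: `${cfg.cookieName}=${value}; Path=/` };
}

const abCookie: CookieSplit = { cookieName: "ab_variation", cookieValueA: "A", cookieValueB: "B" };
console.log(pickByCookie(abCookie, "session=xyz; ab_variation=B", () => "A"));
// { variation: 'B' }
console.log(pickByCookie(abCookie, undefined, () => "A"));
// { variation: 'A', setCookie: 'ab_variation=A; Path=/' }
```

An unrecognized cookie value is treated like a missing one, so a client carrying a stale value is reassigned and receives a fresh cookie.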

Comparison Matrix

Aspect             | Target-Group A/B     | Router-Based A/B
-------------------|----------------------|----------------------
Traffic Control    | By scaling targets   | By routing rules
Sticky Sessions    | Not supported        | Supported
Granularity        | Target-level         | Request-level
External Traffic   | Not required         | Required
Infrastructure     | Target groups        | Traffic router
Use Case           | Workers, batch jobs  | APIs, web apps
Rollback Speed     | Slower (scaling)     | Immediate (routing)

See Also