# Multi-Region / Federation

## Overview

Multi-Region Federation extends the Release Orchestrator to support geographically distributed deployments across multiple regions, data centers, and cloud providers. It adds cross-region promotion orchestration, region-aware agent assignment, evidence replication, and federated release management.

Together these let global enterprises manage releases across their entire infrastructure while maintaining consistency, compliance, and operational control.

---

## Design Principles

1. **Region Autonomy**: Each region operates independently; central coordination does not create runtime dependencies between regions.
2. **Eventual Consistency**: Regions sync state asynchronously; local operations are never blocked by remote failures.
3. **Data Sovereignty**: Evidence and audit logs respect regional data residency requirements.
4. **Blast Radius Isolation**: Regional failures do not cascade to other regions.
5. **Global Visibility**: A single pane of glass shows cross-region release status.
6. **Configurable Latency**: The trade-off between consistency and performance is configurable per federation (see `SyncPolicy.Mode`).

---

## Architecture

### Component Overview

```
┌────────────────────────────────────────────────────────────────────────┐
│                        Multi-Region Federation                         │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  ┌──────────────────┐    ┌───────────────────┐    ┌─────────────────┐  │
│  │ FederationHub    │───▶│ RegionCoordinator │───▶│ CrossRegionSync │  │
│  │                  │    │                   │    │                 │  │
│  └──────────────────┘    └───────────────────┘    └─────────────────┘  │
│           │                        │                       │           │
│           ▼                        ▼                       ▼           │
│  ┌──────────────────┐    ┌───────────────────┐    ┌─────────────────┐  │
│  │ RegionRegistry   │    │ PromotionOrch     │    │ EvidenceRepl    │  │
│  │                  │    │                   │    │                 │  │
│  └──────────────────┘    └───────────────────┘    └─────────────────┘  │
│           │                        │                       │           │
│           ▼                        ▼                       ▼           │
│  ┌──────────────────┐    ┌───────────────────┐    ┌─────────────────┐  │
│  │ LatencyRouter    │    │ ConflictResolver  │    │ GlobalDashboard │  │
│  │                  │    │                   │    │                 │  │
│  └──────────────────┘    └───────────────────┘    └─────────────────┘  │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

                   Federation Topology

   ┌─────────────────┐         ┌─────────────────┐
   │   Region: US    │◄───────▶│   Region: EU    │
   │   (Primary)     │         │   (Secondary)   │
   └────────┬────────┘         └────────┬────────┘
            │                           │
            │                           │
            ▼                           ▼
   ┌─────────────────┐         ┌─────────────────┐
   │  Region: APAC   │◄───────▶│  Region: LATAM  │
   │  (Secondary)    │         │  (Secondary)    │
   └─────────────────┘         └─────────────────┘
```

### Key Components

#### 1. FederationHub

Central coordination point for multi-region operations:

```csharp
public sealed class FederationHub
{
    private readonly IRegionRegistry _regionRegistry;
    private readonly ICrossRegionSync _sync;

    public async Task<Federation> CreateFederationAsync(
        FederationConfig config,
        CancellationToken ct)
    {
        var federation = new Federation
        {
            Id = Guid.NewGuid(),
            Name = config.Name,
            PrimaryRegionId = config.PrimaryRegionId,
            Regions = config.Regions,
            SyncPolicy = config.SyncPolicy,
            ConflictPolicy = config.ConflictPolicy,
            CreatedAt = _timeProvider.GetUtcNow()
        };

        // Register with all regions
        foreach (var region in config.Regions)
        {
            await RegisterFederationWithRegionAsync(federation, region, ct);
        }

        await _federationStore.SaveAsync(federation, ct);
        return federation;
    }

    public async Task<FederationStatus> GetStatusAsync(
        Guid federationId,
        CancellationToken ct)
    {
        var federation = await _federationStore.GetAsync(federationId, ct);

        var status = new FederationStatus
        {
            FederationId = federationId,
            CheckedAt = _timeProvider.GetUtcNow()
        };

        // Query each region concurrently. The callbacks run in parallel, so
        // collect results in a thread-safe dictionary before touching `status`.
        var regionStatuses = new ConcurrentDictionary<Guid, RegionStatus>();

        await Parallel.ForEachAsync(federation.Regions, ct, async (region, token) =>
        {
            try
            {
                regionStatuses[region.Id] = await GetRegionStatusAsync(region, token);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // An unreachable region is reported, not propagated
                regionStatuses[region.Id] = new RegionStatus
                {
                    RegionId = region.Id,
                    Status = RegionHealthStatus.Unreachable,
                    Error = ex.Message
                };
            }
        });

        foreach (var (regionId, regionStatus) in regionStatuses)
        {
            status.RegionStatuses[regionId] = regionStatus;
        }

        // Calculate overall health
        status.OverallHealth = CalculateOverallHealth(status.RegionStatuses.Values);
        status.SyncLag = CalculateSyncLag(status.RegionStatuses.Values);

        return status;
    }

    public async Task<GlobalRelease> CreateGlobalReleaseAsync(
        GlobalReleaseConfig config,
        CancellationToken ct)
    {
        var globalRelease = new GlobalRelease
        {
            Id = Guid.NewGuid(),
            FederationId = config.FederationId,
            Name = config.Name,
            Version = config.Version,
            Components = config.Components,
            RegionalOverrides = config.RegionalOverrides,
            RolloutStrategy = config.RolloutStrategy,
            CreatedAt = _timeProvider.GetUtcNow(),
            Status = GlobalReleaseStatus.Draft
        };

        // Create one regional release record per federated region.
        // RegionalReleases is immutable, so build it first and attach it with `with`.
        var federation = await _federationStore.GetAsync(config.FederationId, ct);
        var regionalReleases = ImmutableDictionary.CreateBuilder<Guid, RegionalRelease>();

        foreach (var region in federation.Regions)
        {
            regionalReleases[region.Id] = CreateRegionalRelease(globalRelease, region);
        }

        globalRelease = globalRelease with { RegionalReleases = regionalReleases.ToImmutable() };

        await _globalReleaseStore.SaveAsync(globalRelease, ct);
        return globalRelease;
    }
}

public sealed record Federation
{
    public Guid Id { get; init; }
    public string Name { get; init; }
    public Guid PrimaryRegionId { get; init; }
    public ImmutableArray<RegionConfig> Regions { get; init; }
    public SyncPolicy SyncPolicy { get; init; }
    public ConflictPolicy ConflictPolicy { get; init; }
    public DateTimeOffset CreatedAt { get; init; }
}

public sealed record GlobalRelease
{
    public Guid Id { get; init; }
    public Guid FederationId { get; init; }
    public string Name { get; init; }
    public string Version { get; init; }
    public GlobalReleaseStatus Status { get; init; }

    // Components (element and override type names assumed; dictionaries keyed by region ID)
    public ImmutableArray<ReleaseComponent> Components { get; init; }
    public ImmutableDictionary<Guid, RegionalOverride> RegionalOverrides { get; init; }

    // Rollout
    public GlobalRolloutStrategy RolloutStrategy { get; init; }
    public ImmutableDictionary<Guid, RegionalRelease> RegionalReleases { get; init; }

    // Timing
    public DateTimeOffset CreatedAt { get; init; }
    public DateTimeOffset? StartedAt { get; init; }
    public DateTimeOffset? CompletedAt { get; init; }
}
```

#### 2. RegionCoordinator

Coordinates promotions across regions, executing the rollout as a series of waves:

```csharp
public sealed class RegionCoordinator
{
    public async Task<GlobalPromotionResult> PromoteGloballyAsync(
        GlobalPromotionRequest request,
        CancellationToken ct)
    {
        var globalRelease = await _globalReleaseStore.GetAsync(request.GlobalReleaseId, ct);
        var federation = await _federationStore.GetAsync(globalRelease.FederationId, ct);

        var result = new GlobalPromotionResult
        {
            RequestId = Guid.NewGuid(),
            GlobalReleaseId = request.GlobalReleaseId,
            StartedAt = _timeProvider.GetUtcNow()
        };

        // Determine promotion order based on strategy
        var promotionOrder = DeterminePromotionOrder(
            federation.Regions,
            globalRelease.RolloutStrategy);

        var anyRegionFailed = false;

        foreach (var wave in promotionOrder)
        {
            _logger.LogInformation(
                "Starting promotion wave {Wave} for regions: {Regions}",
                wave.Order,
                string.Join(", ", wave.Regions.Select(r => r.Name)));

            // Promote regions in this wave concurrently
            var waveResults = await PromoteWaveAsync(globalRelease, wave, ct);
            result.WaveResults.Add(wave.Order, waveResults);

            // Check for failures
            if (waveResults.Any(r => r.Status == RegionalPromotionStatus.Failed))
            {
                anyRegionFailed = true;

                if (globalRelease.RolloutStrategy.StopOnFailure)
                {
                    result.Status = GlobalPromotionStatus.PartialFailure;
                    result.FailedAt = _timeProvider.GetUtcNow();
                    result.FailureReason = "Regional promotion failed, stopping rollout";
                    return result;
                }
            }

            // Let the wave stabilize before starting the next one
            if (wave.StabilizationPeriod.HasValue)
            {
                await Task.Delay(wave.StabilizationPeriod.Value, ct);
            }
        }

        // Without StopOnFailure the remaining waves still run, but a failed
        // region must not be reported as overall success
        result.Status = anyRegionFailed
            ? GlobalPromotionStatus.PartialFailure
            : GlobalPromotionStatus.Succeeded;
        result.CompletedAt = _timeProvider.GetUtcNow();
        return result;
    }

    private ImmutableArray<PromotionWave> DeterminePromotionOrder(
        ImmutableArray<RegionConfig> regions,
        GlobalRolloutStrategy strategy)
    {
        return strategy.Type switch
        {
            GlobalRolloutType.Sequential => regions
                .Select((r, i) => new PromotionWave
                {
                    Order = i,
                    Regions = ImmutableArray.Create(r),
                    StabilizationPeriod = strategy.StabilizationPeriod
                })
                .ToImmutableArray(),

            GlobalRolloutType.Parallel => ImmutableArray.Create(new PromotionWave
            {
                Order = 0,
                Regions = regions,
                StabilizationPeriod = null
            }),

            GlobalRolloutType.Canary => CreateCanaryWaves(regions, strategy),

            GlobalRolloutType.FollowTheSun => CreateFollowTheSunWaves(regions),

            GlobalRolloutType.Custom => strategy.CustomWaves
                ?? throw new InvalidOperationException("Custom waves not defined"),

            _ => throw new UnsupportedStrategyException(strategy.Type)
        };
    }

    private ImmutableArray<PromotionWave> CreateCanaryWaves(
        ImmutableArray<RegionConfig> regions,
        GlobalRolloutStrategy strategy)
    {
        // Fall back to the first region if none is flagged as canary
        var canaryRegion = regions.FirstOrDefault(r => r.IsCanary) ?? regions.First();
        var remainingRegions = regions.Where(r => r.Id != canaryRegion.Id).ToImmutableArray();

        return ImmutableArray.Create(
            new PromotionWave
            {
                Order = 0,
                Regions = ImmutableArray.Create(canaryRegion),
                StabilizationPeriod = strategy.CanaryStabilizationPeriod
            },
            new PromotionWave
            {
                Order = 1,
                Regions = remainingRegions,
                StabilizationPeriod = strategy.StabilizationPeriod
            });
    }

    private async Task<ImmutableArray<RegionalPromotionResult>> PromoteWaveAsync(
        GlobalRelease globalRelease,
        PromotionWave wave,
        CancellationToken ct)
    {
        var results = new ConcurrentBag<RegionalPromotionResult>();

        // PromoteRegionallyAsync is expected to report failure through the
        // result's Status; an exception would abort the whole wave.
        await Parallel.ForEachAsync(wave.Regions, ct, async (region, token) =>
        {
            var regionalRelease = globalRelease.RegionalReleases[region.Id];
            var result = await PromoteRegionallyAsync(region, regionalRelease, token);
            results.Add(result);
        });

        return results.ToImmutableArray();
    }
}

public enum GlobalRolloutType
{
    Sequential,    // One region at a time
    Parallel,      // All regions simultaneously
    Canary,        // Canary region first, then all others
    FollowTheSun,  // Based on timezone/business hours
    Custom         // User-defined waves
}
```

#### 3. CrossRegionSync

Synchronizes changes from a source region to the other regions, detecting and resolving conflicts according to the federation's `SyncPolicy`:

```csharp
public sealed class CrossRegionSync
{
    private readonly IRegionConnectionPool _connectionPool;

    public async Task SyncAsync(
        SyncRequest request,
        CancellationToken ct)
    {
        var federation = await _federationStore.GetAsync(request.FederationId, ct);
        var sourceRegion = federation.Regions.First(r => r.Id == request.SourceRegionId);

        // Get changes since last sync
        var changes = await GetChangesSinceAsync(
            sourceRegion,
            request.SinceTimestamp,
            ct);

        if (changes.Count == 0)
        {
            _logger.LogDebug("No changes to sync from {Region}", sourceRegion.Name);
            return;
        }

        // Sync to every other region, or only to the requested targets
        var targetRegions = federation.Regions
            .Where(r => r.Id != request.SourceRegionId)
            .Where(r => request.TargetRegionIds?.Contains(r.Id) ?? true);

        foreach (var targetRegion in targetRegions)
        {
            try
            {
                await SyncToRegionAsync(changes, targetRegion, federation.SyncPolicy, ct);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Blast radius isolation: one failing region must not block the others
                _logger.LogWarning(ex,
                    "Sync from {Source} to {Target} failed",
                    sourceRegion.Name, targetRegion.Name);
            }
        }
    }

    private async Task SyncToRegionAsync(
        IReadOnlyList<SyncChange> changes,
        RegionConfig targetRegion,
        SyncPolicy policy,
        CancellationToken ct)
    {
        var connection = await _connectionPool.GetConnectionAsync(targetRegion, ct);

        try
        {
            foreach (var change in changes)
            {
                // foreach variables are read-only, so track the resolved change separately
                var toApply = change;

                // Check for conflicts
                var conflict = await CheckForConflictAsync(connection, change, ct);

                if (conflict is not null)
                {
                    var resolution = await ResolveConflictAsync(conflict, policy, ct);
                    if (resolution.Action == ConflictAction.Skip)
                        continue;

                    toApply = ApplyResolution(change, resolution);
                }

                // Apply change
                await ApplyChangeAsync(connection, toApply, ct);
            }
        }
        finally
        {
            _connectionPool.ReturnConnection(connection);
        }
    }

    private async Task<SyncConflict?> CheckForConflictAsync(
        IRegionConnection connection,
        SyncChange change,
        CancellationToken ct)
    {
        var existingRecord = await connection.GetByIdAsync(change.EntityType, change.EntityId, ct);
        if (existingRecord is null)
            return null;

        // The target already holds a newer version
        if (existingRecord.Version > change.Version)
        {
            return new SyncConflict
            {
                Change = change,
                ExistingRecord = existingRecord,
                ConflictType = ConflictType.VersionConflict
            };
        }

        // The target was modified after this change was made
        if (existingRecord.ModifiedAt > change.Timestamp)
        {
            return new SyncConflict
            {
                Change = change,
                ExistingRecord = existingRecord,
                ConflictType = ConflictType.ConcurrentModification
            };
        }

        return null;
    }
}

public sealed record SyncPolicy
{
    public SyncMode Mode { get; init; }
    public TimeSpan SyncInterval { get; init; }
    public int MaxBatchSize { get; init; }
    public bool SyncEvidence { get; init; }
    public bool SyncAuditLogs { get; init; }
    public bool RequireAllRegions { get; init; }   // Retry failed evidence replication (used by EvidenceReplicator)
    public ConflictResolutionStrategy ConflictStrategy { get; init; }
    public DataResidencyPolicy DataResidency { get; init; }
}

public enum SyncMode
{
    RealTime,      // Immediate sync on changes
    Scheduled,     // Periodic sync
    OnDemand,      // Manual sync only
    EventDriven    // Sync on specific events
}

public enum ConflictResolutionStrategy
{
    PrimaryWins,     // Primary region always wins
    LastWriteWins,   // Most recent modification wins
    MergeFields,     // Merge non-conflicting fields
    ManualReview     // Queue for human review
}
```

#### 4. EvidenceReplicator

Replicates evidence across regions while complying with data residency rules. Depending on the residency check, each target region receives a full copy, a redacted copy, or only a reference:

```csharp
public sealed class EvidenceReplicator
{
    public async Task ReplicateEvidenceAsync(
        EvidencePacket evidence,
        Federation federation,
        CancellationToken ct)
    {
        var sourceRegion = await DetermineSourceRegionAsync(evidence, ct);
        var replicationPlan = await CreateReplicationPlanAsync(
            evidence, federation, sourceRegion, ct);

        foreach (var target in replicationPlan.Targets)
        {
            try
            {
                await ReplicateToRegionAsync(evidence, target, replicationPlan.Policy, ct);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                _logger.LogError(ex,
                    "Failed to replicate evidence {EvidenceId} to region {RegionId}",
                    evidence.Id, target.RegionId);

                // Queue for retry if every target region must eventually hold a copy
                if (replicationPlan.Policy.RequireAllRegions)
                {
                    await QueueForRetryAsync(evidence, target, ct);
                }
            }
        }
    }

    private async Task<EvidenceReplicationPlan> CreateReplicationPlanAsync(
        EvidencePacket evidence,
        Federation federation,
        RegionConfig sourceRegion,
        CancellationToken ct)
    {
        var plan = new EvidenceReplicationPlan
        {
            EvidenceId = evidence.Id,
            SourceRegionId = sourceRegion.Id,
            Policy = federation.SyncPolicy
        };

        foreach (var region in federation.Regions.Where(r => r.Id != sourceRegion.Id))
        {
            // Check data residency requirements
            var residencyCheck = await CheckDataResidencyAsync(evidence, region, ct);

            if (residencyCheck.Allowed)
            {
                plan.Targets.Add(new ReplicationTarget
                {
                    RegionId = region.Id,
                    ReplicationType = ReplicationType.Full
                });
            }
            else if (residencyCheck.AllowRedacted)
            {
                plan.Targets.Add(new ReplicationTarget
                {
                    RegionId = region.Id,
                    ReplicationType = ReplicationType.Redacted,
                    RedactionRules = residencyCheck.RedactionRules
                });
            }
            else
            {
                _logger.LogInformation(
                    "Evidence {EvidenceId} cannot be replicated to {Region} due to data residency",
                    evidence.Id, region.Name);

                // Store a reference only
                plan.Targets.Add(new ReplicationTarget
                {
                    RegionId = region.Id,
                    ReplicationType = ReplicationType.ReferenceOnly
                });
            }
        }

        return plan;
    }

    private async Task ReplicateToRegionAsync(
        EvidencePacket evidence,
        ReplicationTarget target,
        SyncPolicy policy,
        CancellationToken ct)
    {
        var connection = await _connectionPool.GetConnectionAsync(target.RegionId, ct);

        try
        {
            var payload = target.ReplicationType switch
            {
                ReplicationType.Full => evidence,
                ReplicationType.Redacted => RedactEvidence(evidence, target.RedactionRules),
                ReplicationType.ReferenceOnly => CreateReference(evidence),
                _ => throw new InvalidOperationException(
                    $"Unknown replication type {target.ReplicationType}")
            };

            await connection.StoreEvidenceAsync(payload, ct);
        }
        finally
        {
            // Return the connection even if redaction or storage fails
            _connectionPool.ReturnConnection(connection);
        }
    }

    private EvidencePacket RedactEvidence(
        EvidencePacket evidence,
        ImmutableArray<RedactionRule> rules)
    {
        // Metadata is assumed to be an ImmutableDictionary<string, string>;
        // a `with` expression cannot set dictionary entries, so use SetItem.
        return evidence with
        {
            Content = ApplyRedactionRules(evidence.Content, rules),
            Metadata = evidence.Metadata
                .SetItem("redacted", "true")
                .SetItem("redaction_rules", string.Join(",", rules.Select(r => r.Name)))
        };
    }
}

public sealed record DataResidencyPolicy
{
    // Keyed by data type, e.g. "evidence.pii"
    public ImmutableDictionary<string, DataResidencyRule> Rules { get; init; }
}

public sealed record DataResidencyRule
{
    public string DataType { get; init; }
    public ImmutableArray<string> AllowedRegions { get; init; }   // Region codes; "*" allows all
    public ImmutableArray<string> BlockedRegions { get; init; }
    public bool AllowRedacted { get; init; }
    public ImmutableArray<RedactionRule> RedactionRules { get; init; }
}
```

#### 5. LatencyRouter

Routes requests to the best-scoring eligible region, weighing latency, availability, load, and region affinity:

```csharp
public sealed class LatencyRouter
{
    // Cached latency measurements per region (value type name assumed)
    private readonly ConcurrentDictionary<Guid, LatencyStats> _latencyCache = new();

    public async Task<RegionConfig> SelectOptimalRegionAsync(
        RoutingRequest request,
        CancellationToken ct)
    {
        var federation = await _federationStore.GetAsync(request.FederationId, ct);
        var candidates = FilterEligibleRegions(federation.Regions, request).ToList();

        if (candidates.Count == 0)
            throw new NoEligibleRegionException(request);

        // Score each candidate
        var scored = new List<(RegionConfig Region, double Score)>();
        foreach (var region in candidates)
        {
            var score = await CalculateRegionScoreAsync(region, request, ct);
            scored.Add((region, score));
        }

        // Select the highest-scoring region
        var best = scored.OrderByDescending(s => s.Score).First();

        _logger.LogDebug(
            "Selected region {RegionName} with score {Score} for request",
            best.Region.Name, best.Score);

        return best.Region;
    }

    private async Task<double> CalculateRegionScoreAsync(
        RegionConfig region,
        RoutingRequest request,
        CancellationToken ct)
    {
        var score = 100.0;

        // Latency factor (nominally 40%): subtract 0.04 points per ms of average latency
        var latency = await GetLatencyAsync(region, ct);
        score -= (latency.AverageMs / 10) * 0.4;

        // Availability factor (nominally 30%): availability in [0, 1] scales the score to 70-100%
        var availability = await GetAvailabilityAsync(region, ct);
        score *= availability * 0.3 + 0.7;

        // Load factor (nominally 20%): load in [0, 1] subtracts up to 20 points
        var load = await GetLoadAsync(region, ct);
        score -= (load * 100) * 0.2;

        // Affinity factor (nominally 10%): bonus for the caller's preferred region
        if (request.PreferredRegionId == region.Id)
            score += 10;

        return Math.Max(0, score);
    }

    // Orders regions by geographic distance from the client, used as a
    // proxy for latency (no latency measurements are consulted here)
    public async Task<IReadOnlyList<RegionConfig>> GetRegionsByLatencyAsync(
        Guid federationId,
        GeoLocation clientLocation,
        CancellationToken ct)
    {
        var federation = await _federationStore.GetAsync(federationId, ct);
        var withLatency = new List<(RegionConfig Region, double Distance)>();

        foreach (var region in federation.Regions)
        {
            var distance = CalculateDistance(clientLocation,
                region.Location);
            withLatency.Add((region, distance));
        }

        return withLatency
            .OrderBy(r => r.Distance)
            .Select(r => r.Region)
            .ToList();
    }
}
```

#### 6. GlobalDashboard

Provides a unified view across all regions:

```csharp
public sealed class GlobalDashboard
{
    public async Task<GlobalOverview> GetOverviewAsync(
        Guid federationId,
        CancellationToken ct)
    {
        var federation = await _federationStore.GetAsync(federationId, ct);
        var generatedAt = _timeProvider.GetUtcNow();

        // Query all regions in parallel; an unreachable region yields a placeholder
        var regionTasks = federation.Regions.Select(async region =>
        {
            try
            {
                return await GetRegionOverviewAsync(region, ct);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                return new RegionOverview
                {
                    RegionId = region.Id,
                    RegionName = region.Name,
                    Status = RegionHealthStatus.Unreachable,
                    Error = ex.Message
                };
            }
        });

        var regionOverviews = (await Task.WhenAll(regionTasks)).ToImmutableArray();

        // GlobalOverview is an immutable record, so aggregate first and construct it once
        return new GlobalOverview
        {
            FederationId = federationId,
            GeneratedAt = generatedAt,
            RegionOverviews = regionOverviews,
            HealthyRegions = regionOverviews.Count(r => r.Status == RegionHealthStatus.Healthy),
            TotalDeployments = regionOverviews.Sum(r => r.DeploymentCount),
            TotalAgents = regionOverviews.Sum(r => r.AgentCount),
            GlobalReleases = await GetActiveGlobalReleasesAsync(federationId, ct),
            SyncStatus = await GetSyncStatusAsync(federation, ct)
        };
    }

    public async Task<GlobalReleaseTimeline> GetReleaseTimelineAsync(
        Guid globalReleaseId,
        CancellationToken ct)
    {
        var globalRelease = await _globalReleaseStore.GetAsync(globalReleaseId, ct);

        var timeline = new GlobalReleaseTimeline
        {
            GlobalReleaseId = globalReleaseId,
            GlobalStatus = globalRelease.Status
        };

        foreach (var (regionId, regionalRelease) in globalRelease.RegionalReleases)
        {
            var events = await GetRegionalEventsAsync(regionalRelease, ct);
            timeline.RegionalTimelines[regionId] = new RegionalTimeline
            {
                RegionId = regionId,
                Status = regionalRelease.Status,
                Events = events
            };
        }

        return timeline;
    }
}

public sealed record GlobalOverview
{
    public Guid FederationId { get; init; }
    public DateTimeOffset GeneratedAt { get; init; }

    // Regions
    public ImmutableArray<RegionOverview> RegionOverviews { get; init; }
    public int HealthyRegions { get; init; }

    // Aggregates
    public int TotalDeployments { get; init; }
    public int TotalAgents { get; init; }
    public int TotalEnvironments { get; init; }

    // Releases
    public ImmutableArray<GlobalRelease> GlobalReleases { get; init; }

    // Sync
    public FederationSyncStatus SyncStatus { get; init; }
}
```

---

## Data Models

### Region Configuration

```csharp
public sealed record RegionConfig
{
    public Guid Id { get; init; }
    public string Name { get; init; }
    public string Code { get; init; }               // e.g., "us-east-1", "eu-west-1"
    public RegionType Type { get; init; }
    public GeoLocation Location { get; init; }
    public string Timezone { get; init; }           // e.g., "Europe/Dublin"

    // Connectivity
    public string ApiEndpoint { get; init; }
    public string GrpcEndpoint { get; init; }

    // Configuration
    public bool IsPrimary { get; init; }
    public bool IsCanary { get; init; }
    public int Priority { get; init; }

    // Data residency
    public string Jurisdiction { get; init; }       // e.g., "EU", "US", "APAC"
    public ImmutableArray<string> ComplianceFrameworks { get; init; }   // e.g., "GDPR"
}

public enum RegionType
{
    Primary,
    Secondary,
    DisasterRecovery,
    EdgeLocation
}
```

---

## API Design

### REST Endpoints

```
# Federations
POST   /api/v1/federations                             # Create federation
GET    /api/v1/federations                             # List federations
GET    /api/v1/federations/{id}                        # Get federation
GET    /api/v1/federations/{id}/status                 # Get federation status
POST   /api/v1/federations/{id}/sync                   # Trigger sync

# Regions
POST   /api/v1/federations/{id}/regions                # Add region
DELETE /api/v1/federations/{fedId}/regions/{regId}     # Remove region
GET    /api/v1/federations/{id}/regions                # List regions
GET    /api/v1/regions/{id}/status                     # Get region status

# Global Releases
POST   /api/v1/global-releases                         # Create global release
GET    /api/v1/global-releases                         # List global releases
GET    /api/v1/global-releases/{id}                    # Get global release
POST   /api/v1/global-releases/{id}/promote            # Start global promotion
GET    /api/v1/global-releases/{id}/timeline           # Get timeline

# Dashboard
GET    /api/v1/federations/{id}/overview               # Global overview
GET    /api/v1/federations/{id}/metrics                # Global metrics
GET    /api/v1/federations/{id}/map                    # Geographic view
```

---

## Metrics & Observability

### Prometheus Metrics

```
# Federation Health
stella_federation_regions_total{federation_id, status}
stella_federation_sync_lag_seconds{federation_id, source, target}
stella_federation_conflicts_total{federation_id, resolution}

# Cross-Region
stella_cross_region_latency_seconds{source, target}
stella_cross_region_requests_total{source, target, status}
stella_cross_region_bandwidth_bytes{source, target}

# Global Releases
stella_global_release_regions_total{release_id, status}
stella_global_release_duration_seconds{release_id}
stella_global_promotion_wave_duration_seconds{release_id, wave}

# Evidence Replication
stella_evidence_replication_total{source, target, type}
stella_evidence_replication_lag_seconds{source, target}
```

---

## Configuration Example

```yaml
federation:
  name: "global-production"

  regions:
    - id: "us-east-1"
      name: "US East"
      type: primary
      api_endpoint: "https://us-east.stella.example.com"
      location:
        latitude: 39.0438
        longitude: -77.4874
      timezone: "America/New_York"
      jurisdiction: "US"
      is_primary: true

    - id: "eu-west-1"
      name: "EU West"
      type: secondary
      api_endpoint: "https://eu-west.stella.example.com"
      location:
        latitude: 53.3498
        longitude: -6.2603
      timezone: "Europe/Dublin"
      jurisdiction: "EU"
      compliance_frameworks: ["GDPR"]

    - id: "ap-southeast-1"
      name: "Asia Pacific"
      type: secondary
      api_endpoint: "https://apac.stella.example.com"
      location:
        latitude: 1.3521
        longitude: 103.8198
      timezone: "Asia/Singapore"
      jurisdiction: "APAC"
      is_canary: true

  sync_policy:
    mode: event_driven
    sync_interval: "00:05:00"
    max_batch_size: 1000
    sync_evidence: true
    sync_audit_logs: true
    conflict_strategy: last_write_wins
    data_residency:
      rules:
        - data_type: "evidence.pii"
          allowed_regions: ["eu-west-1"]
          allow_redacted: true
        - data_type: "audit.logs"
          allowed_regions: ["*"]

  rollout_strategy:
    type: canary
    canary_stabilization_period: "01:00:00"
    stabilization_period: "00:30:00"
    stop_on_failure: true
```

---

## Test Strategy

### Unit Tests

- Promotion order calculation
- Conflict resolution
- Latency scoring
- Data residency checks

### Integration Tests

- Cross-region sync
- Evidence replication
- Global promotion flow
- Dashboard aggregation

### Chaos Tests

- Region unavailability
- Network partitions
- Split-brain scenarios
- Sync conflicts

---

## Migration Path

### Phase 1: Foundation (Weeks 1-2)

- Federation data model
- Region registry
- Basic connectivity

### Phase 2: Sync (Weeks 3-4)

- Cross-region sync
- Conflict resolution
- Event propagation

### Phase 3: Global Releases (Weeks 5-6)

- Global release model
- Promotion coordinator
- Wave management

### Phase 4: Evidence (Weeks 7-8)

- Evidence replication
- Data residency
- Redaction rules

### Phase 5: Routing (Weeks 9-10)

- Latency router
- Region selection
- Load balancing

### Phase 6: Dashboard (Weeks 11-12)

- Global overview
- Regional timelines
- Geo visualization