# Multi-Tenant Policy Rollout Flow

## Overview

The Multi-Tenant Policy Rollout Flow describes how StellaOps propagates policy changes across multiple tenants in a controlled, auditable manner. This flow supports staged rollouts, canary deployments, and rollback capabilities for enterprise policy governance.

**Business Value**: Centralized policy management with controlled rollout reduces the risk of policy changes breaking production workflows while ensuring consistent security standards across the organization.

## Actors

| Actor | Type | Role |
|-------|------|------|
| Policy Admin | Human | Creates and approves policy changes |
| Platform Admin | Human | Manages cross-tenant rollouts |
| Policy Engine | Service | Evaluates and applies policies |
| Authority | Service | Manages tenant hierarchy |
| Notify | Service | Alerts on rollout status |
| Scheduler | Service | Orchestrates staged rollout |

## Prerequisites

- Multi-tenant environment configured
- Tenant hierarchy defined (org → teams → projects)
- Policy inheritance rules established
- Rollout approval workflow configured

## Tenant Hierarchy

```
Organization (acme-corp)
├── Team: Platform Engineering
│   ├── Project: core-services
│   └── Project: infrastructure
├── Team: Product Development
│   ├── Project: web-app
│   ├── Project: mobile-api
│   └── Project: data-pipeline
└── Team: Security
    └── Project: security-tools
```

## Flow Diagram

```
┌────────────────────────────────────────────────────────────────────────────┐
│                      Multi-Tenant Policy Rollout Flow                      │
└────────────────────────────────────────────────────────────────────────────┘

┌──────────┐  ┌─────────┐  ┌───────────┐  ┌──────────┐  ┌────────┐  ┌────────┐
│  Policy  │  │ Policy  │  │ Scheduler │  │ Authority│  │ Policy │  │ Notify │
│  Admin   │  │ Store   │  │           │  │          │  │ Engine │  │        │
└────┬─────┘  └────┬────┘  └─────┬─────┘  └────┬─────┘  └───┬────┘  └───┬────┘
     │             │             │             │            │           │
     │ Create      │             │             │            │           │
     │ policy v2   │             │             │            │           │
     │────────────>│             │             │            │           │
     │             │             │             │            │           │
     │             │ Store as    │             │            │           │
     │             │ draft       │             │            │           │
     │             │───┐         │             │            │           │
     │             │   │         │             │            │           │
     │             │<──┘         │             │            │           │
     │             │             │             │            │           │
     │ Define      │             │             │            │           │
     │ rollout     │             │             │            │           │
     │──────────────────────────>│             │            │           │
     │             │             │             │            │           │
     │             │             │ Get tenant  │            │           │
     │             │             │ hierarchy   │            │           │
     │             │             │────────────>│            │           │
     │             │             │             │            │           │
     │             │             │ Tenant tree │            │           │
     │             │             │<────────────│            │           │
     │             │             │             │            │           │
     │ Rollout     │             │             │            │           │
     │ plan        │             │             │            │           │
     │<──────────────────────────│             │            │           │
     │             │             │             │            │           │
     │ Approve     │             │             │            │           │
     │──────────────────────────>│             │            │           │
     │             │             │             │            │           │
     │             │             │ Stage 1:    │            │           │
     │             │             │ Canary      │            │           │
     │             │             │─────────────────────────>│           │
     │             │             │             │            │           │
     │             │             │             │            │ Apply to  │
     │             │             │             │            │ canary    │
     │             │             │             │            │ tenant    │
     │             │             │             │            │───┐       │
     │             │             │             │            │   │       │
     │             │             │             │            │<──┘       │
     │             │             │             │            │           │
     │             │             │ Monitor     │            │           │
     │             │             │ (24h)       │            │           │
     │             │             │───┐         │            │           │
     │             │             │   │         │            │           │
     │             │             │<──┘         │            │           │
     │             │             │             │            │           │
     │             │             │ Stage 2:    │            │           │
     │             │             │ 25% tenants │            │           │
     │             │             │─────────────────────────>│           │
     │             │             │             │            │           │
     │             │             │ ...         │            │           │
     │             │             │             │            │           │
     │             │             │ Stage N:    │            │           │
     │             │             │ 100%        │            │           │
     │             │             │─────────────────────────>│           │
     │             │             │             │            │           │
     │             │             │ Complete    │            │           │
     │             │             │─────────────────────────────────────>│
     │             │             │             │            │           │
     │ Rollout     │             │             │            │           │ Notify
     │ complete    │             │             │            │           │ admins
     │<─────────────────────────────────────────────────────────────────│
     │             │             │             │            │           │
```

## Step-by-Step

### 1. Policy Creation

The Policy Admin creates a new policy version:

```yaml
# Policy Set: production-v2
version: "stella-dsl@1"
name: production
version_tag: "v2.0.0"
description: "Updated production policy with KEV blocking"

changes_from_v1:
  - added: block-kev-vulnerabilities
  - modified: critical-threshold (9.0 → 8.5)
  - removed: legacy-exception-rule

rules:
  - name: block-kev-vulnerabilities
    description: Block any KEV-listed vulnerability
    condition: kev == true
    action: FAIL
    severity: critical

  - name: no-critical-reachable
    condition: |
      severity == 'critical' AND
      cvss >= 8.5 AND
      reachability IN ['SR', 'RO', 'CR']
    action: FAIL
```

### 2. Rollout Plan Definition

The Platform Admin defines the rollout strategy:

```json
{
  "rollout_id": "rollout-789",
  "policy_set": "production",
  "from_version": "v1.0.0",
  "to_version": "v2.0.0",
  "strategy": "staged",
  "stages": [
    {
      "name": "canary",
      "description": "Single low-risk tenant",
      "tenants": ["platform-eng-core-services"],
      "duration": "24h",
      "success_criteria": {
        "max_new_failures": 5,
        "max_failure_rate_increase": 0.05
      },
      "auto_proceed": false
    },
    {
      "name": "early-adopters",
      "description": "25% of tenants (by scan volume)",
      "selection": {
        "method": "percentage",
        "value": 25,
        "weight_by": "scan_volume"
      },
      "duration": "48h",
      "success_criteria": {
        "max_new_failures": 20,
        "max_failure_rate_increase": 0.10
      },
      "auto_proceed": true
    },
    {
      "name": "majority",
      "description": "75% of tenants",
      "selection": {
        "method": "percentage",
        "value": 75
      },
      "duration": "24h",
      "auto_proceed": true
    },
    {
      "name": "full",
      "description": "100% of tenants",
      "selection": {
        "method": "all"
      }
    }
  ],
  "rollback": {
    "automatic": true,
    "triggers": [
      {"metric": "failure_rate_increase", "threshold": 0.20},
      {"metric": "new_critical_blocks", "threshold": 50}
    ]
  }
}
```

### 3. Impact Analysis

Before approval, the system analyzes the potential impact against historical scan data:

```json
{
  "impact_analysis": {
    "rollout_id": "rollout-789",
    "analysis_date": "2024-12-29T10:00:00Z",
    "historical_data_range": "30d",
    "results": {
      "total_scans_analyzed": 15234,
      "predicted_new_failures": 127,
      "predicted_failure_rate_change": "+0.83%",
      "affected_images": 89,
      "by_team": [
        {"team": "Product Development", "new_failures": 78},
        {"team": "Platform Engineering", "new_failures": 31},
        {"team": "Security", "new_failures": 18}
      ],
      "top_triggered_rules": [
        {"rule": "block-kev-vulnerabilities", "count": 45},
        {"rule": "no-critical-reachable", "count": 82}
      ],
      "recommendation": "PROCEED_WITH_CAUTION"
    }
  }
}
```

### 4. Approval and Initiation

The Policy Admin approves the rollout after reviewing the impact analysis:

```json
{
  "approval": {
    "rollout_id": "rollout-789",
    "approved_by": "policy-admin@acme.com",
    "approved_at": "2024-12-29T11:00:00Z",
    "approval_notes": "Impact acceptable. Notified affected teams.",
    "notifications_sent": [
      {"channel": "slack", "target": "#platform-engineering"},
      {"channel": "email", "target": "team-leads@acme.com"}
    ]
  }
}
```

### 5. Staged Execution

The Scheduler executes each stage in order:

#### Stage 1: Canary

```json
{
  "stage_execution": {
    "rollout_id": "rollout-789",
    "stage": "canary",
    "started_at": "2024-12-29T11:00:00Z",
    "tenants_activated": ["platform-eng-core-services"],
    "status": "monitoring"
  }
}
```

#### Stage Monitoring

```json
{
  "stage_metrics": {
    "rollout_id": "rollout-789",
    "stage": "canary",
    "monitored_period": "24h",
    "metrics": {
      "scans_evaluated": 234,
      "new_failures": 3,
      "failure_rate_before": 0.12,
      "failure_rate_after": 0.13,
      "success_criteria_met": true
    }
  }
}
```

### 6. Progressive Rollout

After the canary stage meets its success criteria, the rollout proceeds through the remaining stages:

```json
{
  "stage_progression": {
    "rollout_id": "rollout-789",
    "completed_stages": ["canary", "early-adopters", "majority"],
    "current_stage": "full",
    "tenants_on_v2": 47,
    "tenants_on_v1": 0,
    "total_rollout_duration": "96h",
    "status": "completed"
  }
}
```

### 7. Rollback (If Needed)

If a rollback trigger defined in the plan fires, the affected tenants are reverted automatically:

```json
{
  "rollback": {
    "rollout_id": "rollout-789",
    "triggered_at": "2024-12-30T15:30:00Z",
    "trigger_reason": "failure_rate_increase exceeded 0.20 threshold",
    "rollback_stage": "early-adopters",
    "tenants_rolled_back": 12,
    "action": "reverted to v1.0.0",
    "notifications_sent": true
  }
}
```

## Rollout Strategies

### Blue-Green

```yaml
strategy: blue_green
config:
  parallel_evaluation: true  # Both versions evaluated
  comparison_period: 24h
  switch_threshold:
    verdict_match_rate: 0.95
```

### Canary with Traffic Split

```yaml
strategy: canary_traffic
config:
  initial_percentage: 5
  increment: 10
  increment_interval: 4h
  max_error_rate: 0.01
```

### Feature Flag

```yaml
strategy: feature_flag
config:
  flag_name: "policy-v2-enabled"
  default: false
  overrides:
    - tenant: "security-team"
      value: true
```

## Data Contracts

### Rollout Plan Schema

```typescript
interface RolloutPlan {
  rollout_id: string;
  policy_set: string;
  from_version: string;
  to_version: string;
  strategy: 'staged' | 'blue_green' | 'canary_traffic' | 'feature_flag';
  stages: Stage[];
  rollback: {
    automatic: boolean;
    triggers: RollbackTrigger[];
  };
  notifications: NotificationConfig[];
}

interface Stage {
  name: string;
  description?: string;
  tenants?: string[];
  selection?: TenantSelection;
  duration?: string;  // Duration string as used in the plan example, e.g. "24h"
  success_criteria?: SuccessCriteria;
  auto_proceed?: boolean;
}
```

### Rollout Status Schema

```typescript
interface RolloutStatus {
  rollout_id: string;
  status: 'pending' | 'in_progress' | 'paused' | 'completed' | 'rolled_back' | 'failed';
  current_stage?: string;
  stages: Array<{
    name: string;
    status: 'pending' | 'active' | 'monitoring' | 'completed' | 'failed';
    started_at?: string;
    completed_at?: string;
    metrics?: StageMetrics;
  }>;
  tenant_status: Array<{
    tenant_id: string;
    policy_version: string;
    activated_at?: string;
  }>;
}
```

## Policy Inheritance

```
Organization Policy (base)
    └── inherits_from: stellaops-default

Team Policy (override)
    └── inherits_from: organization
    └── overrides: [severity-thresholds]

Project Policy (final)
    └── inherits_from: team
    └── overrides: [specific-exceptions]
```

Resolution order: Project → Team → Organization → Platform Default

## Error Handling

| Error | Recovery |
|-------|----------|
| Stage timeout | Pause rollout, alert admin |
| Metrics unavailable | Use last known good, extend monitoring |
| Tenant unreachable | Skip tenant, continue with others |
| Rollback failure | Manual intervention required |

## Observability

### Metrics

| Metric | Type | Labels |
|--------|------|--------|
| `policy_rollout_status` | Gauge | `rollout_id`, `stage` |
| `policy_rollout_tenant_count` | Gauge | `rollout_id`, `version` |
| `policy_rollout_failures_total` | Counter | `rollout_id`, `stage` |
| `policy_version_active` | Gauge | `tenant`, `policy_set`, `version` |

### Key Log Events

| Event | Level | Fields |
|-------|-------|--------|
| `rollout.created` | INFO | `rollout_id`, `policy_set`, `stages` |
| `rollout.stage.started` | INFO | `rollout_id`, `stage`, `tenants` |
| `rollout.stage.completed` | INFO | `rollout_id`, `stage`, `metrics` |
| `rollout.rollback` | WARN | `rollout_id`, `reason`, `tenants` |
| `rollout.completed` | INFO | `rollout_id`, `duration` |

## Related Flows

- [Policy Evaluation Flow](04-policy-evaluation-flow.md) - Policy application
- [Exception Approval Workflow](17-exception-approval-workflow.md) - Exception handling
- [Notification Flow](05-notification-flow.md) - Rollout alerts
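
The stage gating described under Staged Execution and Rollback can be sketched as a small decision function. This is a minimal sketch, not the Scheduler's actual implementation: the `evaluateStage` function, the `GateDecision` type, and the rule that rollback triggers are checked before success criteria are assumptions made for illustration. Field names mirror the `success_criteria`, `stage_metrics`, and `rollback.triggers` examples in this document, and only the `failure_rate_increase` trigger is handled.

```typescript
// Hypothetical types mirroring the JSON examples in this document.
interface SuccessCriteria {
  max_new_failures: number;
  max_failure_rate_increase: number;
}

interface StageMetrics {
  new_failures: number;
  failure_rate_before: number;
  failure_rate_after: number;
}

interface RollbackTrigger {
  metric: string; // only "failure_rate_increase" is handled in this sketch
  threshold: number;
}

type GateDecision = "proceed" | "hold" | "rollback";

// Assumption: rollback triggers take precedence over success criteria.
function evaluateStage(
  metrics: StageMetrics,
  criteria: SuccessCriteria,
  triggers: RollbackTrigger[],
): GateDecision {
  const increase = metrics.failure_rate_after - metrics.failure_rate_before;

  // A fired trigger reverts the stage regardless of the other metrics.
  for (const trigger of triggers) {
    if (trigger.metric === "failure_rate_increase" && increase > trigger.threshold) {
      return "rollback";
    }
  }

  // Otherwise the stage proceeds only if every success criterion holds.
  const criteriaMet =
    metrics.new_failures <= criteria.max_new_failures &&
    increase <= criteria.max_failure_rate_increase;
  return criteriaMet ? "proceed" : "hold";
}

// Canary numbers taken from the Stage Monitoring example.
const decision = evaluateStage(
  { new_failures: 3, failure_rate_before: 0.12, failure_rate_after: 0.13 },
  { max_new_failures: 5, max_failure_rate_increase: 0.05 },
  [{ metric: "failure_rate_increase", threshold: 0.2 }],
);
console.log(decision); // "proceed"
```

Returning `hold` rather than `rollback` when criteria are missed but no trigger has fired is one possible design; it leaves room for the pause-and-alert recovery listed under Error Handling.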
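
The resolution order under Policy Inheritance (Project → Team → Organization → Platform Default) can be illustrated as a lookup that walks from the most specific level outward and stops at the first level defining a setting. The `PolicyLayer` shape, the `resolveSetting` helper, and the sample keys and values are hypothetical; this document does not specify how overrides are stored.

```typescript
// Hypothetical representation of one level in the inheritance chain.
type SettingValue = number | string | boolean;

interface PolicyLayer {
  level: "project" | "team" | "organization" | "platform-default";
  settings: Record<string, SettingValue>;
}

interface ResolvedSetting {
  value: SettingValue;
  from: PolicyLayer["level"];
}

// Walks the chain in order and returns the first value found, along with
// the level that supplied it. The chain must be ordered most specific first.
function resolveSetting(chain: PolicyLayer[], key: string): ResolvedSetting | undefined {
  for (const layer of chain) {
    const value = layer.settings[key];
    if (value !== undefined) {
      return { value, from: layer.level };
    }
  }
  return undefined;
}

// Illustrative values only: the team overrides a threshold, the project adds nothing.
const chain: PolicyLayer[] = [
  { level: "project", settings: {} },
  { level: "team", settings: { critical_threshold: 8.5 } },
  { level: "organization", settings: {} },
  { level: "platform-default", settings: { critical_threshold: 9.0, block_kev: true } },
];

console.log(resolveSetting(chain, "critical_threshold")); // supplied by "team"
console.log(resolveSetting(chain, "block_kev")); // supplied by "platform-default"
```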