# Multi-Tenant Same-Key Rollout and Compatibility Policy
- Date: 2026-02-22
- Source sprint: `SPRINT_20260222_053_DOCS_multi_tenant_same_api_key_contract_baseline.md`
- Related ADR: `docs/architecture/decisions/ADR-002-multi-tenant-same-api-key-selection.md`
## Rollout Goals
- Deploy one-selected-tenant-per-token behavior with no cross-tenant leakage.
- Preserve bounded backward compatibility for legacy tenant headers during migration.
- Enforce strict mode only after module-level validation is complete.
## Phase Plan
### Phase 0 - Contract and Feature Flags (2026-02-22 to 2026-02-28)
- Deploy order:
  - Publish ADR + service impact ledger + flow sequences + acceptance matrix.
  - Introduce feature flags required for bounded compatibility paths.
- Exit criteria:
  - Contract docs accepted by Authority, Router, Platform, Scanner, Graph, and Web owners.
  - Sprint trackers for `053-060` are ready with test evidence plans.
- Rollback criteria:
  - Contract ambiguity or unresolved cross-module naming conflicts.
### Phase 1 - Authority + Router/Gateway Compatibility Mode (2026-03-01 to 2026-03-07)
- Deploy order:
  - Authority (`20260222.054`) before Router/Gateway (`20260222.055`).
  - Enable compatibility aliases while keeping canonical header/claim behavior authoritative.
- Exit criteria:
  - Token issuance resolves selected tenant deterministically.
  - Router strips spoofed inbound headers and rewrites canonical tenant headers.
  - Targeted Authority/Router tests pass in CI.
- Rollback criteria:
  - Token issuance regression or tenant mismatch false positives on production traffic.
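
To illustrate the Router exit criterion above, the following Python sketch shows one way to strip client-supplied tenant headers and rewrite the canonical header from the token's selected tenant. The header names come from the Compatibility Window section of this document. The function name `rewrite_tenant_headers` and the returned list of mismatched headers are assumptions made for illustration and may not match the Router's actual implementation.

```python
from typing import Dict, List, Mapping, Tuple

CANONICAL_HEADER = "X-StellaOps-Tenant"
LEGACY_HEADERS = ("X-Stella-Tenant", "X-Tenant-Id")
# HTTP header names are case-insensitive, so compare in lowercase.
_TENANT_HEADERS = {h.lower() for h in (CANONICAL_HEADER,) + LEGACY_HEADERS}


def rewrite_tenant_headers(
    inbound: Mapping[str, str], token_tenant: str
) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: drop every inbound tenant header, then set the
    canonical header from the tenant selected in the token.

    Returns the headers to forward and the names of inbound tenant headers
    whose value disagreed with the token (candidates for spoof telemetry).
    """
    forwarded: Dict[str, str] = {}
    mismatched: List[str] = []
    for name, value in inbound.items():
        if name.lower() in _TENANT_HEADERS:
            if value.strip() != token_tenant:
                mismatched.append(name)
            continue  # never forward a client-supplied tenant header
        forwarded[name] = value
    forwarded[CANONICAL_HEADER] = token_tenant
    return forwarded, mismatched
```

In this sketch the token claim always wins: a mismatching header is recorded and dropped rather than trusted. Whether the real gateway rejects such requests outright or only rewrites them is not specified here.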
### Phase 2 - Service Migration: Platform, Scanner, Graph (2026-03-08 to 2026-03-17)
- Deploy order:
  - Platform (`20260222.056`)
  - Scanner (`20260222.057`)
  - Graph (`20260222.058`)
- Exit criteria:
  - Tenant resolver paths and endpoint policies are consistent.
  - Cross-tenant access attempts are rejected deterministically.
  - Module integration tests for tenant isolation pass.
- Rollback criteria:
  - Data partition mismatch, endpoint policy regression, or unresolved tenant conflict errors.
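
As a rough sketch of what "rejected deterministically" could look like inside a service, the Python example below returns the same decision for the same pair of selected tenant and resource tenant, with no fallback to a default tenant. The names `Decision` and `check_tenant_access`, the reason strings, and the HTTP status codes are assumptions for illustration; the actual Platform, Scanner, and Graph resolvers may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    status: int   # assumed HTTP status mapping, not taken from the ADR
    reason: str


def check_tenant_access(selected_tenant: Optional[str], resource_tenant: str) -> Decision:
    """Hypothetical guard: compare the token's selected tenant with the
    tenant that owns the requested resource."""
    if not selected_tenant:
        # No tenant in the request context: refuse rather than guess one.
        return Decision(False, 401, "tenant_missing")
    if selected_tenant != resource_tenant:
        # Cross-tenant attempt: always the same forbidden outcome.
        return Decision(False, 403, "tenant_forbidden")
    return Decision(True, 200, "ok")
```

Because the function is pure, an isolation test can assert the exact decision for each tenant pair in the acceptance matrix.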
### Phase 3 - Web Global Selector + Client Unification (2026-03-18 to 2026-03-24)
- Deploy order:
  - Web selector and runtime state unification (`20260222.059`).
  - Playwright tenant matrix and QA evidence (`20260222.060`).
- Exit criteria:
  - Global tenant selector switches tenant context across primary pages.
  - Canonical tenant injection path is active with legacy usage telemetry.
  - Tier 2c UI evidence is complete.
- Rollback criteria:
  - Tenant switch causes stale data bleed, broken navigation, or unrecoverable error states.
### Phase 4 - Strict Mode + Legacy Removal (earliest start 2026-04-01)
- Deploy order:
  - Disable legacy tenant header acceptance for new clients first.
  - Remove compatibility aliases from runtime clients and gateway compatibility branches.
- Exit criteria:
  - Legacy header usage telemetry remains at zero for 14 consecutive days.
  - No dependency remains on scalar-only or header-only tenant override paths.
- Rollback criteria:
  - Any tenant selection outage or blocked tenant-scoped production path.
## Compatibility Window
- Legacy header aliases (`X-Stella-Tenant`, `X-Tenant-Id`) are supported in compatibility mode through **2026-03-31**.
- Starting **2026-04-01**, strict mode can be enabled once legacy header usage telemetry has remained at zero for 14 consecutive days.
- Canonical header is `X-StellaOps-Tenant` during and after migration.
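
The strict-mode gate combines a calendar date with a quiet-period requirement, which can be sketched as a small Python check. The function name `strict_mode_allowed` and the shape of the telemetry input (a mapping from day to legacy-header hit count) are assumptions for illustration. The sketch also assumes the quiet period means the 14 full days immediately before the evaluation date, and it treats a day with no telemetry as unconfirmed rather than as zero.

```python
from datetime import date, timedelta
from typing import Mapping

STRICT_MODE_EARLIEST = date(2026, 4, 1)   # first day after the compatibility window
REQUIRED_QUIET_DAYS = 14


def strict_mode_allowed(today: date, legacy_hits_by_day: Mapping[date, int]) -> bool:
    """Hypothetical gate: True only on or after the earliest start date and
    only if each of the previous 14 days reported exactly zero legacy hits."""
    if today < STRICT_MODE_EARLIEST:
        return False
    for offset in range(1, REQUIRED_QUIET_DAYS + 1):
        day = today - timedelta(days=offset)
        # A missing day yields None, which is not 0, so it blocks strict mode.
        if legacy_hits_by_day.get(day) != 0:
            return False
    return True
```

Treating missing data as a failure is the conservative choice: a telemetry outage then delays strict mode instead of silently satisfying the zero-usage condition.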
## Observability Checkpoints
- Gateway telemetry for stripped/spoofed tenant headers.
- Legacy header usage counters from Web tenant interceptor telemetry.
- Authority token issuance/validation audit events for tenant mismatch/ambiguity.
- Platform/Scanner/Graph tenant conflict and forbidden response rates.
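
As one possible shape for the legacy header usage counters, the Python sketch below tallies legacy alias headers per day and per header name. The helper names `record_legacy_usage` and `daily_total` and the `(day, header)` counter key are assumptions for illustration; the real Web interceptor telemetry may be structured differently.

```python
from collections import Counter
from datetime import date
from typing import Iterable

# Legacy aliases named in this document, lowercased for case-insensitive matching.
LEGACY_HEADERS = ("x-stella-tenant", "x-tenant-id")


def record_legacy_usage(counters: Counter, day: date, header_names: Iterable[str]) -> int:
    """Hypothetical helper: count legacy alias headers seen on one request.
    Returns how many legacy headers the request carried."""
    found = 0
    for name in header_names:
        key = name.lower()
        if key in LEGACY_HEADERS:
            counters[(day, key)] += 1
            found += 1
    return found


def daily_total(counters: Counter, day: date) -> int:
    """Sum legacy header hits across all alias names for one day."""
    return sum(n for (d, _), n in counters.items() if d == day)
```

Keeping the header name in the key makes it possible to see which alias is still in use when the daily total is not yet zero.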
## Rollback Playbook
- If regressions appear:
  - Re-enable compatibility feature flags.
  - Freeze strict-mode rollout.
  - Revert last deployment batch for the affected module only.
  - Re-run tenant isolation acceptance matrix against last-known-good build.
- Rollback does not change the ADR model (one selected tenant per token); it only restores bounded compatibility behavior.
## Production Readiness Checklist
- [ ] Authority token issuance + validation tests green with selected tenant model.
- [ ] Router/Gateway spoof/mismatch protections verified in staging.
- [ ] Platform, Scanner, Graph tenant isolation tests green.
- [ ] Web selector tests and Playwright tenant matrix green (desktop + mobile).
- [ ] Legacy header telemetry reviewed with dated cutoff evidence.
- [ ] Go/no-go decision documented by QA and Project Manager.