Identity and tenancy: the part everyone underestimates until they trip over it in prod.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

# Epic 14: Authority‑Backed Scopes & Tenancy

**Short name:** `Authority‑Backed Scopes & Tenancy`

**Primary components:** Authority (authN/Z), Web Services API, Policy Engine, Orchestrator, Task Runner, Console, CLI

**Surfaces:** `/auth/*`, request middleware, DB schema (RLS), object storage layout, message bus topics, audit logs, CLI login/impersonation flows

**Touches:** Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant

**AOC ground rule reminder:** Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Enforcement of aggregation‑only behavior is tenant‑agnostic and must hold across all scopes.

---

## 1) What it is

A uniform model for identity, authorization, and isolation that is enforced end‑to‑end:

* **Authority‑backed tokens:** JWT/OIDC tokens issued by a configured Authority. Tokens carry **scopes**, **roles**, and **tenant memberships** as claims. Services verify and authorize locally; no out‑of‑band ACL calls during the hot path.
* **Tenancy:** First‑class multi‑tenant isolation with optional **projects** within a tenant. Strong separation at the database layer via **row‑level security (RLS)** and in object storage via **tenant‑prefixed paths** (and optionally per‑tenant KMS keys).
* **Scopes & roles:** Minimal set of composable scopes (`stella:{resource}:{verb}`) that map to roles (`viewer`, `editor`, `operator`, `admin`, `owner`) and can be constrained to `{tenant}/{project}`.
* **Context propagation:** Every API request, job, message, and artifact is stamped with `{tenant_id, project_id, actor}` and validated at ingress and again at persistence.
* **Service accounts & delegation:** Robot identities with scoped, expiring credentials for CI, Task Packs, and webhooks. Human-to-robot delegation is explicit and auditable.
* **Audit:** Immutable decision logs for authN/Z events with resource, scope, and policy evaluation outcomes.

**Tenancy model:**

```
Organization (optional, for billing)
└── Tenant (isolation boundary)
    ├── Projects (isolation + grouping)
    │   ├── Sources (registries, repos)
    │   ├── Jobs & Runs
    │   ├── SBOMs & Artifacts
    │   ├── Findings / Evaluations
    │   └── Policies (bound or inherited)
    └── Shared tenant services (notifications, exports, secrets)
```

**Knowledge planes:**

* **Global knowledge plane:** Advisories, CVE metadata, CPE, KEV, etc. No tenant data.
* **Tenant plane:** SBOMs, VEX attachments, policy results, exposures, notifications, exports, audits.

Conseiller/Excitator live across both planes: they collect into the global plane and link to the tenant plane without merging sources.

---

## 2) Why (brief)

Security that depends on “being careful” is not security. We need hard boundaries the platform cannot cross by accident:

* Run many teams/customers safely on one deployment.
* Minimize blast radius for credentials and mistakes.
* Make CI and automation safe with least‑privileged scopes.
* Keep latency low by verifying locally with signed tokens.

---

## 3) How it should work (maximum detail)

### 3.1 Tokens, claims, and scopes

**Token type:** JWT, signed by the Authority. Services cache JWKS and verify locally.

**Required claims:**

* `iss`, `sub`, `aud`, `iat`, `exp`
* `scope`: space‑separated scopes (`stella:sbom:read`, `stella:job:run`)
* `tenants`: array of tenant IDs the subject may access
* `tenant` (active): the currently selected tenant for the request
* `roles`: object map `{ "": ["viewer", "editor", ...] }`
* `projects` (optional): array of project IDs or `{ "": ["projA", "projB"] }`
* `mfa` (optional): boolean or level for step‑up enforcement
* `act` (optional): actor chain for delegation/impersonation

**Scope grammar:**

```
stella:{resource}:{verb}[#{constraint}]

resource   ∈ {tenant, project, source, job, sbom, vex, advisory, policy, finding, export, notify, secret, pack, ledger, console}
verb       ∈ {read, list, write, delete, run, execute, approve, admin}
constraint := tenant/{tenantId}[/project/{projectId}]
```

Examples:

* `stella:sbom:read#tenant/t-123/project/p-abc`
* `stella:job:run#tenant/t-123`
* `stella:tenant:admin#tenant/t-123` (Tenant Owner)

**Role mapping (default):**

* `viewer` → read/list on most resources
* `editor` → viewer + write on sbom/policy
* `operator` → editor + job:run, export:run, notify:manage
* `admin` → operator + user/role management inside tenant
* `owner` → admin + billing/tenant lifecycle

### 3.2 Selecting the active tenant

* **API:** `X-Stella-Tenant: ` header or `?tenant=` query. If omitted and the token has exactly one tenant, that tenant is active; otherwise the request fails with 400.
* **Console:** Tenant switcher in the top bar. Console includes the header on all calls.
* **CLI:** `stella login --tenant ` sets the default; override per command with `--tenant`.

All services must reject requests where the active tenant is not in `tenants[]` or where the token's scopes do not cover that tenant.

### 3.3 Request pipeline

1. **Authentication middleware:** verify JWT signature and expiry.
2. **Tenant activation:** pick the active tenant from the header or query parameter (§3.2); set per‑request context `{tenant_id, actor}`.
3. **Scope check:** compare the required scope for the route with the token scopes. If the route accepts project constraints, check that they align.
4. **Policy overlay (optional):** ABAC evaluation for fine controls (e.g., “deny job:run outside business hours”).
5. **Persistence guard:** set DB session GUC `stella.tenant_id` and verify any writes include matching `tenant_id`. Enforce Postgres RLS.
6. **Audit:** write decision to audit bus (async) with `permit|deny`, reasons, and matched rule.

### 3.4 Database isolation

**Approach:** shared schema with **Row Level Security**. Every tenant‑scoped table includes `tenant_id` and optional `project_id`.

**RLS policy template (Postgres):**

```sql
ALTER TABLE sboms ENABLE ROW LEVEL SECURITY;
-- Note: table owners bypass RLS unless FORCE ROW LEVEL SECURITY is also set.

-- Rows are visible only when tenant_id matches the session setting.
-- With missing_ok = true, an unset setting returns NULL and matches no rows.
CREATE POLICY sboms_isolate ON sboms
  USING (tenant_id = current_setting('stella.tenant_id', true));

-- Guard for INSERT/UPDATE:
CREATE POLICY sboms_write_guard ON sboms
  AS PERMISSIVE FOR ALL TO PUBLIC
  WITH CHECK (tenant_id = current_setting('stella.tenant_id', true));
```

Set `stella.tenant_id` at connection checkout, inside the transaction that runs the tenant's queries:

```sql
-- $1 = active tenant.
-- is_local = true limits the setting to the current transaction,
-- so it does not leak to the next user of a pooled connection.
SELECT set_config('stella.tenant_id', $1, true);
```

**Migrations:**

* Add `tenant_id` to all tenant‑scoped tables.
* Backfill existing rows with the default tenant in Quickstart.
* Enable RLS and policies in a reversible migration.

### 3.5 Object storage and artifacts

* **Layout:** `s3:///tenants//projects///...`
* **KMS keys:** optional per‑tenant key alias. Map via `kms_alias = "stella-"`.
* Ensure Task Runner and Export Center only operate within the prefixed path of the active tenant.

### 3.6 Message bus topics

* Topic naming: `stella...` for tenant‑scoped events.
* Global knowledge events remain `stella.global.kb.*`.
* Subscriptions always include a tenant filter unless consuming global knowledge.

### 3.7 Background workers

* **Orchestrator & Task Runner:** each job carries `{tenant_id, project_id}`. Workers set `stella.tenant_id` before any DB or object store access. Reject jobs that lack this context.
* **Conseiller/Excitator:** ingest to global plane; linking jobs (matching advisories to tenant SBOMs) run per tenant and respect RLS.

### 3.8 Policy overlay (optional but recommended)

Integrate Policy Engine with condition keys:

* `tenant`, `project`, `resource.type`, `resource.id`, `actor.role`, `actor.mfa`, `time`, `ip`.

Examples:

* Deny `job:run` from IPs outside an allowed CIDR range.
* Require `mfa=true` to approve notification templates.
* Quotas: “max exports per hour per tenant.”

### 3.9 Service accounts & delegation

* **Robot principals:** `sa:{tenant}:{name}` with scopes constrained to tenant/project. Default TTL 1h; max TTL policy‑controlled.
* **Token minting:** Tenant admins can generate tokens via API/Console; all tokens are auditable and can optionally be bound to a CIDR range or workload identity.
* **Delegation:** `stella token delegate --to sa:... --scopes ... --ttl 15m` produces a token with an `act` chain, recorded in the audit log.

### 3.10 Auditing

Every decision logs:

* `ts, tenant, actor, route, resource, action, effect, reason, scopes_used, policy_rule_id`
* Persist in tenant‑scoped audit table and stream to `stella..audit.decisions`.
* Expose search/filter in Console → Admin → Audit.

### 3.11 CLI and Console UX

* **CLI:** `stella login`, `stella whoami`, `stella tenants list`, `--tenant` flag everywhere. Clear error if token lacks tenant or scope.
* **Console:** Tenant switcher, role badges, “why denied?” modal showing scope and policy reasons without leaking internals.
* **Impersonation (admin only):** `sudo as ` for debugging with visible banner; issues delegated token with `act` chain.

### 3.12 Compatibility modes

* **Quickstart single‑tenant:** hidden tenant `local`. Header optional. RLS enabled with a constant tenant ID.
* **Multi‑tenant:** full model active; migrations complete; Console exposes tenant admin.

---

## 4) Architecture

### 4.1 New/updated modules

* `auth/authority`: JWKS fetching, token validation, scope parser, cache.
* `auth/middleware`: HTTP/gRPC interceptors for authN/Z, tenant activation, audit emit.
* `auth/roles`: role → scope mapping + tenant/project constraints.
* `auth/policy-bridge`: optional ABAC evaluation using Policy Engine.
* `storage/tenantctx`: helpers to set `stella.tenant_id` in DB session and object‑store prefixes.
* `audit/decisions`: structured logging and bus producer.
* `cli/auth`: login, token store, tenant switcher, whoami.

### 4.2 Data model changes

* Add `tenant_id` (and `project_id` where appropriate) to: `sources, jobs, runs, sboms, components, findings, policies, exports, notifications, secrets, packs, ledger, audits`.
* Create tables:
  * `tenants(id, name, status, created_at, owner_user_id)`
  * `projects(id, tenant_id, name, meta, created_at)`
  * `memberships(user_id, tenant_id, roles[])`
  * `service_accounts(id, tenant_id, name, scopes[], created_at, disabled)`
  * `audit_decisions(...)` (tenant‑scoped)

---

## 5) APIs and contracts

### 5.1 Standard headers

* `Authorization: Bearer `
* `X-Stella-Tenant: `
* `X-Request-ID` (propagated for audit correlation)

### 5.2 Auth endpoints

* `POST /auth/login` (OIDC code flow start for Console)
* `GET /auth/jwks.json` (proxy/cached from Authority if needed)
* `GET /auth/whoami` → `{ sub, tenants[], activeTenant, roles, scopes, mfa }`
* `POST /auth/tokens/service` (tenant admin) → mint robot token with constrained scopes/ttl
* `POST /auth/tokens/delegate` → mint delegated token with `act` chain

### 5.3 Tenant admin endpoints

* `POST /tenants` (owner only)
* `GET /tenants`, `GET /tenants/:id`
* `POST /tenants/:id/projects`, `GET /tenants/:id/projects`
* `POST /tenants/:id/members` (assign role), `DELETE /tenants/:id/members/:user`
* `GET /tenants/:id/audit` (search)

### 5.4 Route protection conventions

Each route declares:

* `resource`, `verb`
* Whether it requires project constraint
* Optional policy gates (e.g., `require_mfa`)

Example (OpenAPI extension):

```yaml
x-stella-auth:
  resource: sbom
  verb: read
  requireTenant: true
  allowProjectScoped: true
  requireMFA: false
```

---

## 6) Documentation changes

Create/update:

1. `/docs/security/tenancy-overview.md`
   Concepts, knowledge planes, tenant/project model, isolation layers.
2. `/docs/security/scopes-and-roles.md`
   Scope grammar, default roles, examples, custom role mapping.
3. `/docs/security/authority-config.md`
   How to connect to an OIDC provider, JWKS caching, audience, issuers, time skew, MFA signal.
4. `/docs/operations/multi-tenancy.md`
   Running multi‑tenant deployments: quotas, KMS per tenant, object store layout, message topics, backup/restore per tenant.
5. `/docs/operations/rls-and-data-isolation.md`
   Postgres RLS policy reference, migrations, troubleshooting leaks.
6. `/docs/console/admin-tenants.md`
   Tenant switcher, managing members, roles, audit viewer.
7. `/docs/cli/authentication.md`
   `login`, `whoami`, `tenants list`, `--tenant`, service tokens, delegation.
8. `/docs/api/authentication.md`
   Headers, error codes, sample requests, OpenAPI `x-stella-auth` annotations.
9. `/docs/policy/examples/abac-overlays.md`
   Optional policy snippets: MFA requirements, time windows, IP restrictions, quotas.
10. `/docs/install/configuration-reference.md`
    New `STELLA_AUTH_*`, `STELLA_TENANCY_*`, and per‑service flags.

Add at the top of each page:

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 7) Implementation plan

### Middleware & libraries

* Implement `auth/middleware` with:
  * JWKS cache and signature verification (kid pinning, 10 min refresh).
  * Scope parser and matcher with constraint evaluation.
  * Tenant activator reading header/query and verifying membership.
  * Policy hook for ABAC (feature flag).
  * Decision audit emitter (non‑blocking).

### Services

* **API:** wrap all handlers with middleware; declare route protection via decorators/annotations; enforce project constraints.
* **DB access layer:** on connection checkout set `stella.tenant_id`; forbid raw SQL that bypasses the session GUC.
* **Orchestrator/Task Runner:** include `{tenant_id, project_id}` in job spec; enforce before any IO.
* **Export/Notify/AI:** stamp tenant in outbound payloads and logs; include it in idempotency keys.
* **Conseiller/Excitator:** keep global ingestion; ensure linking jobs run with tenant context only.

### Console

* Add tenant switcher and “whoami” panel.
* Show role badges; display “why denied?” with scope/policy explanation from 403 payload.
* Tenant Admin screens: members, roles, service tokens, audit.

### CLI

* `stella login` (device/code or local browser) and tenant selection.
* Persist tokens per profile; `--tenant` override; `whoami`.
* Commands fail with clear errors on scope violation.

### Storage

* Prefix object store paths by tenant/project.
* Optional per‑tenant KMS key integration.

### Migrations

* Add `tenant_id` and backfill.
* Enable RLS with policies and tests.
* Seed default tenant for Quickstart.

---

## 8) Engineering tasks

**Auth & middleware**

* [ ] Implement JWKS retrieval and caching with rotation tests.
* [ ] Implement scope parser and matcher with constraint support.
* [ ] Build HTTP/gRPC interceptors and integrate across services.
* [ ] Add ABAC policy hook and sample policies.
* [ ] Emit structured decision audits.

**Data & storage**

* [ ] Add `tenant_id` columns and indices; backfill in migration.
* [ ] Enable Postgres RLS policies for all tenant‑scoped tables.
* [ ] Update ORM/queries to rely on RLS; remove any “WHERE tenant_id = ...” duplication.
* [ ] Tenant‑prefixed object store paths; optional per‑tenant KMS keys.

**Services**

* [ ] Annotate all routes with `x-stella-auth` or equivalent decorator.
* [ ] Propagate tenant context through orchestrator and workers.
* [ ] Update Conseiller/Excitator linkers to require tenant context.

**Console**

* [ ] Implement tenant switcher, role display, and “whoami.”
* [ ] Add Tenant Admin screens (members, projects, service accounts).
* [ ] Implement “Why denied?” modal reading 403 details.

**CLI**

* [ ] `login`, `whoami`, `tenants list`, tenant flag and persistence.
* [ ] Service token minting for tenant admins.
* [ ] Delegate token creation for robot use.

**Audit**

* [ ] Create audit_decisions table and producer.
* [ ] Add search API and Console viewer.

**Docs**

* [ ] Author the ten docs listed in §6 with examples and diagrams.
* [ ] Add troubleshooting: common 401/403 causes and fixes.
* [ ] Add migration guide from single‑tenant to multi‑tenant.

**Tests**

* [ ] Unit tests for scope matching and token validation.
* [ ] RLS tests verifying cross‑tenant reads/writes fail.
* [ ] E2E tests: multi‑tenant users, robot tokens, ABAC overlays, orchestrator runs.
* [ ] Fuzz tests on header handling to prevent tenant confusion bugs.

---

## 9) Feature changes required

* **All services:** must expose 403 payloads with machine‑ and human‑readable denial reasons and the missing scope string.
* **Export Center:** requires tenant in all manifests; deny cross‑tenant exports by default; allow explicit cross‑tenant mirror via signed bundle.
* **Notifications:** destinations and templates are tenant‑scoped; sending pipeline stamps tenant.
* **Advisory AI Assistant:** restricts training context to a tenant’s data; the global knowledge plane may be referenced, but tenant data must never be logged into it.
* **Findings Ledger:** partition by tenant; queries must be tenant‑filtered or rely solely on RLS.
* **Policy Engine:** support condition keys for tenant and actor; ship example policies.

---

## 10) Acceptance criteria

* In multi‑tenant mode, requests that lack `X-Stella-Tenant` are rejected unless the token carries exactly one tenant (§3.2); in single‑tenant Quickstart the header is optional.
* RLS prevents cross‑tenant leakage, proven by tests that attempt blind selects/inserts.
* CLI can log in, list tenants, select tenant, and perform a job limited to that tenant.
* Console shows tenant switcher; admin can invite a member and assign roles.
* Service token can be minted with narrow scopes and expires as configured.
* Every 403 returns a clear “missing required scope …” with the exact scope string.
* Conseiller/Excitator continue aggregation‑only behavior; linking jobs run strictly under tenant context.
* Audit stream captures all permit/deny decisions with correlation IDs.

---

## 11) Risks & mitigations

* **RLS misconfiguration.** Write tests that run with and without RLS; block migrations unless policies are present. Provide a canary query per service on boot to verify isolation.
* **Scope explosion.** Keep a minimal, stable scope set; use constraints for specificity; document patterns.
* **JWKS outages.** Cache keys with TTL, support multiple `kid`s, and tolerate short network failures.
* **Privilege creep in robots.** Short TTLs by default, clear UI to rotate/revoke, and audit for usage anomalies.
* **Tenant confusion bugs.** Require tenant header, validate against token memberships, and pin tenant context into the DB session and job payloads rather than relying on thread‑locals alone.

---

## 12) Philosophy

* **Isolation by default.** Tenancy isn’t a UI filter; it’s enforced where the data lives.
* **Least privilege wins.** Humans and robots get only what they need for as long as they need it.
* **Explain denials.** If the platform can’t explain “why no,” it’s broken.
* **Global vs tenant plane.** Public knowledge is shared; customer data is not, ever.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
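
As an illustration of the scope grammar in §3.1, here is a minimal Python sketch of a scope parser and matcher. The names `Scope`, `parse_scope`, `grants`, and `is_allowed` are hypothetical. The matching semantics are assumptions, since the epic does not spell them out: an unconstrained scope is treated as valid for whichever tenant the caller has already activated, and a tenant‑level constraint is treated as covering every project in that tenant.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Resource and verb sets copied from the scope grammar in §3.1.
RESOURCES = {"tenant", "project", "source", "job", "sbom", "vex", "advisory",
             "policy", "finding", "export", "notify", "secret", "pack",
             "ledger", "console"}
VERBS = {"read", "list", "write", "delete", "run", "execute", "approve", "admin"}

# stella:{resource}:{verb}[#tenant/{tenantId}[/project/{projectId}]]
_SCOPE_RE = re.compile(
    r"stella:(?P<resource>[a-z]+):(?P<verb>[a-z]+)"
    r"(?:#tenant/(?P<tenant>[^/\s#]+)(?:/project/(?P<project>[^/\s#]+))?)?"
)


@dataclass(frozen=True)
class Scope:
    resource: str
    verb: str
    tenant: Optional[str] = None
    project: Optional[str] = None


def parse_scope(text: str) -> Scope:
    """Parse one scope string; raise ValueError if it violates the grammar."""
    m = _SCOPE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"malformed scope: {text!r}")
    if m["resource"] not in RESOURCES or m["verb"] not in VERBS:
        raise ValueError(f"unknown resource or verb: {text!r}")
    return Scope(m["resource"], m["verb"], m["tenant"], m["project"])


def grants(held: Scope, resource: str, verb: str,
           tenant: str, project: Optional[str] = None) -> bool:
    """Return True if `held` covers the requested action (assumed semantics)."""
    if (held.resource, held.verb) != (resource, verb):
        return False
    # A tenant constraint must match the active tenant exactly.
    if held.tenant is not None and held.tenant != tenant:
        return False
    # A project constraint only covers requests for that same project.
    if held.project is not None and held.project != project:
        return False
    return True


def is_allowed(scope_claim: str, resource: str, verb: str,
               tenant: str, project: Optional[str] = None) -> bool:
    """Check a space-separated `scope` claim against one required action.

    A malformed scope in the claim raises ValueError instead of being skipped.
    """
    return any(grants(parse_scope(s), resource, verb, tenant, project)
               for s in scope_claim.split())
```

Under these assumptions, `is_allowed("stella:job:run#tenant/t-123", "job", "run", "t-123")` is true, while the same claim checked against tenant `t-999` is false. The tenant membership check from §3.2 still has to run separately.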
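
The tenant selection rules in §3.2 can be sketched as a small resolver. This is an illustration only: `TenantError` and `resolve_active_tenant` are hypothetical names, the 403 status for a membership failure is an assumption (the epic only specifies 400 for a missing tenant), and rejecting a header/query mismatch is likewise an assumed policy.

```python
from typing import Mapping, Optional, Sequence


class TenantError(Exception):
    """Carries the HTTP status a caller should return."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def resolve_active_tenant(
    headers: Mapping[str, str],
    query: Mapping[str, str],
    token_tenants: Sequence[str],
) -> str:
    """Resolve the active tenant for one request from header, query, and token."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    from_header: Optional[str] = lowered.get("x-stella-tenant")
    from_query: Optional[str] = query.get("tenant")

    # Assumption: conflicting header and query values are rejected, not ranked.
    if from_header and from_query and from_header != from_query:
        raise TenantError(400, "tenant header and query parameter disagree")

    requested = from_header or from_query
    if not requested:
        # §3.2: an omitted tenant is only acceptable for single-tenant tokens.
        if len(token_tenants) == 1:
            return token_tenants[0]
        raise TenantError(400, "tenant must be specified")

    if requested not in token_tenants:
        # Assumption: membership failures map to 403.
        raise TenantError(403, "token has no membership in requested tenant")
    return requested
```

The returned value is what the persistence guard in §3.3 would place into `stella.tenant_id`. Resolving it once at ingress keeps every later step working from the same tenant.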
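
For the decision audit in §3.10, a record type might look like the sketch below. The field names come from the list in that section; the `Decision` class, the `to_json` helper, the ISO‑8601 timestamp format, and JSON as the wire encoding are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _utc_now() -> str:
    """Current UTC time as an ISO-8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class Decision:
    """One authorization decision, with the fields listed in §3.10."""
    tenant: str
    actor: str
    route: str
    resource: str
    action: str
    effect: str  # "permit" or "deny", as in §3.3 step 6
    reason: str
    scopes_used: List[str] = field(default_factory=list)
    policy_rule_id: Optional[str] = None
    ts: str = field(default_factory=_utc_now)

    def __post_init__(self) -> None:
        # Reject anything other than the two effects the pipeline defines.
        if self.effect not in ("permit", "deny"):
            raise ValueError(f"effect must be 'permit' or 'deny', got {self.effect!r}")


def to_json(decision: Decision) -> str:
    """Serialise a decision with stable key order for logs and bus messages."""
    return json.dumps(asdict(decision), sort_keys=True)
```

A deny for a missing scope could then be emitted as `to_json(Decision(tenant="t-123", actor="user:alice", route="GET /sboms", resource="sbom", action="read", effect="deny", reason="missing required scope"))`, where the actor and route values are made up for the example.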