> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

# Epic 14: Authority‑Backed Scopes & Tenancy

**Short name:** `Authority‑Backed Scopes & Tenancy`
**Primary components:** Authority (authN/Z), Web Services API, Policy Engine, Orchestrator, Task Runner, Console, CLI
**Surfaces:** `/auth/*`, request middleware, DB schema (RLS), object storage layout, message bus topics, audit logs, CLI login/impersonation flows
**Touches:** Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant

**AOC ground rule reminder:** Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Enforcement of aggregation‑only behavior is tenant‑agnostic and must hold across all scopes.

---

## 1) What it is

A uniform model for identity, authorization, and isolation that is enforced end‑to‑end:

* **Authority‑backed tokens:** JWT/OIDC tokens issued by a configured Authority. Tokens carry **scopes**, **roles**, and **tenant memberships** as claims. Services verify and authorize locally; no out‑of‑band ACL calls during the hot path.
* **Tenancy:** First‑class multi‑tenant isolation with optional **projects** within a tenant. Strong separation at the database layer via **row‑level security (RLS)** and in object storage via **tenant‑prefixed paths** (and optionally per‑tenant KMS keys).
* **Scopes & roles:** Minimal set of composable scopes (`stella:{resource}:{verb}`) that map to roles (`viewer`, `editor`, `operator`, `admin`, `owner`) and can be constrained to `{tenant}/{project}`.
* **Context propagation:** Every API request, job, message, and artifact is stamped with `{tenant_id, project_id, actor}` and validated at ingress and again at persistence.
* **Service accounts & delegation:** Robot identities with scoped, expiring credentials for CI, Task Packs, and webhooks. Human-to-robot delegation is explicit and auditable.
* **Audit:** Immutable decision logs for authN/Z events with resource, scope, and policy evaluation outcomes.

**Tenancy model:**

```
Organization (optional, for billing)
└── Tenant (isolation boundary)
    ├── Projects (isolation + grouping)
    │   ├── Sources (registries, repos)
    │   ├── Jobs & Runs
    │   ├── SBOMs & Artifacts
    │   ├── Findings / Evaluations
    │   └── Policies (bound or inherited)
    └── Shared tenant services (notifications, exports, secrets)
```

**Knowledge planes:**

* **Global knowledge plane:** Advisories, CVE metadata, CPE, KEV, etc. No tenant data.
* **Tenant plane:** SBOMs, VEX attachments, policy results, exposures, notifications, exports, audits.

Conseiller/Excitator live across both planes: they collect into the global plane and link into the tenant plane without merging sources.

---

## 2) Why (brief)

Security that depends on “being careful” is not security. We need hard boundaries the platform cannot cross by accident:

* Run many teams/customers safely on one deployment.
* Minimize blast radius for credentials and mistakes.
* Make CI and automation safe with least‑privileged scopes.
* Keep latency low by verifying locally with signed tokens.

---

## 3) How it should work (maximum detail)

### 3.1 Tokens, claims, and scopes

**Token type:** JWT, signed by the Authority. Services cache JWKS and verify locally.

**Required claims:**

* `iss`, `sub`, `aud`, `iat`, `exp`
* `scope`: space‑separated scopes (`stella:sbom:read`, `stella:job:run`)
* `tenants`: array of tenant IDs the subject may access
* `tenant` (active): the currently selected tenant for the request
* `roles`: object map `{ "<tenant>": ["viewer", "editor", ...] }`
* `projects` (optional): array of project IDs or `{ "<tenant>": ["projA", "projB"] }`
* `mfa` (optional): boolean or level for step‑up enforcement
* `act` (optional): actor chain for delegation/impersonation

**Scope grammar:**

```
stella:{resource}:{verb}[#{constraint}]
resource ∈ {tenant, project, source, job, sbom, vex, advisory, policy, finding, export, notify, secret, pack, ledger, console}
verb ∈ {read, list, write, delete, run, execute, approve, admin}
constraint := tenant/{tenantId}[/project/{projectId}]
```

Examples:

* `stella:sbom:read#tenant/t-123/project/p-abc`
* `stella:job:run#tenant/t-123`
* `stella:tenant:admin#tenant/t-123` (Tenant Owner)
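
The grammar above is small enough to parse by hand. The following is a minimal sketch, not the platform's parser: the names `Scope`, `parse_scope`, and `grants` are hypothetical, and it assumes that a held scope with no constraint applies to any tenant (membership being checked separately) and that a tenant-level constraint covers every project in that tenant.

```python
from dataclasses import dataclass
from typing import Optional

RESOURCES = {"tenant", "project", "source", "job", "sbom", "vex", "advisory",
             "policy", "finding", "export", "notify", "secret", "pack",
             "ledger", "console"}
VERBS = {"read", "list", "write", "delete", "run", "execute", "approve", "admin"}


@dataclass(frozen=True)
class Scope:
    resource: str
    verb: str
    tenant: Optional[str] = None   # None means "no tenant constraint"
    project: Optional[str] = None  # None means "no project constraint"


def parse_scope(text: str) -> Scope:
    """Parse stella:{resource}:{verb}[#tenant/{id}[/project/{id}]]."""
    head, _, constraint = text.partition("#")
    parts = head.split(":")
    if len(parts) != 3 or parts[0] != "stella":
        raise ValueError(f"malformed scope: {text!r}")
    _, resource, verb = parts
    if resource not in RESOURCES or verb not in VERBS:
        raise ValueError(f"unknown resource or verb: {text!r}")
    tenant = project = None
    if "#" in text:
        seg = constraint.split("/")
        if len(seg) == 2 and seg[0] == "tenant" and seg[1]:
            tenant = seg[1]
        elif (len(seg) == 4 and seg[0] == "tenant" and seg[2] == "project"
              and seg[1] and seg[3]):
            tenant, project = seg[1], seg[3]
        else:
            raise ValueError(f"malformed constraint: {text!r}")
    return Scope(resource, verb, tenant, project)


def grants(held: Scope, required: Scope) -> bool:
    """True if `held` covers `required`: same resource and verb, and every
    constraint present on `held` matches the one on `required`."""
    if (held.resource, held.verb) != (required.resource, required.verb):
        return False
    if held.tenant is not None and held.tenant != required.tenant:
        return False
    if held.project is not None and held.project != required.project:
        return False
    return True
```

With these rules, `stella:sbom:read#tenant/t-123` covers a request for project `p-abc` in `t-123`, while a project-constrained scope does not cover a tenant-wide request.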

**Role mapping (default):**

* `viewer` → read/list on most resources
* `editor` → viewer + write on sbom/policy
* `operator` → editor + job:run, export:run, notify:manage
* `admin` → operator + user/role management inside tenant
* `owner` → admin + billing/tenant lifecycle
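
Because each role builds on the one below it, the effective scope set can be computed by walking the role ladder. This is an illustrative sketch only: `ROLE_ADDS` and `effective_scopes` are hypothetical names, the viewer entry is a small stand-in for "most resources", and the `admin`/`owner` entries are left empty because the text above does not name scope strings for them.

```python
ROLE_ORDER = ["viewer", "editor", "operator", "admin", "owner"]

# What each role adds on top of the role below it (illustrative subset).
ROLE_ADDS = {
    "viewer": {"sbom:read", "sbom:list", "policy:read", "policy:list"},
    "editor": {"sbom:write", "policy:write"},
    "operator": {"job:run", "export:run", "notify:manage"},
    "admin": set(),   # user/role management: scope strings not specified here
    "owner": set(),   # billing/tenant lifecycle: scope strings not specified here
}


def effective_scopes(roles, tenant_id):
    """Expand a subject's roles in one tenant into tenant-constrained scopes."""
    # The highest role held decides how far up the ladder we accumulate.
    top = max((ROLE_ORDER.index(r) for r in roles), default=-1)
    scopes = set()
    for role in ROLE_ORDER[: top + 1]:
        scopes |= {f"stella:{rv}#tenant/{tenant_id}" for rv in ROLE_ADDS[role]}
    return scopes
```

An unknown role name raises `ValueError` from `list.index`, which fails closed rather than silently granting nothing.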

### 3.2 Selecting the active tenant

* **API:** `X-Stella-Tenant: <tenant_id>` header or `?tenant=<id>` query parameter. If both are omitted and the token has exactly one tenant, that tenant is active; otherwise the request fails with 400.
* **Console:** Tenant switcher in the top bar. Console includes the header on all calls.
* **CLI:** `stella login --tenant <id>` sets the default; override per command with `--tenant`.

All services must reject requests where the active tenant is not in `tenants[]` or the token's scopes do not include that tenant's constraint.
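
The selection rule can be expressed as a single function. The sketch below is hypothetical (`resolve_active_tenant` and `TenantError` are not existing APIs); the 400 for an ambiguous request comes from the text above, while the 403 for a non-member tenant and the 400 for a header/query conflict are assumptions.

```python
class TenantError(Exception):
    """Carries the HTTP status the middleware should return."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status


def resolve_active_tenant(claims, header=None, query=None):
    """Pick the active tenant from the header or query, validated against claims."""
    tenants = claims.get("tenants", [])
    if header and query and header != query:
        # Assumption: conflicting selectors are treated as a client error.
        raise TenantError(400, "conflicting tenant in header and query")
    requested = header or query
    if not requested:
        if len(tenants) == 1:
            return tenants[0]  # unambiguous: the token's only tenant
        raise TenantError(400, "tenant not specified")
    if requested not in tenants:
        # Assumption: non-membership is reported as 403.
        raise TenantError(403, "not a member of the requested tenant")
    return requested
```

The scope-constraint half of the rule (scopes must include the tenant) would run after this step, against the tenant this function returns.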

### 3.3 Request pipeline

1. **Authentication middleware:** verify JWT signature and expiry.
2. **Tenant activation:** resolve the active tenant from the header or query parameter (§3.2), verify it against `tenants[]`, and set the per‑request context `{tenant_id, project_id, actor}`.
3. **Scope check:** compare the required scope for the route with the token scopes. If the route accepts project constraints, check that they align with the request.
4. **Policy overlay (optional):** ABAC evaluation for fine controls (e.g., “deny job:run outside business hours”).
5. **Persistence guard:** set DB session GUC `stella.tenant_id` and verify any writes include matching `tenant_id`. Enforce Postgres RLS.
6. **Audit:** write decision to audit bus (async) with `permit|deny`, reasons, and matched rule.

### 3.4 Database isolation

**Approach:** shared schema with **Row Level Security**. Every tenant‑scoped table includes `tenant_id` and optional `project_id`.

**RLS policy template (Postgres):**

```sql
-- Note: table owners bypass RLS unless FORCE ROW LEVEL SECURITY is also set.
ALTER TABLE sboms ENABLE ROW LEVEL SECURITY;

-- Visibility filter. current_setting(..., true) returns NULL when the setting
-- is absent, so no rows match. Assumes tenant_id is text; cast if it is uuid.
CREATE POLICY sboms_isolate ON sboms
  USING (tenant_id = current_setting('stella.tenant_id', true));

-- For INSERT/UPDATE guard:
CREATE POLICY sboms_write_guard ON sboms
  AS PERMISSIVE FOR ALL
  TO PUBLIC
  WITH CHECK (tenant_id = current_setting('stella.tenant_id', true));
```

Set `stella.tenant_id` at connection checkout, inside the transaction that will run the tenant's queries. Because the third argument (`is_local`) is `true`, the value lasts only until that transaction ends: it cannot leak to the next user of a pooled connection, but it must be set again for every transaction.

```sql
SELECT set_config('stella.tenant_id', $1, true); -- $1 = active tenant
```
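
In application code, `set_config` is easiest to get right when it is tied to the transaction boundary, since a setting made with `is_local = true` is discarded when the transaction ends. A minimal sketch follows, assuming a DB-API 2.0 driver with `%s` placeholders (psycopg-style) and autocommit disabled; `tenant_transaction` is a hypothetical helper name.

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id):
    """Run a block in one transaction with stella.tenant_id set for RLS."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    cur = conn.cursor()
    try:
        # First statement of the transaction; is_local=true means the value
        # is dropped at COMMIT/ROLLBACK and cannot outlive this block.
        cur.execute("SELECT set_config('stella.tenant_id', %s, true)", (tenant_id,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Usage would look like `with tenant_transaction(conn, "t-123") as cur: cur.execute("SELECT id FROM sboms")`, with RLS filtering the rows to tenant `t-123`.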

**Migrations:**

* Add `tenant_id` to all tenant‑scoped tables.
* Backfill existing rows with the default tenant in Quickstart.
* Enable RLS and policies in a reversible migration.

### 3.5 Object storage and artifacts

* **Layout:** `s3://<bucket>/tenants/<tenant_id>/projects/<project_id>/<resource>/<uuid>...`
* **KMS keys:** optional per‑tenant key alias. Map via `kms_alias = "stella-<tenant_id>"`.
* Ensure Task Runner and Export Center only operate within the prefixed path of the active tenant.
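
A path-building helper plus a containment check is one way to keep workers inside the active tenant's prefix. The following sketch uses hypothetical function names and covers only the key portion of the layout (no bucket handling); the segment validation rules are assumptions.

```python
import posixpath


def tenant_prefix(tenant_id, project_id):
    """Key prefix for one tenant/project, rejecting unsafe segments."""
    for part in (tenant_id, project_id):
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path segment: {part!r}")
    return f"tenants/{tenant_id}/projects/{project_id}/"


def object_key(tenant_id, project_id, resource, object_id):
    """Full object key following the tenant-prefixed layout."""
    return f"{tenant_prefix(tenant_id, project_id)}{resource}/{object_id}"


def kms_alias(tenant_id):
    """Per-tenant key alias, following the stella-<tenant_id> convention."""
    return f"stella-{tenant_id}"


def assert_within_tenant(key, tenant_id):
    """Normalize a caller-supplied key and refuse it if it escapes the tenant."""
    normalized = posixpath.normpath(key)  # collapses "a/../b" style traversal
    if not normalized.startswith(f"tenants/{tenant_id}/"):
        raise PermissionError(f"key outside tenant {tenant_id!r}: {key!r}")
    return normalized
```

Normalizing before the prefix check matters: a key such as `tenants/t-1/../t-2/x` starts with the right prefix textually but resolves into another tenant.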

### 3.6 Message bus topics

* Topic naming: `stella.<tenant_id>.<domain>.<event>` for tenant‑scoped events.
* Global knowledge events remain `stella.global.kb.*`.
* Subscriptions always include a tenant filter unless consuming global knowledge.
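
The naming rules can be enforced with two small helpers, sketched here under assumptions: the function names are hypothetical, segments may not contain `.` or `*`, and `global` is treated as a reserved tenant ID so a tenant can never collide with the knowledge-plane topics.

```python
def tenant_topic(tenant_id, domain, event):
    """Build stella.<tenant_id>.<domain>.<event> for a tenant-scoped event."""
    for part in (tenant_id, domain, event):
        if not part or "." in part or "*" in part:
            raise ValueError(f"invalid topic segment: {part!r}")
    if tenant_id == "global":
        raise ValueError("'global' is reserved for the knowledge plane")
    return f"stella.{tenant_id}.{domain}.{event}"


def may_consume(topic, active_tenant):
    """Tenant filter: allow the active tenant's topics and global kb topics."""
    parts = topic.split(".")
    if len(parts) < 3 or parts[0] != "stella":
        return False
    if parts[1] == "global":
        return parts[2] == "kb"
    return parts[1] == active_tenant
```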

### 3.7 Background workers

* **Orchestrator & Task Runner:** each job carries `{tenant_id, project_id}`. Workers set `stella.tenant_id` before any DB or object store access. Reject jobs that lack the context.
* **Conseiller/Excitator:** ingest to global plane; linking jobs (matching advisories to tenant SBOMs) run per tenant and respect RLS.

### 3.8 Policy overlay (optional but recommended)

Integrate Policy Engine with condition keys:

* `tenant`, `project`, `resource.type`, `resource.id`, `actor.role`, `actor.mfa`, `time`, `ip`.

Examples:

* Deny `job:run` from IPs outside an allowed CIDR range.
* Require `mfa=true` to approve notification templates.
* Quotas: “max exports per hour per tenant.”

### 3.9 Service accounts & delegation

* **Robot principals:** `sa:{tenant}:{name}` with scopes constrained to tenant/project. Default TTL 1h; max TTL policy‑controlled.
* **Token minting:** Tenant admins can generate tokens via API/Console; all tokens are auditable and can optionally be bound to a CIDR range or workload identity.
* **Delegation:** `stella token delegate --to sa:... --scopes ... --ttl 15m` produces a token with `act` chain, recorded in audit log.
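
To make the `act` chain concrete, here is a sketch of how delegated claims could be derived from a parent token. The claim layout is an assumption borrowed from the OAuth token-exchange convention (`sub` stays the original subject, `act.sub` is the party acting, earlier actors nest inside `act`); the document does not specify it. `delegate` is a hypothetical name, and scope narrowing is simplified to exact string subsets.

```python
import time


def delegate(parent, to, scopes, ttl_seconds, now=None):
    """Derive claims for a delegated token; never widens scopes or lifetime."""
    now = int(time.time()) if now is None else now
    held = set(parent["scope"].split())
    requested = set(scopes)
    if not requested <= held:
        missing = sorted(requested - held)
        raise PermissionError(f"cannot delegate scopes not held: {missing}")
    actor = {"sub": to}
    if "act" in parent:
        actor["act"] = parent["act"]  # keep the earlier chain nested
    claims = dict(parent)
    claims.update({
        "scope": " ".join(sorted(requested)),
        "iat": now,
        "exp": min(now + ttl_seconds, parent["exp"]),  # cannot outlive parent
        "act": actor,
    })
    return claims
```

The returned dictionary is only the claim set; signing and audit logging would be done by the Authority.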

### 3.10 Auditing

Every decision logs:

* `ts, tenant, actor, route, resource, action, effect, reason, scopes_used, policy_rule_id`
* Persist in tenant‑scoped audit table and stream to `stella.<tenant>.audit.decisions`.
* Expose search/filter in Console → Admin → Audit.
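
A decision record with these fields might be modeled as below. This is a sketch under assumptions: the `Decision` class, its method names, and the ISO-8601 UTC timestamp format are illustrative, not a defined wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Decision:
    tenant: str
    actor: str
    route: str
    resource: str
    action: str
    effect: str                      # "permit" or "deny"
    reason: str
    scopes_used: Tuple[str, ...] = ()
    policy_rule_id: Optional[str] = None
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self):
        if self.effect not in ("permit", "deny"):
            raise ValueError(f"invalid effect: {self.effect!r}")

    def topic(self) -> str:
        """Bus topic the record is streamed to."""
        return f"stella.{self.tenant}.audit.decisions"

    def to_json(self) -> str:
        """Stable JSON encoding (tuples serialize as arrays)."""
        return json.dumps(asdict(self), sort_keys=True)
```

Making the record frozen fits the requirement that decision logs are immutable once emitted.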

### 3.11 CLI and Console UX

* **CLI:** `stella login`, `stella whoami`, `stella tenants list`, `--tenant` flag everywhere. Clear error if token lacks tenant or scope.
* **Console:** Tenant switcher, role badges, “why denied?” modal showing scope and policy reasons without leaking internals.
* **Impersonation (admin only):** `sudo as <user>` for debugging with visible banner; issues delegated token with `act` chain.

### 3.12 Compatibility modes

* **Quickstart single‑tenant:** hidden tenant `local`. Header optional. RLS stays enabled, with the tenant fixed to the constant `local`.
* **Multi‑tenant:** full model active; all tenancy migrations applied; Console exposes tenant admin.

---

## 4) Architecture

### 4.1 New/updated modules

* `auth/authority`: JWKS fetching, token validation, scope parser, cache.
* `auth/middleware`: HTTP/gRPC interceptors for authN/Z, tenant activation, audit emit.
* `auth/roles`: role → scope mapping + tenant/project constraints.
* `auth/policy-bridge`: optional ABAC evaluation using Policy Engine.
* `storage/tenantctx`: helpers to set `stella.tenant_id` in DB session and object‑store prefixes.
* `audit/decisions`: structured logging and bus producer.
* `cli/auth`: login, token store, tenant switcher, whoami.

### 4.2 Data model changes

* Add `tenant_id` (and `project_id` where appropriate) to: `sources, jobs, runs, sboms, components, findings, policies, exports, notifications, secrets, packs, ledger, audits`.
* Create tables:

  * `tenants(id, name, status, created_at, owner_user_id)`
  * `projects(id, tenant_id, name, meta, created_at)`
  * `memberships(user_id, tenant_id, roles[])`
  * `service_accounts(id, tenant_id, name, scopes[], created_at, disabled)`
  * `audit_decisions(...)` (tenant‑scoped)

---

## 5) APIs and contracts

### 5.1 Standard headers

* `Authorization: Bearer <jwt>`
* `X-Stella-Tenant: <tenant_id>`
* `X-Request-ID` (propagated for audit correlation)

### 5.2 Auth endpoints

* `POST /auth/login` (OIDC code flow start for Console)
* `GET /auth/jwks.json` (proxy/cached from Authority if needed)
* `GET /auth/whoami` → `{ sub, tenants[], activeTenant, roles, scopes, mfa }`
* `POST /auth/tokens/service` (tenant admin) → mint robot token with constrained scopes/ttl
* `POST /auth/tokens/delegate` → mint delegated token with `act` chain

### 5.3 Tenant admin endpoints

* `POST /tenants` (owner only)
* `GET /tenants`, `GET /tenants/:id`
* `POST /tenants/:id/projects`, `GET /tenants/:id/projects`
* `POST /tenants/:id/members` (assign role), `DELETE /tenants/:id/members/:user`
* `GET /tenants/:id/audit` (search)

### 5.4 Route protection conventions

Each route declares:

* `resource`, `verb`
* Whether it requires project constraint
* Optional policy gates (e.g., `require_mfa`)

Example (OpenAPI extension):

```yaml
x-stella-auth:
  resource: sbom
  verb: read
  requireTenant: true
  allowProjectScoped: true
  requireMFA: false
```

---

## 6) Documentation changes

Create/update:

1. `/docs/security/tenancy-overview.md`
   Concepts, knowledge planes, tenant/project model, isolation layers.

2. `/docs/security/scopes-and-roles.md`
   Scope grammar, default roles, examples, custom role mapping.

3. `/docs/security/authority-config.md`
   How to connect to an OIDC provider, JWKS caching, audience, issuers, time skew, MFA signal.

4. `/docs/operations/multi-tenancy.md`
   Running multi‑tenant deployments: quotas, KMS per tenant, object store layout, message topics, backup/restore per tenant.

5. `/docs/operations/rls-and-data-isolation.md`
   Postgres RLS policy reference, migrations, troubleshooting leaks.

6. `/docs/console/admin-tenants.md`
   Tenant switcher, managing members, roles, audit viewer.

7. `/docs/cli/authentication.md`
   `login`, `whoami`, `tenants list`, `--tenant`, service tokens, delegation.

8. `/docs/api/authentication.md`
   Headers, error codes, sample requests, OpenAPI `x-stella-auth` annotations.

9. `/docs/policy/examples/abac-overlays.md`
   Optional policy snippets: MFA requirements, time windows, IP restrictions, quotas.

10. `/docs/install/configuration-reference.md`
    New `STELLA_AUTH_*`, `STELLA_TENANCY_*`, and per‑service flags.

Add at the top of each page:

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 7) Implementation plan

### Middleware & libraries

* Implement `auth/middleware` with:

  * JWKS cache and signature verification (kid pinning, 10 min refresh).
  * Scope parser and matcher with constraint evaluation.
  * Tenant activator reading header/query and verifying membership.
  * Policy hook for ABAC (feature flag).
  * Decision audit emitter (non‑blocking).

### Services

* **API:** wrap all handlers with middleware; declare route protection via decorators/annotations; enforce project constraints.
* **DB access layer:** on connection checkout, set `stella.tenant_id` inside the transaction that runs the tenant's queries (the setting is transaction‑local); forbid raw SQL that bypasses the session GUC.
* **Orchestrator/Task Runner:** include `{tenant_id, project_id}` in job spec; enforce before any IO.
* **Export/Notify/AI:** stamp tenant in outbound payloads and logs; include it in idempotency keys.
* **Conseiller/Excitator:** keep global ingestion; ensure linking jobs run with tenant context only.

### Console

* Add tenant switcher and “whoami” panel.
* Show role badges; display “why denied?” with scope/policy explanation from 403 payload.
* Tenant Admin screens: members, roles, service tokens, audit.

### CLI

* `stella login` (device/code or local browser) and tenant selection.
* Persist tokens per profile; `--tenant` override; `whoami`.
* Commands fail with clear errors on scope violation.

### Storage

* Prefix object store paths by tenant/project.
* Optional per‑tenant KMS key integration.

### Migrations

* Add `tenant_id` and backfill.
* Enable RLS with policies and tests.
* Seed default tenant for Quickstart.

---

## 8) Engineering tasks

**Auth & middleware**

* [ ] Implement JWKS retrieval and caching with rotation tests.
* [ ] Implement scope parser and matcher with constraint support.
* [ ] Build HTTP/gRPC interceptors and integrate across services.
* [ ] Add ABAC policy hook and sample policies.
* [ ] Emit structured decision audits.

**Data & storage**

* [ ] Add `tenant_id` columns and indices; backfill in migration.
* [ ] Enable Postgres RLS policies for all tenant‑scoped tables.
* [ ] Update ORM/queries to rely on RLS; remove any “WHERE tenant_id = ...” duplication.
* [ ] Tenant‑prefixed object store paths; optional per‑tenant KMS keys.

**Services**

* [ ] Annotate all routes with `x-stella-auth` or equivalent decorator.
* [ ] Propagate tenant context through orchestrator and workers.
* [ ] Update Conseiller/Excitator linkers to require tenant context.

**Console**

* [ ] Implement tenant switcher, role display, and “whoami.”
* [ ] Add Tenant Admin screens (members, projects, service accounts).
* [ ] Implement “Why denied?” modal reading 403 details.

**CLI**

* [ ] `login`, `whoami`, `tenants list`, tenant flag and persistence.
* [ ] Service token minting for tenant admins.
* [ ] Delegate token creation for robot use.

**Audit**

* [ ] Create `audit_decisions` table and producer.
* [ ] Add search API and Console viewer.

**Docs**

* [ ] Author the ten docs listed in §6 with examples and diagrams.
* [ ] Add troubleshooting: common 401/403 causes and fixes.
* [ ] Add migration guide from single‑tenant to multi‑tenant.

**Tests**

* [ ] Unit tests for scope matching and token validation.
* [ ] RLS tests verifying cross‑tenant reads/writes fail.
* [ ] E2E tests: multi‑tenant users, robot tokens, ABAC overlays, orchestrator runs.
* [ ] Fuzz tests on header handling to prevent tenant confusion bugs.

---

## 9) Feature changes required

* **All services:** must expose 403 payloads with machine‑ and human‑readable denial reasons and the missing scope string.
* **Export Center:** requires tenant in all manifests; deny cross‑tenant exports by default; allow explicit cross‑tenant mirror via signed bundle.
* **Notifications:** destinations and templates are tenant‑scoped; sending pipeline stamps tenant.
* **Advisory AI Assistant:** restricts training context to a tenant’s data; the global knowledge plane may be referenced, but tenant data must never be logged into it.
* **Findings Ledger:** partition by tenant; queries must be tenant‑filtered or rely solely on RLS.
* **Policy Engine:** support condition keys for tenant and actor; ship example policies.

---

## 10) Acceptance criteria

* In multi‑tenant mode, requests that omit `X-Stella-Tenant` are rejected unless the token carries exactly one tenant (§3.2); in single‑tenant Quickstart the header is optional.
* RLS prevents cross‑tenant leakage, proven by tests that attempt blind selects/inserts.
* CLI can log in, list tenants, select tenant, and perform a job limited to that tenant.
* Console shows tenant switcher; admin can invite a member and assign roles.
* Service token can be minted with narrow scopes and expires as configured.
* Every 403 returns a clear “missing required scope …” with the exact scope string.
* Conseiller/Excitator continue aggregation‑only behavior; linking jobs run strictly under tenant context.
* Audit stream captures all permit/deny decisions with correlation IDs.

---

## 11) Risks & mitigations

* **RLS misconfiguration.** Write tests that run with and without RLS; block migrations unless policies are present. Provide a canary query per service on boot to verify isolation.
* **Scope explosion.** Keep a minimal, stable scope set; use constraints for specificity; document patterns.
* **JWKS outages.** Cache keys with TTL, support multiple `kid`s, and tolerate short network failures.
* **Privilege creep in robots.** Short TTLs by default, clear UI to rotate/revoke, and audit for usage anomalies.
* **Tenant confusion bugs.** Require tenant header, validate against token memberships, and pin tenant context into DB session and job payloads; never rely on thread‑locals alone.

---

## 12) Philosophy

* **Isolation by default.** Tenancy isn’t a UI filter; it’s enforced where the data lives.
* **Least privilege wins.** Humans and robots get only what they need for as long as they need it.
* **Explain denials.** If the platform can’t explain “why no,” it’s broken.
* **Global vs tenant plane.** Public knowledge is shared; customer data is not, ever.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.