Fine. Identity and tenancy: the part everyone underestimates until they trip over it in prod. Here’s the clean, doc‑ready version.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

# Epic 14: Authority‑Backed Scopes & Tenancy

**Short name:** `Authority‑Backed Scopes & Tenancy`
**Primary components:** Authority (authN/Z), Web Services API, Policy Engine, Orchestrator, Task Runner, Console, CLI
**Surfaces:** `/auth/*`, request middleware, DB schema (RLS), object storage layout, message bus topics, audit logs, CLI login/impersonation flows
**Touches:** Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant

**AOC ground rule reminder:** Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Enforcement of aggregation‑only behavior is tenant‑agnostic and must hold across all scopes.

---

## 1) What it is

A uniform model for identity, authorization, and isolation that is enforced end‑to‑end:

* **Authority‑backed tokens:** JWT/OIDC tokens issued by a configured Authority. Tokens carry **scopes**, **roles**, and **tenant memberships** as claims. Services verify and authorize locally; no out‑of‑band ACL calls during the hot path.
* **Tenancy:** First‑class multi‑tenant isolation with optional **projects** within a tenant. Strong separation at the database layer via **row‑level security (RLS)** and in object storage via **tenant‑prefixed paths** (and optionally per‑tenant KMS keys).
* **Scopes & roles:** Minimal set of composable scopes (`stella:{resource}:{verb}`) that map to roles (`viewer`, `editor`, `operator`, `admin`, `owner`) and can be constrained to `{tenant}/{project}`.
* **Context propagation:** Every API request, job, message, and artifact is stamped with `{tenant_id, project_id, actor}` and validated at ingress and again at persistence.
* **Service accounts & delegation:** Robot identities with scoped, expiring credentials for CI, Task Packs, and webhooks. Human-to-robot delegation is explicit and auditable.
* **Audit:** Immutable decision logs for authN/Z events with resource, scope, and policy evaluation outcomes.

**Tenancy model:**

```
Organization (optional, for billing)
└── Tenant (isolation boundary)
    ├── Projects (isolation + grouping)
    │   ├── Sources (registries, repos)
    │   ├── Jobs & Runs
    │   ├── SBOMs & Artifacts
    │   ├── Findings / Evaluations
    │   └── Policies (bound or inherited)
    └── Shared tenant services (notifications, exports, secrets)
```

**Knowledge planes:**

* **Global knowledge plane:** Advisories, CVE metadata, CPE, KEV, etc. No tenant data.
* **Tenant plane:** SBOMs, VEX attachments, policy results, exposures, notifications, exports, audits.

Conseiller/Excitator live across both planes: they collect into the global plane and link to the tenant plane without merging sources.

---

## 2) Why (brief)

Security that depends on “being careful” is not security. We need hard boundaries the platform cannot cross by accident:

* Run many teams/customers safely on one deployment.
* Minimize blast radius for credentials and mistakes.
* Make CI and automation safe with least‑privileged scopes.
* Keep latency low by verifying locally with signed tokens.

---

## 3) How it should work (maximum detail)

### 3.1 Tokens, claims, and scopes

**Token type:** JWT, signed by the Authority. Services cache JWKS and verify locally.

**Required claims:**

* `iss`, `sub`, `aud`, `iat`, `exp`
* `scope`: space‑separated scopes (`stella:sbom:read`, `stella:job:run`)
* `tenants`: array of tenant IDs the subject may access
* `tenant` (active): the currently selected tenant for the request
* `roles`: object map `{ "<tenant>": ["viewer", "editor", ...] }`
* `projects` (optional): array of project IDs or `{ "<tenant>": ["projA", "projB"] }`
* `mfa` (optional): boolean or level for step‑up enforcement
* `act` (optional): actor chain for delegation/impersonation

**Scope grammar:**

```
stella:{resource}:{verb}[#{constraint}]
resource ∈ {tenant, project, source, job, sbom, vex, advisory, policy, finding, export, notify, secret, pack, ledger, console}
verb ∈ {read, list, write, delete, run, execute, approve, admin}
constraint := tenant/{tenantId}[/project/{projectId}]
```

Examples:

* `stella:sbom:read#tenant/t-123/project/p-abc`
* `stella:job:run#tenant/t-123`
* `stella:tenant:admin#tenant/t-123` (Tenant Owner)

**Role mapping (default):**

* `viewer` → read/list on most resources
* `editor` → viewer + write on sbom/policy
* `operator` → editor + job:run, export:run, notify:manage
* `admin` → operator + user/role management inside tenant
* `owner` → admin + billing/tenant lifecycle
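
The scope grammar above could be parsed and matched roughly as follows. This is a minimal sketch, not the Authority's implementation: the names `Scope`, `parse_scope`, `grants`, and `token_allows` are hypothetical, and it assumes that a granted scope without a constraint covers any tenant, and that a tenant-only constraint covers every project in that tenant. Neither rule is stated in the text, so treat both as assumptions to confirm.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Grammar from the text: stella:{resource}:{verb}[#tenant/{id}[/project/{id}]]
_SCOPE = re.compile(
    r"stella:(?P<resource>[a-z]+):(?P<verb>[a-z]+)"
    r"(?:#tenant/(?P<tenant>[^/#\s]+)(?:/project/(?P<project>[^/#\s]+))?)?"
)


@dataclass(frozen=True)
class Scope:
    resource: str
    verb: str
    tenant: Optional[str] = None
    project: Optional[str] = None


def parse_scope(text: str) -> Scope:
    """Parse one scope string; raise ValueError if it does not fit the grammar."""
    match = _SCOPE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed scope: {text!r}")
    return Scope(**match.groupdict())


def grants(granted: Scope, required: Scope) -> bool:
    """Return True if `granted` covers `required`."""
    # Assumption: a missing constraint on the granted scope means "any".
    if (granted.resource, granted.verb) != (required.resource, required.verb):
        return False
    if granted.tenant is not None and granted.tenant != required.tenant:
        return False
    if granted.project is not None and granted.project != required.project:
        return False
    return True


def token_allows(scope_claim: str, required: Scope) -> bool:
    """Check a space-separated `scope` claim against one required scope."""
    return any(grants(parse_scope(s), required) for s in scope_claim.split())
```

With the claim `stella:sbom:read#tenant/t-123/project/p-abc stella:job:run#tenant/t-123`, a job run in any project of `t-123` is allowed, while an SBOM read in a different project of the same tenant is not.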

### 3.2 Selecting the active tenant

* **API:** `X-Stella-Tenant: <tenant_id>` header or `?tenant=<id>` query parameter. If neither is supplied and the token has exactly one tenant, that tenant becomes active; otherwise the request is rejected with HTTP 400.
* **Console:** Tenant switcher in the top bar. Console includes the header on all calls.
* **CLI:** `stella login --tenant <id>` sets the default; override per command with `--tenant`.

All services must reject requests where the active tenant is not in `tenants[]`, or where the token’s scopes do not cover that tenant.
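
The selection rules above can be sketched as a small resolver. This is an illustration under stated assumptions: `resolve_active_tenant` and `TenantSelectionError` are hypothetical names, `headers` and `query` are plain dictionaries (real HTTP header lookup is case-insensitive), a header/query mismatch is treated as 400, and a non-member tenant is mapped to 403. The text only says such requests are rejected.

```python
from typing import Mapping, Sequence


class TenantSelectionError(Exception):
    """Carries the HTTP status the caller should return."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status


def resolve_active_tenant(
    headers: Mapping[str, str],
    query: Mapping[str, str],
    token_tenants: Sequence[str],
) -> str:
    from_header = headers.get("X-Stella-Tenant")
    from_query = query.get("tenant")
    # Assumption: conflicting header and query values are a bad request.
    if from_header and from_query and from_header != from_query:
        raise TenantSelectionError(400, "conflicting tenant selectors")
    requested = from_header or from_query
    if not requested:
        # No explicit selector: fall back only when the token has one tenant.
        if len(token_tenants) == 1:
            return token_tenants[0]
        raise TenantSelectionError(400, "tenant must be specified")
    if requested not in token_tenants:
        # Assumption: membership failure maps to 403.
        raise TenantSelectionError(403, "token has no membership in requested tenant")
    return requested
```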

### 3.3 Request pipeline

1. **Authentication middleware:** verify JWT signature and expiry.
2. **Tenant activation:** pick active tenant per header; set per‑request context `{tenant_id, actor}`.
3. **Scope check:** compare required scope for the route with token scopes. If route accepts project limiters, check constraints align.
4. **Policy overlay (optional):** ABAC evaluation for fine controls (e.g., “deny job:run outside business hours”).
5. **Persistence guard:** set DB session GUC `stella.tenant_id` and verify any writes include matching `tenant_id`. Enforce Postgres RLS.
6. **Audit:** write decision to audit bus (async) with `permit|deny`, reasons, and matched rule.
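
The ordering of these steps, and the fact that an audit record is written for both outcomes, can be sketched as a chain of step functions. This is a simplified illustration, not the middleware itself: signature verification, the policy overlay, and the persistence guard are left out, the scope check is an exact string match, and `RequestContext`, `Deny`, and `run_pipeline` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class Deny(Exception):
    """Raised by a step to stop the pipeline with a reason."""


@dataclass
class RequestContext:
    claims: dict                      # already-verified token claims
    tenant_header: Optional[str]      # value of the tenant header, if any
    required_scope: str               # scope declared by the route
    tenant_id: Optional[str] = None   # filled in by tenant activation
    audit: List[dict] = field(default_factory=list)  # stand-in for the audit bus


def activate_tenant(ctx: RequestContext) -> None:
    tenants = ctx.claims.get("tenants", [])
    tenant = ctx.tenant_header or (tenants[0] if len(tenants) == 1 else None)
    if tenant is None or tenant not in tenants:
        raise Deny("tenant not selected or not in token memberships")
    ctx.tenant_id = tenant


def check_scope(ctx: RequestContext) -> None:
    if ctx.required_scope not in ctx.claims.get("scope", "").split():
        raise Deny(f"missing required scope {ctx.required_scope}")


def run_pipeline(ctx: RequestContext, steps: List[Callable[[RequestContext], None]]) -> bool:
    effect, reason = "permit", None
    try:
        for step in steps:
            step(ctx)
    except Deny as exc:
        effect, reason = "deny", str(exc)
    # The audit step runs for permits and denials alike.
    ctx.audit.append({"effect": effect, "reason": reason, "tenant": ctx.tenant_id})
    return effect == "permit"
```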

### 3.4 Database isolation

**Approach:** shared schema with **Row Level Security**. Every tenant‑scoped table includes `tenant_id` and optional `project_id`.

**RLS policy template (Postgres):**

```sql
ALTER TABLE sboms ENABLE ROW LEVEL SECURITY;
-- Note: table owners bypass RLS unless FORCE ROW LEVEL SECURITY is also set.

-- Rows are visible only when tenant_id matches the session setting.
-- current_setting(..., true) returns NULL when the setting is unset, so no rows match.
CREATE POLICY sboms_isolate ON sboms
  USING (tenant_id = current_setting('stella.tenant_id', true));

-- For INSERT/UPDATE guard:
CREATE POLICY sboms_write_guard ON sboms
  AS PERMISSIVE FOR ALL
  TO PUBLIC
  WITH CHECK (tenant_id = current_setting('stella.tenant_id', true));
```

Set `stella.tenant_id` at connection checkout, inside the transaction that performs the work. Because the third argument (`is_local`) is `true`, the setting lasts only until the end of the current transaction:

```sql
SELECT set_config('stella.tenant_id', $1, true); -- $1 = active tenant; true = transaction-local
```

**Migrations:**

* Add `tenant_id` to all tenant‑scoped tables.
* Backfill existing rows with the default tenant in Quickstart.
* Enable RLS and policies in a reversible migration.
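
Because the same policy pair has to be applied to every tenant-scoped table, a migration could generate the statements from the template instead of repeating them by hand. The following is a sketch of that idea only: `rls_up` and `rls_down` are hypothetical helpers, the policy names follow the `sboms_isolate` / `sboms_write_guard` pattern shown above, and the identifier check is a conservative assumption rather than a full SQL quoting routine.

```python
import re
from typing import List

_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")
_CONDITION = "tenant_id = current_setting('stella.tenant_id', true)"


def _checked(table: str) -> str:
    # Table names are interpolated into SQL, so accept plain identifiers only.
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return table


def rls_up(table: str) -> List[str]:
    """Statements that enable RLS and add the two template policies."""
    t = _checked(table)
    return [
        f"ALTER TABLE {t} ENABLE ROW LEVEL SECURITY;",
        f"CREATE POLICY {t}_isolate ON {t} USING ({_CONDITION});",
        f"CREATE POLICY {t}_write_guard ON {t} AS PERMISSIVE FOR ALL TO PUBLIC "
        f"WITH CHECK ({_CONDITION});",
    ]


def rls_down(table: str) -> List[str]:
    """Statements that reverse rls_up, in the opposite order."""
    t = _checked(table)
    return [
        f"DROP POLICY IF EXISTS {t}_write_guard ON {t};",
        f"DROP POLICY IF EXISTS {t}_isolate ON {t};",
        f"ALTER TABLE {t} DISABLE ROW LEVEL SECURITY;",
    ]
```

Calling `rls_up("findings")` yields the three statements for the `findings` table, and `rls_down("findings")` yields the matching rollback, which keeps the migration reversible.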

### 3.5 Object storage and artifacts

* **Layout:** `s3://<bucket>/tenants/<tenant_id>/projects/<project_id>/<resource>/<uuid>...`
* **KMS keys:** optional per‑tenant key alias. Map via `kms_alias = "stella-<tenant_id>"`.
* Ensure Task Runner and Export Center only operate within the prefixed path of the active tenant.
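
A key builder plus a containment check is one way to keep components inside the active tenant's prefix. This sketch assumes keys are built from the layout above without the `s3://<bucket>/` part; `object_key` and `within_tenant` are hypothetical helper names.

```python
import posixpath


def object_key(tenant_id: str, project_id: str, resource: str, object_id: str) -> str:
    """Build tenants/<tenant>/projects/<project>/<resource>/<id>."""
    for part in (tenant_id, project_id, resource, object_id):
        # Reject segments that could escape or reshape the prefix.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path segment: {part!r}")
    return f"tenants/{tenant_id}/projects/{project_id}/{resource}/{object_id}"


def within_tenant(key: str, tenant_id: str) -> bool:
    """True if the normalized key stays under the tenant's prefix."""
    normalized = posixpath.normpath(key)
    # The trailing slash prevents "t-1" from matching "t-10".
    return normalized.startswith(f"tenants/{tenant_id}/")
```

Normalizing before the prefix check means a key such as `tenants/t-1/../t-2/...` is judged by where it actually resolves, not by how it starts.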

### 3.6 Message bus topics

* Topic naming: `stella.<tenant_id>.<domain>.<event>` for tenant‑scoped events.
* Global knowledge events remain `stella.global.kb.*`.
* Subscriptions always include a tenant filter unless consuming global knowledge.
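
The naming convention and the subscription rule can be captured in two small helpers. This is a sketch with hypothetical names (`tenant_topic`, `may_consume`); reserving `global` as a tenant ID is an assumption made here so that tenant topics cannot collide with the global knowledge topics.

```python
def tenant_topic(tenant_id: str, domain: str, event: str) -> str:
    """Build stella.<tenant_id>.<domain>.<event> for a tenant-scoped event."""
    for part in (tenant_id, domain, event):
        # Dots and wildcards would change the topic hierarchy.
        if not part or "." in part or "*" in part:
            raise ValueError(f"invalid topic segment: {part!r}")
    if tenant_id == "global":
        # Assumption: "global" is reserved for the global knowledge plane.
        raise ValueError("'global' is reserved and cannot be a tenant ID")
    return f"stella.{tenant_id}.{domain}.{event}"


def may_consume(topic: str, active_tenant: str) -> bool:
    """Allow global knowledge topics and topics of the active tenant only."""
    return topic.startswith("stella.global.kb.") or topic.startswith(
        f"stella.{active_tenant}."
    )
```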

### 3.7 Background workers

* **Orchestrator & Task Runner:** each job carries `{tenant_id, project_id}`. Workers set `stella.tenant_id` before any DB or object store access. Reject jobs that miss the context.
* **Conseiller/Excitator:** ingest to global plane; linking jobs (matching advisories to tenant SBOMs) run per tenant and respect RLS.
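
A worker entry point that enforces this could look like the sketch below. It assumes jobs arrive as dictionaries; `set_tenant` stands in for whatever hook sets `stella.tenant_id` on the DB session, and `handler` for the actual job logic. Both are hypothetical parameters, as are the function names.

```python
from typing import Any, Callable, Tuple


class MissingTenantContext(Exception):
    """Raised when a job arrives without its tenant/project context."""


def require_context(job: dict) -> Tuple[str, str]:
    tenant_id, project_id = job.get("tenant_id"), job.get("project_id")
    if not tenant_id or not project_id:
        raise MissingTenantContext("job rejected: tenant_id and project_id are required")
    return tenant_id, project_id


def run_job(job: dict, set_tenant: Callable[[str], None], handler: Callable[[dict], Any]) -> Any:
    tenant_id, _project_id = require_context(job)
    # The tenant is pinned before the handler can touch the DB or object store.
    set_tenant(tenant_id)
    return handler(job)
```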

### 3.8 Policy overlay (optional but recommended)

Integrate Policy Engine with condition keys:

* `tenant`, `project`, `resource.type`, `resource.id`, `actor.role`, `actor.mfa`, `time`, `ip`.

Examples:

* Deny `job:run` from IPs outside CIDR.
* Require `mfa=true` to approve notification templates.
* Quotas: “max exports per hour per tenant.”
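
The first two examples can be expressed as small rule functions evaluated after the scope check. This is a sketch, not the Policy Engine's rule language: the request is a flat dictionary keyed by the condition keys listed above plus a hypothetical `action` key, and the action name `notify:approve` is an assumption derived from the scope grammar.

```python
import ipaddress
from typing import Callable, Iterable, Optional, Tuple

# A rule returns a denial reason, or None when it has no objection.
Rule = Callable[[dict], Optional[str]]


def deny_job_run_outside(cidr: str) -> Rule:
    network = ipaddress.ip_network(cidr)

    def rule(request: dict) -> Optional[str]:
        if request["action"] == "job:run" and ipaddress.ip_address(request["ip"]) not in network:
            return f"job:run denied: {request['ip']} is outside {cidr}"
        return None

    return rule


def require_mfa_for(action: str) -> Rule:
    def rule(request: dict) -> Optional[str]:
        if request["action"] == action and not request.get("actor.mfa", False):
            return f"{action} requires mfa=true"
        return None

    return rule


def evaluate(rules: Iterable[Rule], request: dict) -> Tuple[str, Optional[str]]:
    # Deny wins: the first rule that returns a reason stops evaluation.
    for rule in rules:
        reason = rule(request)
        if reason is not None:
            return "deny", reason
    return "permit", None
```

The returned reason is the kind of string that could feed the `reason` field of the audit record and the “why denied?” explanation.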

### 3.9 Service accounts & delegation

* **Robot principals:** `sa:{tenant}:{name}` with scopes constrained to tenant/project. Default TTL 1h; max TTL policy‑controlled.
* **Token minting:** Tenant admins can generate tokens via API/Console; all tokens are auditable and can optionally be bound to a CIDR range or workload identity.
* **Delegation:** `stella token delegate --to sa:... --scopes ... --ttl 15m` produces a token with `act` chain, recorded in audit log.
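
The text does not define the shape of the `act` chain, so the sketch below borrows the nesting convention of the `act` claim from RFC 8693 (OAuth 2.0 Token Exchange): `sub` stays the original principal, the outermost `act.sub` names the party now acting, and earlier actors nest inside it. That choice, the function name `delegated_claims`, and the rules that delegated scopes must be a subset of the parent's and that the delegated token never outlives the parent, are all assumptions for illustration.

```python
from typing import Iterable


def delegated_claims(parent: dict, actor_sub: str, scopes: Iterable[str],
                     ttl_seconds: int, now: int) -> dict:
    """Build the claim set for a delegated token from the parent token's claims."""
    requested = list(scopes)
    held = set(parent["scope"].split())
    if not set(requested) <= held:
        raise PermissionError("cannot delegate scopes the delegator does not hold")
    actor = {"sub": actor_sub}
    if "act" in parent:
        actor["act"] = parent["act"]  # earlier actors nest inside the new one
    return {
        "sub": parent["sub"],
        "tenant": parent["tenant"],
        "scope": " ".join(requested),
        "iat": now,
        "exp": min(now + ttl_seconds, parent["exp"]),  # never outlive the parent
        "act": actor,
    }
```

Signing the resulting claims and recording the delegation in the audit log are outside this sketch.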

### 3.10 Auditing

Every decision logs:

* `ts, tenant, actor, route, resource, action, effect, reason, scopes_used, policy_rule_id`
* Persist in tenant‑scoped audit table and stream to `stella.<tenant>.audit.decisions`.
* Expose search/filter in Console → Admin → Audit.
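
A decision record with the fields listed above might be modeled as an immutable value that serializes to JSON for the audit stream. This is a sketch: the class name `Decision`, the ISO 8601 timestamp format, and the choice of JSON as the wire format are assumptions; only the field names and the topic pattern come from the text.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Decision:
    tenant: str
    actor: str
    route: str
    resource: str
    action: str
    effect: str                       # "permit" or "deny"
    reason: str
    scopes_used: Tuple[str, ...]
    policy_rule_id: Optional[str] = None
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        if self.effect not in ("permit", "deny"):
            raise ValueError(f"effect must be permit or deny, got {self.effect!r}")


def audit_topic(decision: Decision) -> str:
    """Tenant-scoped stream named in the text."""
    return f"stella.{decision.tenant}.audit.decisions"


def encode(decision: Decision) -> str:
    """Serialize with sorted keys so records are stable and easy to diff."""
    return json.dumps(asdict(decision), sort_keys=True)
```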

### 3.11 CLI and Console UX

* **CLI:** `stella login`, `stella whoami`, `stella tenants list`, `--tenant` flag everywhere. Clear error if token lacks tenant or scope.
* **Console:** Tenant switcher, role badges, “why denied?” modal showing scope and policy reasons without leaking internals.
* **Impersonation (admin only):** `sudo as <user>` for debugging with visible banner; issues delegated token with `act` chain.

### 3.12 Compatibility modes

* **Quickstart single‑tenant:** hidden tenant `local`. Header optional. RLS enabled with a constant tenant ID.
* **Multi‑tenant:** full model active; migrations applied; Console exposes tenant admin.

---

## 4) Architecture

### 4.1 New/updated modules

* `auth/authority`: JWKS fetching, token validation, scope parser, cache.
* `auth/middleware`: HTTP/gRPC interceptors for authN/Z, tenant activation, audit emit.
* `auth/roles`: role → scope mapping + tenant/project constraints.
* `auth/policy-bridge`: optional ABAC evaluation using Policy Engine.
* `storage/tenantctx`: helpers to set `stella.tenant_id` in DB session and object‑store prefixes.
* `audit/decisions`: structured logging and bus producer.
* `cli/auth`: login, token store, tenant switcher, whoami.

### 4.2 Data model changes

* Add `tenant_id` (and `project_id` where appropriate) to: `sources, jobs, runs, sboms, components, findings, policies, exports, notifications, secrets, packs, ledger, audits`.
* Create tables:

  * `tenants(id, name, status, created_at, owner_user_id)`
  * `projects(id, tenant_id, name, meta, created_at)`
  * `memberships(user_id, tenant_id, roles[])`
  * `service_accounts(id, tenant_id, name, scopes[], created_at, disabled)`
  * `audit_decisions(...)` (tenant‑scoped)

---

## 5) APIs and contracts

### 5.1 Standard headers

* `Authorization: Bearer <jwt>`
* `X-Stella-Tenant: <tenant_id>`
* `X-Request-ID` (propagated for audit correlation)

### 5.2 Auth endpoints

* `POST /auth/login` (OIDC code flow start for Console)
* `GET /auth/jwks.json` (proxy/cached from Authority if needed)
* `GET /auth/whoami` → `{ sub, tenants[], activeTenant, roles, scopes, mfa }`
* `POST /auth/tokens/service` (tenant admin) → mint robot token with constrained scopes/ttl
* `POST /auth/tokens/delegate` → mint delegated token with `act` chain

### 5.3 Tenant admin endpoints

* `POST /tenants` (owner only)
* `GET /tenants`, `GET /tenants/:id`
* `POST /tenants/:id/projects`, `GET /tenants/:id/projects`
* `POST /tenants/:id/members` (assign role), `DELETE /tenants/:id/members/:user`
* `GET /tenants/:id/audit` (search)

### 5.4 Route protection conventions

Each route declares:

* `resource`, `verb`
* Whether it requires project constraint
* Optional policy gates (e.g., `require_mfa`)

Example (OpenAPI extension):

```yaml
x-stella-auth:
  resource: sbom
  verb: read
  requireTenant: true
  allowProjectScoped: true
  requireMFA: false
```
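
Given such an annotation, a service can derive the exact scope string a request needs and reuse it in the 403 body when the check fails. The sketch below assumes the annotation has been loaded into a dictionary with the keys shown above; the helper names and the payload fields (`code`, `message`, `missingScope`) are hypothetical, since the text only requires that the denial names the missing scope.

```python
from typing import Optional


def required_scope(auth: dict, tenant_id: str, project_id: Optional[str] = None) -> str:
    """Derive the scope string for a route from its x-stella-auth annotation."""
    scope = f"stella:{auth['resource']}:{auth['verb']}"
    if auth.get("requireTenant"):
        scope += f"#tenant/{tenant_id}"
        # Add the project constraint only when the route accepts one.
        if auth.get("allowProjectScoped") and project_id:
            scope += f"/project/{project_id}"
    return scope


def denial_payload(auth: dict, tenant_id: str, project_id: Optional[str] = None) -> dict:
    """Body for a 403 response that states the exact missing scope."""
    missing = required_scope(auth, tenant_id, project_id)
    return {
        "status": 403,
        "code": "missing_scope",
        "message": f"missing required scope {missing}",
        "missingScope": missing,
    }
```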

---

## 6) Documentation changes

Create/update:

1. `/docs/security/tenancy-overview.md`
    Concepts, knowledge planes, tenant/project model, isolation layers.

2. `/docs/security/scopes-and-roles.md`
    Scope grammar, default roles, examples, custom role mapping.

3. `/docs/security/authority-config.md`
    How to connect to an OIDC provider, JWKS caching, audience, issuers, time skew, MFA signal.

4. `/docs/operations/multi-tenancy.md`
    Running multi‑tenant deployments: quotas, KMS per tenant, object store layout, message topics, backup/restore per tenant.

5. `/docs/operations/rls-and-data-isolation.md`
    Postgres RLS policy reference, migrations, troubleshooting leaks.

6. `/docs/console/admin-tenants.md`
    Tenant switcher, managing members, roles, audit viewer.

7. `/docs/cli/authentication.md`
    `login`, `whoami`, `tenants list`, `--tenant`, service tokens, delegation.

8. `/docs/api/authentication.md`
    Headers, error codes, sample requests, OpenAPI `x-stella-auth` annotations.

9. `/docs/policy/examples/abac-overlays.md`
    Optional policy snippets: MFA requirements, time windows, IP restrictions, quotas.

10. `/docs/install/configuration-reference.md`
    New `STELLA_AUTH_*`, `STELLA_TENANCY_*`, and per‑service flags.

Add at the top of each page:

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

---

## 7) Implementation plan

### Middleware & libraries

* Implement `auth/middleware` with:

  * JWKS cache and signature verification (kid pinning, 10 min refresh).
  * Scope parser and matcher with constraint evaluation.
  * Tenant activator reading header/query and verifying membership.
  * Policy hook for ABAC (feature flag).
  * Decision audit emitter (non‑blocking).
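
The JWKS cache from the first item could be sketched as below. This illustrates the caching behavior only, with no signature verification: `fetch` is a hypothetical callable returning a parsed JWKS document of the form `{"keys": [...]}`, the 600-second default mirrors the 10-minute refresh mentioned above, and serving cached keys when a refresh fails is an assumption in line with the outage mitigation in §11.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    def __init__(self, fetch: Callable[[], dict], refresh_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch                 # hypothetical: returns {"keys": [...]}
        self._refresh = refresh_seconds
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._fetched_at: Optional[float] = None

    def _reload(self) -> None:
        try:
            document = self._fetch()
        except Exception:
            if self._keys:
                return  # tolerate a short outage: keep serving cached keys
            raise
        self._keys = {key["kid"]: key for key in document["keys"]}
        self._fetched_at = self._clock()

    def get(self, kid: str) -> dict:
        """Return the key for `kid`, refreshing when stale or when the kid is new."""
        stale = self._fetched_at is None or self._clock() - self._fetched_at >= self._refresh
        # An unknown kid also triggers a reload, which covers key rotation.
        if stale or kid not in self._keys:
            self._reload()
        if kid not in self._keys:
            raise KeyError(f"unknown kid: {kid}")
        return self._keys[kid]
```

A production version would likely rate-limit reloads triggered by unknown `kid` values, since each one causes a fetch in this sketch.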

### Services

* **API:** wrap all handlers with middleware; declare route protection via decorators/annotations; enforce project constraints.
* **DB access layer:** on connection checkout set `stella.tenant_id`; forbid raw SQL that bypasses the session GUC.
* **Orchestrator/Task Runner:** include `{tenant_id, project_id}` in job spec; enforce before any IO.
* **Export/Notify/AI:** stamp tenant in outbound payloads and logs; include it in idempotency keys.
* **Conseiller/Excitator:** keep global ingestion; ensure linking jobs run with tenant context only.

### Console

* Add tenant switcher and “whoami” panel.
* Show role badges; display “why denied?” with scope/policy explanation from 403 payload.
* Tenant Admin screens: members, roles, service tokens, audit.

### CLI

* `stella login` (device/code or local browser) and tenant selection.
* Persist tokens per profile; `--tenant` override; `whoami`.
* Commands fail with clear errors on scope violation.

### Storage

* Prefix object store paths by tenant/project.
* Optional per‑tenant KMS key integration.

### Migrations

* Add `tenant_id` and backfill.
* Enable RLS with policies and tests.
* Seed default tenant for Quickstart.

---

## 8) Engineering tasks

**Auth & middleware**

* [ ] Implement JWKS retrieval and caching with rotation tests.
* [ ] Implement scope parser and matcher with constraint support.
* [ ] Build HTTP/gRPC interceptors and integrate across services.
* [ ] Add ABAC policy hook and sample policies.
* [ ] Emit structured decision audits.

**Data & storage**

* [ ] Add `tenant_id` columns and indices; backfill in migration.
* [ ] Enable Postgres RLS policies for all tenant‑scoped tables.
* [ ] Update ORM/queries to rely on RLS; remove any “WHERE tenant_id = ...” duplication.
* [ ] Tenant‑prefixed object store paths; optional per‑tenant KMS keys.

**Services**

* [ ] Annotate all routes with `x-stella-auth` or equivalent decorator.
* [ ] Propagate tenant context through orchestrator and workers.
* [ ] Update Conseiller/Excitator linkers to require tenant context.

**Console**

* [ ] Implement tenant switcher, role display, and “whoami.”
* [ ] Add Tenant Admin screens (members, projects, service accounts).
* [ ] Implement “Why denied?” modal reading 403 details.

**CLI**

* [ ] `login`, `whoami`, `tenants list`, tenant flag and persistence.
* [ ] Service token minting for tenant admins.
* [ ] Delegate token creation for robot use.

**Audit**

* [ ] Create audit_decisions table and producer.
* [ ] Add search API and Console viewer.

**Docs**

* [ ] Author the ten docs listed in §6 with examples and diagrams.
* [ ] Add troubleshooting: common 401/403 causes and fixes.
* [ ] Add migration guide from single‑tenant to multi‑tenant.

**Tests**

* [ ] Unit tests for scope matching and token validation.
* [ ] RLS tests verifying cross‑tenant reads/writes fail.
* [ ] E2E tests: multi‑tenant users, robot tokens, ABAC overlays, orchestrator runs.
* [ ] Fuzz tests on header handling to prevent tenant confusion bugs.

---

## 9) Feature changes required

* **All services:** must expose 403 payloads with machine‑ and human‑readable denial reasons and the missing scope string.
* **Export Center:** requires tenant in all manifests; deny cross‑tenant exports by default; allow explicit cross‑tenant mirror via signed bundle.
* **Notifications:** destinations and templates are tenant‑scoped; sending pipeline stamps tenant.
* **Advisory AI Assistant:** restricts training context to a tenant’s data; the global knowledge plane may be referenced, but tenant data must never be logged into it.
* **Findings Ledger:** partition by tenant; queries must be tenant‑filtered or rely solely on RLS.
* **Policy Engine:** support condition keys for tenant and actor; ship example policies.

---

## 10) Acceptance criteria

* In multi‑tenant mode, requests lacking `X-Stella-Tenant` are rejected; the header is optional only in single‑tenant Quickstart.
* RLS prevents cross‑tenant leakage, proven by tests that attempt blind selects/inserts.
* CLI can log in, list tenants, select tenant, and perform a job limited to that tenant.
* Console shows tenant switcher; admin can invite a member and assign roles.
* Service token can be minted with narrow scopes and expires as configured.
* Every 403 returns a clear “missing required scope …” with the exact scope string.
* Conseiller/Excitator continue aggregation‑only behavior; linking jobs run strictly under tenant context.
* Audit stream captures all permit/deny decisions with correlation IDs.

---

## 11) Risks & mitigations

* **RLS misconfiguration.** Write tests that run with and without RLS; block migrations unless policies are present. Provide a canary query per service on boot to verify isolation.
* **Scope explosion.** Keep a minimal, stable scope set; use constraints for specificity; document patterns.
* **JWKS outages.** Cache keys with TTL, support multiple `kid`s, and tolerate short network failures.
* **Privilege creep in robots.** Short TTLs by default, clear UI to rotate/revoke, and audit for usage anomalies.
* **Tenant confusion bugs.** Require tenant header, validate against token memberships, and pin tenant context into DB session and job payloads, never thread‑locals only.

---

## 12) Philosophy

* **Isolation by default.** Tenancy isn’t a UI filter; it’s enforced where the data lives.
* **Least privilege wins.** Humans and robots get only what they need for as long as they need it.
* **Explain denials.** If the platform can’t explain “why no,” it’s broken.
* **Global vs tenant plane.** Public knowledge is shared; customer data is not, ever.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.