Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
Epic 14: Authority‑Backed Scopes & Tenancy
Short name: Authority‑Backed Scopes & Tenancy
Primary components: Authority (authN/Z), Web Services API, Policy Engine, Orchestrator, Task Runner, Console, CLI
Surfaces: /auth/*, request middleware, DB schema (RLS), object storage layout, message bus topics, audit logs, CLI login/impersonation flows
Touches: Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant
AOC ground rule reminder: Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Enforcement of aggregation‑only behavior is tenant‑agnostic and must hold across all scopes.
1) What it is
A uniform model for identity, authorization, and isolation that is enforced end‑to‑end:
- Authority‑backed tokens: JWT/OIDC tokens issued by a configured Authority. Tokens carry scopes, roles, and tenant memberships as claims. Services verify and authorize locally; no out‑of‑band ACL calls during the hot path.
- Tenancy: First‑class multi‑tenant isolation with optional projects within a tenant. Strong separation at the database layer via row‑level security (RLS) and in object storage via tenant‑prefixed paths (and optionally per‑tenant KMS keys).
- Scopes & roles: Minimal set of composable scopes (stella:{resource}:{verb}) that map to roles (viewer, editor, operator, admin, owner) and can be constrained to {tenant}/{project}.
- Context propagation: Every API request, job, message, and artifact is stamped with {tenant_id, project_id, actor} and validated at ingress and again at persistence.
- Service accounts & delegation: Robot identities with scoped, expiring credentials for CI, Task Packs, and webhooks. Human-to-robot delegation is explicit and auditable.
- Audit: Immutable decision logs for authN/Z events with resource, scope, and policy evaluation outcomes.
Tenancy model:
Organization (optional, for billing)
└── Tenant (isolation boundary)
    ├── Projects (isolation + grouping)
    │   ├── Sources (registries, repos)
    │   ├── Jobs & Runs
    │   ├── SBOMs & Artifacts
    │   ├── Findings / Evaluations
    │   └── Policies (bound or inherited)
    └── Shared tenant services (notifications, exports, secrets)
Knowledge planes:
- Global knowledge plane: Advisories, CVE metadata, CPE, KEV, etc. No tenant data.
- Tenant plane: SBOMs, VEX attachments, policy results, exposures, notifications, exports, audits.
Conseiller/Excitator live across both planes: they collect into the global plane and link to tenant plane without merging sources.
2) Why (brief)
Security that depends on “being careful” is not security. We need hard boundaries the platform cannot cross by accident:
- Run many teams/customers safely on one deployment.
- Minimize blast radius for credentials and mistakes.
- Make CI and automation safe with least‑privileged scopes.
- Keep latency low by verifying locally with signed tokens.
3) How it should work (maximum detail)
3.1 Tokens, claims, and scopes
Token type: JWT, signed by the Authority. Services cache JWKS and verify locally.
Required claims:
- iss, sub, aud, iat, exp
- scope: space‑separated scopes (stella:sbom:read, stella:job:run)
- tenants: array of tenant IDs the subject may access
- tenant (active): the currently selected tenant for the request
- roles: object map { "<tenant>": ["viewer", "editor", ...] }
- projects (optional): array of project IDs or { "<tenant>": ["projA", "projB"] }
- mfa (optional): boolean or level for step‑up enforcement
- act (optional): actor chain for delegation/impersonation
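For illustration, a decoded payload carrying these claims might look like the sketch below. Every value is hypothetical, and the helper only checks presence after signature verification has happened elsewhere. Treating tenant as optional in the check is an assumption, made because §3.2 lets the header select the active tenant.

```python
# Claims the checker insists on. "tenant" is left out on purpose (assumption:
# the active tenant may arrive via the request header instead of the token).
REQUIRED_CLAIMS = ("iss", "sub", "aud", "iat", "exp", "scope", "tenants", "roles")

# Hypothetical decoded payload; none of these values come from a real Authority.
EXAMPLE_CLAIMS = {
    "iss": "https://authority.example.internal",
    "sub": "user-42",
    "aud": "stella-api",
    "iat": 1700000000,
    "exp": 1700003600,
    "scope": "stella:sbom:read stella:job:run#tenant/t-123",
    "tenants": ["t-123"],
    "tenant": "t-123",
    "roles": {"t-123": ["viewer", "editor"]},
    "mfa": True,
}


def missing_claims(claims: dict) -> list:
    """Return the required claim names that are absent from a decoded payload."""
    return [name for name in REQUIRED_CLAIMS if name not in claims]


def scopes_of(claims: dict) -> list:
    """Split the space-separated scope claim into individual scope strings."""
    return str(claims.get("scope", "")).split()
```

A service would typically reject the token when missing_claims returns a non-empty list, before any scope matching takes place.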
Scope grammar:
stella:{resource}:{verb}[#{constraint}]
resource ∈ {tenant, project, source, job, sbom, vex, advisory, policy, finding, export, notify, secret, pack, ledger, console}
verb ∈ {read, list, write, delete, run, execute, approve, admin}
constraint := tenant/{tenantId}[/project/{projectId}]
Examples:
- stella:sbom:read#tenant/t-123/project/p-abc
- stella:job:run#tenant/t-123
- stella:tenant:admin#tenant/t-123 (Tenant Owner)
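The grammar above is small enough to parse with a single regular expression. The following is a minimal sketch, not the platform's parser: the names Scope, parse_scope, and permits are hypothetical, and it assumes that a scope without a constraint applies to whichever tenant is active (membership being checked separately).

```python
import re
from dataclasses import dataclass
from typing import Optional

RESOURCES = {"tenant", "project", "source", "job", "sbom", "vex", "advisory", "policy",
             "finding", "export", "notify", "secret", "pack", "ledger", "console"}
VERBS = {"read", "list", "write", "delete", "run", "execute", "approve", "admin"}

# stella:{resource}:{verb}[#tenant/{tenantId}[/project/{projectId}]]
_SCOPE_RE = re.compile(
    r"^stella:(?P<resource>[a-z]+):(?P<verb>[a-z]+)"
    r"(?:#tenant/(?P<tenant>[^/#\s]+)(?:/project/(?P<project>[^/#\s]+))?)?$"
)


@dataclass(frozen=True)
class Scope:
    resource: str
    verb: str
    tenant: Optional[str] = None
    project: Optional[str] = None


def parse_scope(text: str) -> Scope:
    """Parse one scope string, rejecting unknown resources and verbs."""
    m = _SCOPE_RE.match(text)
    if not m or m["resource"] not in RESOURCES or m["verb"] not in VERBS:
        raise ValueError(f"invalid scope: {text!r}")
    return Scope(m["resource"], m["verb"], m["tenant"], m["project"])


def permits(granted: Scope, resource: str, verb: str, tenant: str,
            project: Optional[str] = None) -> bool:
    """True if the granted scope covers the requested action."""
    if (granted.resource, granted.verb) != (resource, verb):
        return False
    # A constrained scope only matches its own tenant (and project, if set).
    if granted.tenant is not None and granted.tenant != tenant:
        return False
    if granted.project is not None and granted.project != project:
        return False
    return True
```

With this matcher, a project‑constrained scope does not authorize a tenant‑wide request, while a tenant‑constrained scope covers every project inside that tenant.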
Role mapping (default):
- viewer → read/list on most resources
- editor → viewer + write on sbom/policy
- operator → editor + job:run, export:run, notify:manage
- admin → operator + user/role management inside tenant
- owner → admin + billing/tenant lifecycle
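Because each role builds on the previous one, the mapping can be expressed as a chain of additions. This is only a sketch: the viewer resource subset is an assumption standing in for "most resources", the admin and owner entries are left empty because the text describes them in prose without naming scope strings, and scopes_for_role is a hypothetical helper.

```python
ROLE_CHAIN = ["viewer", "editor", "operator", "admin", "owner"]

# Assumed subset standing in for "most resources".
VIEWER_RESOURCES = ["sbom", "policy", "finding", "job"]

# Scopes ADDED at each level; lower levels are inherited.
ROLE_ADDS = {
    "viewer": [f"stella:{r}:{v}" for r in VIEWER_RESOURCES for v in ("read", "list")],
    "editor": ["stella:sbom:write", "stella:policy:write"],
    "operator": ["stella:job:run", "stella:export:run", "stella:notify:manage"],
    "admin": [],  # user/role management: no scope string is given in the text
    "owner": [],  # billing/tenant lifecycle: likewise unspecified
}


def scopes_for_role(role: str, tenant_id: str) -> set:
    """Expand a role into tenant-constrained scope strings, including inherited ones."""
    level = ROLE_CHAIN.index(role)  # raises ValueError for an unknown role
    scopes = set()
    for name in ROLE_CHAIN[: level + 1]:
        scopes.update(f"{s}#tenant/{tenant_id}" for s in ROLE_ADDS[name])
    return scopes
```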
3.2 Selecting the active tenant
- API: X-Stella-Tenant: <tenant_id> header or ?tenant=<id> query. If omitted and the token has exactly one tenant, that tenant is active; else 400.
- Console: Tenant switcher in the top bar. Console includes header on all calls.
- CLI: stella login --tenant <id> sets the default; override per command with --tenant.
All services must reject requests where the active tenant is not in tenants[] or where the token’s scopes do not cover that tenant.
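The selection rules above can be sketched as one function. The names are hypothetical, header lookup is assumed to be already case‑normalized, and two details are assumptions rather than stated behavior: a header/query mismatch is treated as 400, and a tenant outside the token's memberships as 403.

```python
from typing import Mapping, Optional, Sequence


class TenantError(Exception):
    """Carries the HTTP status the caller should return."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def resolve_active_tenant(headers: Mapping[str, str],
                          query: Mapping[str, str],
                          token_tenants: Sequence[str]) -> str:
    """Pick the active tenant from header or query, falling back to a sole membership."""
    from_header = headers.get("X-Stella-Tenant") or None
    from_query = query.get("tenant") or None
    if from_header and from_query and from_header != from_query:
        raise TenantError(400, "header and query name different tenants")

    requested: Optional[str] = from_header or from_query
    if requested is None:
        # Implicit activation is only safe when there is nothing to choose from.
        if len(token_tenants) == 1:
            return token_tenants[0]
        raise TenantError(400, "tenant must be specified")

    if requested not in token_tenants:
        raise TenantError(403, "token has no membership in the requested tenant")
    return requested
```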
3.3 Request pipeline
- Authentication middleware: verify JWT signature and expiry.
- Tenant activation: pick active tenant per header; set per‑request context {tenant_id, actor}.
- Scope check: compare required scope for the route with token scopes. If route accepts project limiters, check constraints align.
- Policy overlay (optional): ABAC evaluation for fine controls (e.g., “deny job:run outside business hours”).
- Persistence guard: set DB session GUC stella.tenant_id and verify any writes include matching tenant_id. Enforce Postgres RLS.
- Audit: write decision to audit bus (async) with permit|deny, reasons, and matched rule.
3.4 Database isolation
Approach: shared schema with Row Level Security. Every tenant‑scoped table includes tenant_id and optional project_id.
RLS policy template (Postgres):
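ALTER TABLE sboms ENABLE ROW LEVEL SECURITY;
-- Table owners bypass RLS unless it is forced; needed if the app role owns the table.
ALTER TABLE sboms FORCE ROW LEVEL SECURITY;
-- Visibility: only rows of the active tenant. An unset GUC yields NULL, so no rows match.
CREATE POLICY sboms_isolate ON sboms
USING (tenant_id = current_setting('stella.tenant_id', true));
-- For INSERT/UPDATE guard:
CREATE POLICY sboms_write_guard ON sboms
AS PERMISSIVE FOR ALL
TO PUBLIC
WITH CHECK (tenant_id = current_setting('stella.tenant_id', true));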
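Set stella.tenant_id at connection checkout, inside the transaction that performs the work. The third argument makes the setting transaction‑local, so it is discarded at commit or rollback and cannot leak to the next user of a pooled connection:
SELECT set_config('stella.tenant_id', $1, true); -- $1 = active tenant; true = transaction-local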
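One way to keep the setting and the work in the same transaction is a small context manager. This is a sketch under assumptions: conn is a DB‑API 2.0 connection with autocommit off and %s placeholders (psycopg2 style), and tenant_transaction is a hypothetical helper name.

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Run a block of work in one transaction bound to tenant_id.

    Assumes a DB-API 2.0 connection with autocommit disabled, so the first
    execute() opens the transaction that the tenant setting is local to.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required")
    cur = conn.cursor()
    try:
        # Transaction-local: the value disappears at commit or rollback.
        cur.execute("SELECT set_config('stella.tenant_id', %s, true)", (tenant_id,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Callers would write their queries inside the with block, using the yielded cursor, so every statement runs under the RLS policies for that tenant.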
Migrations:
- Add tenant_id to all tenant‑scoped tables.
- Backfill existing rows with the default tenant in Quickstart.
- Enable RLS and policies in a reversible migration.
3.5 Object storage and artifacts
- Layout: s3://<bucket>/tenants/<tenant_id>/projects/<project_id>/<resource>/<uuid>...
- KMS keys: optional per‑tenant key alias. Map via kms_alias = "stella-<tenant_id>".
- Ensure Task Runner and Export Center only operate within the prefixed path of the active tenant.
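A path builder plus a containment check is one way to hold workers to the tenant prefix. The sketch below uses hypothetical helper names and assumes IDs are limited to letters, digits, underscore, and hyphen, which rules out separators and traversal segments.

```python
import posixpath
import re

# Assumed ID alphabet; anything with "/" or ".." is refused outright.
_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_-]*$")


def _check(part: str) -> str:
    if not _ID_RE.match(part):
        raise ValueError(f"unsafe path segment: {part!r}")
    return part


def object_key(tenant_id: str, project_id: str, resource: str, object_id: str) -> str:
    """Build the object key below the bucket, following the layout above."""
    return "/".join(["tenants", _check(tenant_id), "projects", _check(project_id),
                     _check(resource), _check(object_id)])


def is_within_tenant(key: str, tenant_id: str) -> bool:
    """True only if the normalized key stays under the tenant's prefix."""
    normalized = posixpath.normpath(key)  # collapses "a/../b" style traversal
    return normalized.startswith(f"tenants/{_check(tenant_id)}/")


def kms_alias(tenant_id: str) -> str:
    return f"stella-{_check(tenant_id)}"
```

Task Runner and Export Center code paths could call is_within_tenant on every key they are handed, before any read or write.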
3.6 Message bus topics
- Topic naming: stella.<tenant_id>.<domain>.<event> for tenant‑scoped events.
- Global knowledge events remain stella.global.kb.*.
- Subscriptions always include a tenant filter unless consuming global knowledge.
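The naming rule and the subscription filter can be sketched as follows. Helper names are hypothetical, and reserving the tenant ID global is an assumption made so that a tenant topic can never collide with the knowledge‑plane prefix.

```python
from typing import Optional

GLOBAL_PREFIX = "stella.global.kb."


def tenant_topic(tenant_id: str, domain: str, event: str) -> str:
    """Build stella.<tenant_id>.<domain>.<event> for a tenant-scoped event."""
    for part in (tenant_id, domain, event):
        if not part or "." in part or "*" in part:
            raise ValueError(f"invalid topic segment: {part!r}")
    if tenant_id == "global":
        raise ValueError("'global' is reserved for the knowledge plane")
    return f"stella.{tenant_id}.{domain}.{event}"


def topic_tenant(topic: str) -> Optional[str]:
    """Return the owning tenant, or None for global knowledge topics."""
    if topic.startswith(GLOBAL_PREFIX):
        return None
    parts = topic.split(".")
    if len(parts) != 4 or parts[0] != "stella" or parts[1] == "global":
        raise ValueError(f"unrecognized topic: {topic!r}")
    return parts[1]


def may_consume(topic: str, active_tenant: str) -> bool:
    """Tenant filter for subscriptions: own topics and global knowledge only."""
    owner = topic_tenant(topic)
    return owner is None or owner == active_tenant
```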
3.7 Background workers
- Orchestrator & Task Runner: each job carries {tenant_id, project_id}. Workers set stella.tenant_id before any DB or object store access. Reject jobs that miss the context.
- Conseiller/Excitator: ingest to global plane; linking jobs (matching advisories to tenant SBOMs) run per tenant and respect RLS.
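A worker entry point that refuses context‑free jobs might look like this sketch. The job is assumed to be a plain mapping, and JobContext, MissingTenantContext, and run_job are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


class MissingTenantContext(Exception):
    """Raised when a job arrives without tenant/project context."""


@dataclass(frozen=True)
class JobContext:
    tenant_id: str
    project_id: str


def context_from_job(job: Mapping[str, Any]) -> JobContext:
    """Extract the context, rejecting jobs that do not carry both IDs."""
    tenant_id = job.get("tenant_id")
    project_id = job.get("project_id")
    if not tenant_id or not project_id:
        raise MissingTenantContext(f"job {job.get('id')!r} lacks tenant/project context")
    return JobContext(str(tenant_id), str(project_id))


def run_job(job: Mapping[str, Any],
            handler: Callable[[JobContext, Mapping[str, Any]], Any]) -> Any:
    """Validate context first, so the handler never runs without it."""
    ctx = context_from_job(job)
    return handler(ctx, job)
```

The handler receives the validated context explicitly, which keeps the tenant out of thread‑local state.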
3.8 Policy overlay (optional but recommended)
Integrate Policy Engine with condition keys: tenant, project, resource.type, resource.id, actor.role, actor.mfa, time, ip.
Examples:
- Deny job:run from IPs outside CIDR.
- Require mfa=true to approve notifications templates.
- Quotas: “max exports per hour per tenant.”
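The first two examples can be sketched as a tiny evaluator over the condition keys. This is not the Policy Engine's rule language: the action key, the rule IDs, and the allowed network are all assumptions made for illustration.

```python
import ipaddress
from typing import Mapping, Tuple

# Hypothetical allowed range for job execution.
ALLOWED_NET = ipaddress.ip_network("10.0.0.0/8")


def evaluate(ctx: Mapping[str, object]) -> Tuple[str, str]:
    """Return (effect, rule_id) for a request context built from condition keys."""
    action = ctx["action"]  # assumed key, e.g. "job:run"
    if action == "job:run":
        if ipaddress.ip_address(str(ctx["ip"])) not in ALLOWED_NET:
            return ("deny", "job-run-outside-cidr")
    if action == "notify:approve" and ctx.get("actor.mfa") is not True:
        return ("deny", "approve-requires-mfa")
    return ("permit", "default")
```

The overlay only narrows what scopes already allow; it runs after the scope check, as in the request pipeline of §3.3.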
3.9 Service accounts & delegation
- Robot principals: sa:{tenant}:{name} with scopes constrained to tenant/project. Default TTL 1h; max TTL policy‑controlled.
- Token minting: Tenant admins can generate tokens via API/Console; all tokens auditable; optionally bound to CIDR or workload identity.
- Delegation: stella token delegate --to sa:... --scopes ... --ttl 15m produces a token with act chain, recorded in audit log.
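The claims of a delegated token could be derived from the delegator's token roughly as below. Several points are assumptions: the act chain is modeled as nested objects (in the style of the act claim from OAuth token exchange), delegated scopes must be a subset of the delegator's, and the new token never outlives its parent. delegate_claims is a hypothetical helper.

```python
import time
from typing import List, Optional


def delegate_claims(parent: dict, to_subject: str, scopes: List[str],
                    ttl_seconds: int, max_ttl: int = 3600,
                    now: Optional[int] = None) -> dict:
    """Build claims for a delegated token with an actor chain."""
    extra = set(scopes) - set(parent["scope"].split())
    if extra:
        # Delegation may narrow privileges, never widen them (assumption).
        raise PermissionError(f"cannot delegate scopes not held: {sorted(extra)}")

    issued = int(time.time()) if now is None else now
    ttl = min(ttl_seconds, max_ttl)            # policy-controlled ceiling
    expires = min(issued + ttl, parent["exp"])  # never outlive the parent

    act = {"sub": parent["sub"]}
    if "act" in parent:
        act["act"] = parent["act"]              # preserve earlier links in the chain

    return {
        "sub": to_subject,
        "scope": " ".join(scopes),
        "tenants": list(parent["tenants"]),
        "iat": issued,
        "exp": expires,
        "act": act,
    }
```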
3.10 Auditing
Every decision logs:
- ts, tenant, actor, route, resource, action, effect, reason, scopes_used, policy_rule_id
- Persist in tenant‑scoped audit table and stream to stella.<tenant>.audit.decisions.
- Expose search/filter in Console → Admin → Audit.
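A decision record with exactly these fields might be modeled as below. The field types, the JSON encoding, and the class name are assumptions; only the field names and the topic pattern come from the text.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AuditDecision:
    ts: str                      # timestamp string, e.g. ISO 8601 (assumed format)
    tenant: str
    actor: str
    route: str
    resource: str
    action: str
    effect: str                  # "permit" or "deny"
    reason: str
    scopes_used: List[str] = field(default_factory=list)
    policy_rule_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.effect not in ("permit", "deny"):
            raise ValueError(f"effect must be permit or deny, got {self.effect!r}")

    def topic(self) -> str:
        """Bus topic the record is streamed to."""
        return f"stella.{self.tenant}.audit.decisions"

    def to_json(self) -> str:
        """Serialize with stable key order for storage or the bus."""
        return json.dumps(asdict(self), sort_keys=True)
```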
3.11 CLI and Console UX
- CLI: stella login, stella whoami, stella tenants list, --tenant flag everywhere. Clear error if token lacks tenant or scope.
- Console: Tenant switcher, role badges, “why denied?” modal showing scope and policy reasons without leaking internals.
- Impersonation (admin only): sudo as <user> for debugging with visible banner; issues delegated token with act chain.
3.12 Compatibility modes
- Quickstart single‑tenant: hidden tenant local. Header optional. RLS stays enabled, with stella.tenant_id fixed to that constant tenant.
- Multi‑tenant: full model active; migrations finalized; Console exposes tenant admin.
4) Architecture
4.1 New/updated modules
- auth/authority: JWKS fetching, token validation, scope parser, cache.
- auth/middleware: HTTP/gRPC interceptors for authN/Z, tenant activation, audit emit.
- auth/roles: role → scope mapping + tenant/project constraints.
- auth/policy-bridge: optional ABAC evaluation using Policy Engine.
- storage/tenantctx: helpers to set stella.tenant_id in DB session and object‑store prefixes.
- audit/decisions: structured logging and bus producer.
- cli/auth: login, token store, tenant switcher, whoami.
4.2 Data model changes
- Add tenant_id (and project_id where appropriate) to: sources, jobs, runs, sboms, components, findings, policies, exports, notifications, secrets, packs, ledger, audits.
- Create tables:
  - tenants(id, name, status, created_at, owner_user_id)
  - projects(id, tenant_id, name, meta, created_at)
  - memberships(user_id, tenant_id, roles[])
  - service_accounts(id, tenant_id, name, scopes[], created_at, disabled)
  - audit_decisions(...) (tenant‑scoped)
5) APIs and contracts
5.1 Standard headers
- Authorization: Bearer <jwt>
- X-Stella-Tenant: <tenant_id>
- X-Request-ID (propagated for audit correlation)
5.2 Auth endpoints
- POST /auth/login (OIDC code flow start for Console)
- GET /auth/jwks.json (proxy/cached from Authority if needed)
- GET /auth/whoami → { sub, tenants[], activeTenant, roles, scopes, mfa }
- POST /auth/tokens/service (tenant admin) → mint robot token with constrained scopes/ttl
- POST /auth/tokens/delegate → mint delegated token with act chain
5.3 Tenant admin endpoints
- POST /tenants (owner only)
- GET /tenants, GET /tenants/:id
- POST /tenants/:id/projects, GET /tenants/:id/projects
- POST /tenants/:id/members (assign role), DELETE /tenants/:id/members/:user
- GET /tenants/:id/audit (search)
5.4 Route protection conventions
Each route declares:
- resource, verb
- Whether it requires project constraint
- Optional policy gates (e.g., require_mfa)
Example (OpenAPI extension):
x-stella-auth:
  resource: sbom
  verb: read
  requireTenant: true
  allowProjectScoped: true
  requireMFA: false
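Middleware could turn such an annotation into the scope string it needs to find in the token. The sketch below assumes the annotation has already been loaded into a mapping with the keys shown above; required_scope is a hypothetical helper.

```python
from typing import Mapping, Optional


def required_scope(auth: Mapping[str, object], tenant_id: Optional[str],
                   project_id: Optional[str] = None) -> str:
    """Derive the scope a request needs from a route's x-stella-auth annotation."""
    scope = f"stella:{auth['resource']}:{auth['verb']}"
    if not tenant_id:
        if auth.get("requireTenant", False):
            raise ValueError("route requires an active tenant")
        return scope
    scope += f"#tenant/{tenant_id}"
    # Only narrow to a project when the route opts in to project scoping.
    if project_id and auth.get("allowProjectScoped", False):
        scope += f"/project/{project_id}"
    return scope
```

The resulting string is also what a 403 response would report as the missing scope.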
6) Documentation changes
Create/update:
- /docs/security/tenancy-overview.md: Concepts, knowledge planes, tenant/project model, isolation layers.
- /docs/security/scopes-and-roles.md: Scope grammar, default roles, examples, custom role mapping.
- /docs/security/authority-config.md: How to connect to an OIDC provider, JWKS caching, audience, issuers, time skew, MFA signal.
- /docs/operations/multi-tenancy.md: Running multi‑tenant deployments: quotas, KMS per tenant, object store layout, message topics, backup/restore per tenant.
- /docs/operations/rls-and-data-isolation.md: Postgres RLS policy reference, migrations, troubleshooting leaks.
- /docs/console/admin-tenants.md: Tenant switcher, managing members, roles, audit viewer.
- /docs/cli/authentication.md: login, whoami, tenants list, --tenant, service tokens, delegation.
- /docs/api/authentication.md: Headers, error codes, sample requests, OpenAPI x-stella-auth annotations.
- /docs/policy/examples/abac-overlays.md: Optional policy snippets: MFA requirements, time windows, IP restrictions, quotas.
- /docs/install/configuration-reference.md: New STELLA_AUTH_*, STELLA_TENANCY_*, and per‑service flags.
Add at the top of each page:
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
7) Implementation plan
Middleware & libraries
- Implement auth/middleware with:
  - JWKS cache and signature verification (kid pinning, 10 min refresh).
  - Scope parser and matcher with constraint evaluation.
  - Tenant activator reading header/query and verifying membership.
  - Policy hook for ABAC (feature flag).
  - Decision audit emitter (non‑blocking).
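The JWKS cache in the first item could be sketched as follows, with the 10 minute refresh as the default TTL. The fetch callable, the one hour staleness limit, and the class name are assumptions; signature verification and kid pinning are out of scope here.

```python
import time
from typing import Callable, Dict


class JwksCache:
    """Caches signing keys by kid and serves stale keys through short outages."""

    def __init__(self, fetch: Callable[[], Dict[str, dict]], ttl: float = 600.0,
                 max_stale: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # returns {kid: jwk}; assumed to raise on failure
        self._ttl = ttl
        self._max_stale = max_stale
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._fetched_at = float("-inf")

    def get(self, kid: str) -> dict:
        now = self._clock()
        # Refresh when the cache is old or an unseen kid appears (key rotation).
        if now - self._fetched_at >= self._ttl or kid not in self._keys:
            self._refresh(now)
        try:
            return self._keys[kid]
        except KeyError:
            raise KeyError(f"unknown signing key: {kid}") from None

    def _refresh(self, now: float) -> None:
        try:
            self._keys = dict(self._fetch())
            self._fetched_at = now
        except Exception:
            # Tolerate the failure only while previously fetched keys are fresh enough.
            if not self._keys or now - self._fetched_at > self._max_stale:
                raise
```

Since an unknown kid triggers a fetch, a production version would likely rate‑limit refreshes so that forged tokens cannot be used to hammer the Authority.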
Services
- API: wrap all handlers with middleware; declare route protection via decorators/annotations; enforce project constraints.
- DB access layer: on connection checkout set stella.tenant_id; forbid raw SQL that bypasses the session GUC.
- Orchestrator/Task Runner: include {tenant_id, project_id} in job spec; enforce before any IO.
- Export/Notify/AI: stamp tenant in outbound payloads and logs; include it in idempotency keys.
- Conseiller/Excitator: keep global ingestion; ensure linking jobs run with tenant context only.
Console
- Add tenant switcher and “whoami” panel.
- Show role badges; display “why denied?” with scope/policy explanation from 403 payload.
- Tenant Admin screens: members, roles, service tokens, audit.
CLI
- stella login (device/code or local browser) and tenant selection.
- Persist tokens per profile; --tenant override; whoami.
- Commands fail with clear errors on scope violation.
Storage
- Prefix object store paths by tenant/project.
- Optional per‑tenant KMS key integration.
Migrations
- Add tenant_id and backfill.
- Enable RLS with policies and tests.
- Seed default tenant for Quickstart.
8) Engineering tasks
Auth & middleware
- Implement JWKS retrieval and caching with rotation tests.
- Implement scope parser and matcher with constraint support.
- Build HTTP/gRPC interceptors and integrate across services.
- Add ABAC policy hook and sample policies.
- Emit structured decision audits.
Data & storage
- Add tenant_id columns and indices; backfill in migration.
- Enable Postgres RLS policies for all tenant‑scoped tables.
- Update ORM/queries to rely on RLS; remove any “WHERE tenant_id = ...” duplication.
- Tenant‑prefixed object store paths; optional per‑tenant KMS keys.
Services
- Annotate all routes with x-stella-auth or equivalent decorator.
- Propagate tenant context through orchestrator and workers.
- Update Conseiller/Excitator linkers to require tenant context.
Console
- Implement tenant switcher, role display, and “whoami.”
- Add Tenant Admin screens (members, projects, service accounts).
- Implement “Why denied?” modal reading 403 details.
CLI
- login, whoami, tenants list, tenant flag and persistence.
- Service token minting for tenant admins.
- Delegate token creation for robot use.
Audit
- Create audit_decisions table and producer.
- Add search API and Console viewer.
Docs
- Author the ten docs listed in §6 with examples and diagrams.
- Add troubleshooting: common 401/403 causes and fixes.
- Add migration guide from single‑tenant to multi‑tenant.
Tests
- Unit tests for scope matching and token validation.
- RLS tests verifying cross‑tenant reads/writes fail.
- E2E tests: multi‑tenant users, robot tokens, ABAC overlays, orchestrator runs.
- Fuzz tests on header handling to prevent tenant confusion bugs.
9) Feature changes required
- All services: must expose 403 payloads with machine‑ and human‑readable denial reasons and the missing scope string.
- Export Center: requires tenant in all manifests; deny cross‑tenant exports by default; allow explicit cross‑tenant mirror via signed bundle.
- Notifications: destinations and templates are tenant‑scoped; sending pipeline stamps tenant.
- Advisory AI Assistant: restricts training context to a tenant’s data; the global knowledge plane may be referenced, but tenant data must never be written or logged into it.
- Findings Ledger: partition by tenant; queries must be tenant‑filtered or rely solely on RLS.
- Policy Engine: support condition keys for tenant and actor; ship example policies.
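For the 403 requirement in the first item, a response body could combine a stable code for machines with a sentence for humans. All field names here are hypothetical; only the presence of the exact missing scope string is taken from the text.

```python
import json
from typing import Tuple


def missing_scope_response(missing_scope: str, request_id: str) -> Tuple[int, str]:
    """Build a (status, JSON body) pair for a scope denial."""
    body = {
        "code": "missing_scope",                                   # machine-readable
        "message": f"missing required scope {missing_scope}",      # human-readable
        "missing_scope": missing_scope,
        "request_id": request_id,                                  # audit correlation
    }
    return 403, json.dumps(body, sort_keys=True)
```

Console's “why denied?” modal and the CLI error output could both be rendered from this one payload.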
10) Acceptance criteria
- Requests lacking X-Stella-Tenant are rejected in multi‑tenant mode; the header is optional only in single‑tenant Quickstart.
- RLS prevents cross‑tenant leakage, proven by tests that attempt blind selects/inserts.
- CLI can log in, list tenants, select tenant, and perform a job limited to that tenant.
- Console shows tenant switcher; admin can invite a member and assign roles.
- Service token can be minted with narrow scopes and expires as configured.
- Every 403 returns a clear “missing required scope …” with the exact scope string.
- Conseiller/Excitator continue aggregation‑only behavior; linking jobs run strictly under tenant context.
- Audit stream captures all permit/deny decisions with correlation IDs.
11) Risks & mitigations
- RLS misconfiguration. Write tests that run with and without RLS; block migrations unless policies are present. Provide a canary query per service on boot to verify isolation.
- Scope explosion. Keep a minimal, stable scope set; use constraints for specificity; document patterns.
- JWKS outages. Cache keys with TTL, support multiple kids, and tolerate short network failures.
- Privilege creep in robots. Short TTLs by default, clear UI to rotate/revoke, and audit for usage anomalies.
- Tenant confusion bugs. Require tenant header, validate against token memberships, and pin tenant context into DB session and job payloads, never thread‑locals only.
12) Philosophy
- Isolation by default. Tenancy isn’t a UI filter; it’s enforced where the data lives.
- Least privilege wins. Humans and robots get only what they need for as long as they need it.
- Explain denials. If the platform can’t explain “why no,” it’s broken.
- Global vs tenant plane. Public knowledge is shared; customer data is not, ever.