stella-ops.org/git.stella-ops.org

Restructure solution layout by module

2025-10-28 15:10:40 +02:00

Identity and tenancy: the part everyone underestimates until they trip over it in prod.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

Epic 14: Authority‑Backed Scopes & Tenancy

Short name: Authority‑Backed Scopes & Tenancy
Primary components: Authority (authN/Z), Web Services API, Policy Engine, Orchestrator, Task Runner, Console, CLI
Surfaces: /auth/*, request middleware, DB schema (RLS), object storage layout, message bus topics, audit logs, CLI login/impersonation flows
Touches: Conseiller (Feedser), Excitator (Vexer), Findings Ledger, Export Center, Notifications Studio, Advisory AI Assistant

AOC ground rule reminder: Conseiller and Excitator aggregate and link advisories/VEX. They never merge or mutate source records. Enforcement of aggregation‑only behavior is tenant‑agnostic and must hold across all scopes.

1) What it is

A uniform model for identity, authorization, and isolation that is enforced end‑to‑end:

Authority‑backed tokens: JWT/OIDC tokens issued by a configured Authority. Tokens carry scopes, roles, and tenant memberships as claims. Services verify and authorize locally; no out‑of‑band ACL calls during the hot path.
Tenancy: First‑class multi‑tenant isolation with optional projects within a tenant. Strong separation at the database layer via row‑level security (RLS) and in object storage via tenant‑prefixed paths (and optionally per‑tenant KMS keys).
Scopes & roles: Minimal set of composable scopes (stella:{resource}:{verb}) that map to roles (viewer, editor, operator, admin, owner) and can be constrained to {tenant}/{project}.
Context propagation: Every API request, job, message, and artifact is stamped with {tenant_id, project_id, actor} and validated at ingress and again at persistence.
Service accounts & delegation: Robot identities with scoped, expiring credentials for CI, Task Packs, and webhooks. Human-to-robot delegation is explicit and auditable.
Audit: Immutable decision logs for authN/Z events with resource, scope, and policy evaluation outcomes.

Tenancy model:

Organization (optional, for billing) 
└── Tenant (isolation boundary)
    ├── Projects (isolation + grouping)
    │   ├── Sources (registries, repos)
    │   ├── Jobs & Runs
    │   ├── SBOMs & Artifacts
    │   ├── Findings / Evaluations
    │   └── Policies (bound or inherited)
    └── Shared tenant services (notifications, exports, secrets)

Knowledge planes:

Global knowledge plane: Advisories, CVE metadata, CPE, KEV, etc. No tenant data.
Tenant plane: SBOMs, VEX attachments, policy results, exposures, notifications, exports, audits.

Conseiller/Excitator live across both planes: they collect into the global plane and link to the tenant plane without merging sources.

2) Why (brief)

Security that depends on “being careful” is not security. We need hard boundaries the platform cannot cross by accident:

Run many teams/customers safely on one deployment.
Minimize blast radius for credentials and mistakes.
Make CI and automation safe with least‑privileged scopes.
Keep latency low by verifying locally with signed tokens.

3) How it should work (maximum detail)

3.1 Tokens, claims, and scopes

Token type: JWT, signed by the Authority. Services cache JWKS and verify locally.

Required claims:

iss, sub, aud, iat, exp
scope: space‑separated scopes (stella:sbom:read, stella:job:run)
tenants: array of tenant IDs the subject may access
tenant (active): the currently selected tenant for the request
roles: object map { "<tenant>": ["viewer", "editor", ...] }
projects (optional): array of project IDs or { "<tenant>": ["projA", "projB"] }
mfa (optional): boolean or level for step‑up enforcement
act (optional): actor chain for delegation/impersonation
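To make the claim list concrete, here is a minimal sketch of a decoded payload and a structural check on it. The helper name `check_required_claims`, the issuer and audience values, and every ID are invented for illustration; the sketch assumes signature verification has already happened and only inspects the claims.

```python
import time
from typing import List, Optional

# Claims listed as required above; "projects", "mfa", and "act" are optional.
REQUIRED_CLAIMS = ("iss", "sub", "aud", "iat", "exp", "scope", "tenants", "tenant", "roles")


def check_required_claims(claims: dict, now: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the claim set is usable."""
    now = time.time() if now is None else now
    problems = [f"missing claim: {name}" for name in REQUIRED_CLAIMS if name not in claims]
    if "exp" in claims and claims["exp"] <= now:
        problems.append("token expired")
    if "tenant" in claims and claims["tenant"] not in claims.get("tenants", []):
        problems.append("active tenant not in tenants[]")
    return problems


# Invented values, for illustration only.
example_claims = {
    "iss": "https://authority.example",  # hypothetical issuer
    "sub": "user-42",
    "aud": "stella-api",                 # hypothetical audience
    "iat": 1_700_000_000,
    "exp": 1_700_003_600,
    "scope": "stella:sbom:read#tenant/t-123 stella:job:run#tenant/t-123",
    "tenants": ["t-123"],
    "tenant": "t-123",
    "roles": {"t-123": ["operator"]},
}
```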

Scope grammar:

stella:{resource}:{verb}[#{constraint}]
  resource ∈ {tenant, project, source, job, sbom, vex, advisory, policy, finding, export, notify, secret, pack, ledger, console}
  verb ∈ {read, list, write, delete, run, execute, approve, admin}
  constraint := tenant/{tenantId}[/project/{projectId}]

Examples:

stella:sbom:read#tenant/t-123/project/p-abc
stella:job:run#tenant/t-123
stella:tenant:admin#tenant/t-123 (Tenant Owner)
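The grammar above is small enough to parse by hand. The following is a sketch, not the shipped parser: `Scope`, `parse_scope`, `covers`, and `is_allowed` are hypothetical names, and the matching rule (an unconstrained scope covers any tenant, a tenant-level scope covers every project in that tenant) is an assumption the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

RESOURCES = {"tenant", "project", "source", "job", "sbom", "vex", "advisory", "policy",
             "finding", "export", "notify", "secret", "pack", "ledger", "console"}
VERBS = {"read", "list", "write", "delete", "run", "execute", "approve", "admin"}


@dataclass(frozen=True)
class Scope:
    resource: str
    verb: str
    tenant: Optional[str] = None
    project: Optional[str] = None


def parse_scope(text: str) -> Scope:
    """Parse stella:{resource}:{verb}[#tenant/{id}[/project/{id}]]."""
    head, sep, constraint = text.partition("#")
    parts = head.split(":")
    if len(parts) != 3 or parts[0] != "stella":
        raise ValueError(f"not a stella scope: {text!r}")
    _, resource, verb = parts
    if resource not in RESOURCES or verb not in VERBS:
        raise ValueError(f"unknown resource or verb: {text!r}")
    if not sep:
        return Scope(resource, verb)
    seg = constraint.split("/")
    if len(seg) == 2 and seg[0] == "tenant" and seg[1]:
        return Scope(resource, verb, tenant=seg[1])
    if len(seg) == 4 and seg[0] == "tenant" and seg[2] == "project" and seg[1] and seg[3]:
        return Scope(resource, verb, tenant=seg[1], project=seg[3])
    raise ValueError(f"bad constraint: {text!r}")


def covers(granted: Scope, required: Scope) -> bool:
    """Assumed rule: a missing constraint on the granted side acts as a wildcard."""
    if (granted.resource, granted.verb) != (required.resource, required.verb):
        return False
    if granted.tenant is not None and granted.tenant != required.tenant:
        return False
    if granted.project is not None and granted.project != required.project:
        return False
    return True


def is_allowed(scope_claim: str, required: Scope) -> bool:
    """Check a space-separated scope claim against one required scope."""
    for text in scope_claim.split():
        try:
            if covers(parse_scope(text), required):
                return True
        except ValueError:
            continue  # ignore scopes outside this grammar, e.g. "openid"
    return False
```

Under these assumptions a project-scoped grant does not satisfy a tenant-wide requirement, while a tenant-scoped grant satisfies any project inside that tenant.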

Role mapping (default):

viewer → read/list on most resources
editor → viewer + write on sbom/policy
operator → editor + job:run, export:run, notify:manage
admin → operator + user/role management inside tenant
owner → admin + billing/tenant lifecycle

3.2 Selecting the active tenant

API: X-Stella-Tenant: <tenant_id> header or ?tenant=<id> query parameter. If both are omitted and the token has exactly one tenant, that tenant is active; otherwise the request is rejected with 400.
Console: Tenant switcher in the top bar. Console includes header on all calls.
CLI: stella login --tenant <id> sets the default; override per command --tenant.

All services must reject requests where the active tenant is not in tenants[], or where none of the token's scopes cover that tenant.
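The selection rule can be sketched as a single function. `select_active_tenant` and `TenantSelectionError` are hypothetical names; rejecting a header/query mismatch with 400 and a non-member tenant with 403 are assumptions, since the text only fixes the 400 for an omitted tenant.

```python
from typing import Optional, Sequence


class TenantSelectionError(Exception):
    """Carries the HTTP status the caller should return."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def select_active_tenant(
    header: Optional[str], query: Optional[str], token_tenants: Sequence[str]
) -> str:
    # Assumption: conflicting header and query values are an error, not a precedence question.
    if header and query and header != query:
        raise TenantSelectionError(400, "header and query name different tenants")
    requested = header or query
    if not requested:
        # Omitted tenant is only acceptable when the token has exactly one.
        if len(token_tenants) == 1:
            return token_tenants[0]
        raise TenantSelectionError(400, "tenant must be specified")
    if requested not in token_tenants:
        raise TenantSelectionError(403, "tenant not in token tenants[]")
    return requested
```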

3.3 Request pipeline

Authentication middleware: verify JWT signature and expiry.
Tenant activation: pick active tenant per header; set per‑request context {tenant_id, actor}.
Scope check: compare required scope for the route with token scopes. If route accepts project limiters, check constraints align.
Policy overlay (optional): ABAC evaluation for fine controls (e.g., “deny job:run outside business hours”).
Persistence guard: set DB session GUC stella.tenant_id and verify any writes include matching tenant_id. Enforce Postgres RLS.
Audit: write decision to audit bus (async) with permit|deny, reasons, and matched rule.

3.4 Database isolation

Approach: shared schema with Row Level Security. Every tenant‑scoped table includes tenant_id and optional project_id.

RLS policy template (Postgres):

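ALTER TABLE sboms ENABLE ROW LEVEL SECURITY;
-- Table owners bypass RLS unless it is forced; omit only if the app never connects as owner.
ALTER TABLE sboms FORCE ROW LEVEL SECURITY;

-- Visibility guard. current_setting(..., true) returns NULL (rather than raising
-- an error) when the setting has never been set, so such a session matches no rows.
CREATE POLICY sboms_isolate ON sboms
USING (tenant_id = current_setting('stella.tenant_id', true));

-- INSERT/UPDATE guard: new or changed rows must carry the active tenant.
CREATE POLICY sboms_write_guard ON sboms
AS PERMISSIVE FOR ALL
TO PUBLIC
WITH CHECK (tenant_id = current_setting('stella.tenant_id', true));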

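Set stella.tenant_id at the start of each transaction on a checked‑out connection. The third argument (is_local = true) scopes the value to the current transaction, so it must be set inside the transaction that runs the tenant's queries, and it cannot leak to the next user of a pooled connection:

SELECT set_config('stella.tenant_id', $1, true); -- $1 = active tenant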
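Because the third argument to set_config makes the setting transaction-local, the call and the tenant's queries need to share one transaction. A minimal sketch of a helper that enforces this, assuming a DB-API 2.0 connection with the `%s` parameter style and autocommit off (as with psycopg defaults); `tenant_transaction` is a hypothetical name:

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Yield a cursor whose transaction has stella.tenant_id set; commit or roll back on exit."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    cur = conn.cursor()
    try:
        # With autocommit off, this first statement opens the transaction.
        # is_local=true: the value is discarded at COMMIT or ROLLBACK.
        cur.execute("SELECT set_config('stella.tenant_id', %s, true)", (tenant_id,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Usage would look like `with tenant_transaction(conn, "t-123") as cur: cur.execute("SELECT id FROM sboms")`, with RLS filtering the rows.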

Migrations:

Add tenant_id to all tenant‑scoped tables.
Backfill existing rows with the default tenant in Quickstart.
Enable RLS and policies in a reversible migration.

3.5 Object storage and artifacts

Layout: s3://<bucket>/tenants/<tenant_id>/projects/<project_id>/<resource>/<uuid>...
KMS keys: optional per‑tenant key alias. Map via kms_alias = "stella-<tenant_id>".
Ensure Task Runner and Export Center only operate within the prefixed path of the active tenant.
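One way to keep callers inside the active tenant's prefix is to build keys through a single helper and re-check any externally supplied key. The sketch below works on bucket-relative keys; the function names and the allowed ID alphabet are assumptions.

```python
import posixpath
import re

_ID = re.compile(r"[A-Za-z0-9_-]+")  # assumed ID alphabet; rejects "/" and ".."


def _check(name: str, value: str) -> str:
    if not _ID.fullmatch(value):
        raise ValueError(f"invalid {name}: {value!r}")
    return value


def object_key(tenant_id: str, project_id: str, resource: str, object_id: str) -> str:
    """Build tenants/<tenant_id>/projects/<project_id>/<resource>/<uuid>."""
    return "/".join([
        "tenants", _check("tenant_id", tenant_id),
        "projects", _check("project_id", project_id),
        _check("resource", resource),
        _check("object_id", object_id),
    ])


def assert_within_tenant(key: str, tenant_id: str) -> None:
    """Reject keys that resolve outside the active tenant's prefix."""
    normalized = posixpath.normpath(key)  # collapses "..", "." and repeated slashes
    if not normalized.startswith(f"tenants/{_check('tenant_id', tenant_id)}/"):
        raise PermissionError(f"key outside tenant prefix: {key!r}")


def kms_alias(tenant_id: str) -> str:
    """Per-tenant key alias, following kms_alias = "stella-<tenant_id>"."""
    return f"stella-{_check('tenant_id', tenant_id)}"
```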

3.6 Message bus topics

Topic naming: stella.<tenant_id>.<domain>.<event> for tenant‑scoped events.
Global knowledge events remain stella.global.kb.*.
Subscriptions always include a tenant filter unless consuming global knowledge.
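The naming rule and the subscription filter could be expressed as follows. This is a sketch under two assumptions not stated above: topic segments contain no dots, and the literal `global` is reserved so that no tenant ID can collide with the knowledge-plane prefix. All function names are hypothetical.

```python
import re
from typing import Optional

_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")  # assumption: segments contain no dots
GLOBAL = "global"


def tenant_topic(tenant_id: str, domain: str, event: str) -> str:
    """Build stella.<tenant_id>.<domain>.<event>."""
    for part in (tenant_id, domain, event):
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid topic segment: {part!r}")
    if tenant_id == GLOBAL:
        raise ValueError("'global' is reserved for the knowledge plane")
    return f"stella.{tenant_id}.{domain}.{event}"


def topic_tenant(topic: str) -> Optional[str]:
    """Return the owning tenant, or None for stella.global.* topics."""
    parts = topic.split(".")
    if len(parts) < 3 or parts[0] != "stella":
        raise ValueError(f"not a stella topic: {topic!r}")
    return None if parts[1] == GLOBAL else parts[1]


def may_consume(topic: str, active_tenant: str) -> bool:
    """A consumer may read global knowledge or its own tenant's topics."""
    owner = topic_tenant(topic)
    return owner is None or owner == active_tenant
```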

3.7 Background workers

Orchestrator & Task Runner: each job carries {tenant_id, project_id}. Workers set stella.tenant_id before any DB or object store access. Reject jobs that miss the context.
Conseiller/Excitator: ingest to global plane; linking jobs (matching advisories to tenant SBOMs) run per tenant and respect RLS.
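A worker-side guard for the "reject jobs that miss the context" rule might look like the sketch below. `JobContext` and `require_job_context` are hypothetical names, and representing the job as a plain mapping is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class JobContext:
    tenant_id: str
    project_id: str


def require_job_context(job: Mapping[str, object]) -> JobContext:
    """Extract {tenant_id, project_id} or refuse the job before any IO happens."""
    tenant_id = job.get("tenant_id")
    project_id = job.get("project_id")
    if not isinstance(tenant_id, str) or not tenant_id:
        raise ValueError("job rejected: missing tenant_id")
    if not isinstance(project_id, str) or not project_id:
        raise ValueError("job rejected: missing project_id")
    return JobContext(tenant_id, project_id)
```

A worker would call this first and only then set stella.tenant_id and touch the database or object store.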

3.8 Policy overlay (optional but recommended)

Integrate Policy Engine with condition keys:

tenant, project, resource.type, resource.id, actor.role, actor.mfa, time, ip.

Examples:
Deny job:run from IPs outside CIDR.
Require mfa=true to approve notification templates.
Quotas: “max exports per hour per tenant.”
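Two of the example overlays can be sketched as a plain function over the condition keys. This is not the Policy Engine's rule syntax, which the text does not define; `Attributes`, `evaluate`, and the `notify:approve` action string are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Attributes:
    action: str      # "{resource}:{verb}", e.g. "job:run"
    tenant: str
    actor_mfa: bool
    ip: str


def evaluate(attrs: Attributes, allowed_cidrs: Sequence[str]) -> Tuple[str, str]:
    """Return (effect, reason); the overlay only denies, it never grants extra access."""
    addr = ipaddress.ip_address(attrs.ip)
    if attrs.action == "job:run" and not any(
        addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs
    ):
        return ("deny", "job:run from IP outside allowed CIDRs")
    if attrs.action == "notify:approve" and not attrs.actor_mfa:
        return ("deny", "mfa required to approve notification templates")
    return ("permit", "no overlay rule matched")
```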

3.9 Service accounts & delegation

Robot principals: sa:{tenant}:{name} with scopes constrained to tenant/project. Default TTL 1h; max TTL policy‑controlled.
Token minting: Tenant admins can generate tokens via API/Console; all tokens are auditable and can optionally be bound to a CIDR range or workload identity.
Delegation: stella token delegate --to sa:... --scopes ... --ttl 15m produces a token with act chain, recorded in audit log.
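The claim changes behind a delegated token can be sketched as below. The shape of act is an assumption borrowed from the OAuth 2.0 Token Exchange convention (RFC 8693), where sub stays the delegating principal and the outermost act names the current actor, with earlier actors nested inside. `delegate_claims` is a hypothetical helper, and exact string matching on scopes is a simplification.

```python
from typing import Dict, List


def delegate_claims(parent: Dict, actor_sub: str, scopes: List[str], ttl_s: int, now: int) -> Dict:
    """Derive claims for a delegated token; never widens scopes or outlives the parent."""
    held = set(parent["scope"].split())
    extra = [s for s in scopes if s not in held]
    if extra:
        raise PermissionError(f"cannot delegate scopes not held: {extra}")
    exp = min(now + ttl_s, parent["exp"])
    if exp <= now:
        raise ValueError("delegated token would already be expired")
    actor = {"sub": actor_sub}
    if "act" in parent:
        actor["act"] = parent["act"]  # earlier actors nest inside the current one
    child = dict(parent)
    child.update(scope=" ".join(scopes), iat=now, exp=exp, act=actor)
    return child
```

The nested act value is what the audit log would record as the delegation chain.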

3.10 Auditing

Every decision logs:

ts, tenant, actor, route, resource, action, effect, reason, scopes_used, policy_rule_id
Persist in tenant‑scoped audit table and stream to stella.<tenant>.audit.decisions.
Expose search/filter in Console → Admin → Audit.
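A decision record with the fields listed above, serialized for the audit stream, might be sketched like this. The `Decision` class, the ISO 8601 timestamp format, and the JSON encoding are assumptions; only the field names and the topic pattern come from the text.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Decision:
    ts: str                      # assumed ISO 8601 UTC timestamp
    tenant: str
    actor: str
    route: str
    resource: str
    action: str
    effect: str                  # "permit" or "deny"
    reason: str
    scopes_used: List[str] = field(default_factory=list)
    policy_rule_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.effect not in ("permit", "deny"):
            raise ValueError(f"effect must be permit or deny, got {self.effect!r}")


def to_bus_message(decision: Decision) -> Tuple[str, str]:
    """Return (topic, JSON body) for stella.<tenant>.audit.decisions."""
    topic = f"stella.{decision.tenant}.audit.decisions"
    return topic, json.dumps(asdict(decision), sort_keys=True)
```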

3.11 CLI and Console UX

CLI: stella login, stella whoami, stella tenants list, --tenant flag everywhere. Clear error if token lacks tenant or scope.
Console: Tenant switcher, role badges, “why denied?” modal showing scope and policy reasons without leaking internals.
Impersonation (admin only): sudo as <user> for debugging with visible banner; issues delegated token with act chain.

3.12 Compatibility modes

Quickstart single‑tenant: a single hidden tenant, local. The tenant header is optional. RLS stays enabled, with the tenant ID fixed to that constant.
Multi‑tenant: full model active; migrations complete; Console exposes tenant admin.

4) Architecture

4.1 New/updated modules

auth/authority: JWKS fetching, token validation, scope parser, cache.
auth/middleware: HTTP/gRPC interceptors for authN/Z, tenant activation, audit emit.
auth/roles: role → scope mapping + tenant/project constraints.
auth/policy-bridge: optional ABAC evaluation using Policy Engine.
storage/tenantctx: helpers to set stella.tenant_id in DB session and object‑store prefixes.
audit/decisions: structured logging and bus producer.
cli/auth: login, token store, tenant switcher, whoami.

4.2 Data model changes

Add tenant_id (and project_id where appropriate) to: sources, jobs, runs, sboms, components, findings, policies, exports, notifications, secrets, packs, ledger, audits.
Create tables:
- tenants(id, name, status, created_at, owner_user_id)
- projects(id, tenant_id, name, meta, created_at)
- memberships(user_id, tenant_id, roles[])
- service_accounts(id, tenant_id, name, scopes[], created_at, disabled)
- audit_decisions(...) (tenant‑scoped)

5) APIs and contracts

5.1 Standard headers

Authorization: Bearer <jwt>
X-Stella-Tenant: <tenant_id>
X-Request-ID (propagated for audit correlation)

5.2 Auth endpoints

POST /auth/login (OIDC code flow start for Console)
GET /auth/jwks.json (proxy/cached from Authority if needed)
GET /auth/whoami → { sub, tenants[], activeTenant, roles, scopes, mfa }
POST /auth/tokens/service (tenant admin) → mint robot token with constrained scopes/ttl
POST /auth/tokens/delegate → mint delegated token with act chain

5.3 Tenant admin endpoints

POST /tenants (owner only)
GET /tenants, GET /tenants/:id
POST /tenants/:id/projects, GET /tenants/:id/projects
POST /tenants/:id/members (assign role), DELETE /tenants/:id/members/:user
GET /tenants/:id/audit (search)

5.4 Route protection conventions

Each route declares:

resource, verb
Whether it requires project constraint
Optional policy gates (e.g., require_mfa)

Example (OpenAPI extension):

x-stella-auth:
  resource: sbom
  verb: read
  requireTenant: true
  allowProjectScoped: true
  requireMFA: false
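Middleware could turn such an annotation into the concrete scope string a request needs. The sketch below takes the annotation as an already-parsed mapping; `required_scope` is a hypothetical helper, and ignoring the project when allowProjectScoped is false is an assumption.

```python
from typing import Mapping, Optional


def required_scope(
    auth: Mapping[str, object], tenant_id: Optional[str], project_id: Optional[str] = None
) -> str:
    """Build stella:{resource}:{verb}[#tenant/...[/project/...]] for one request."""
    scope = f"stella:{auth['resource']}:{auth['verb']}"
    if auth.get("requireTenant") and not tenant_id:
        raise ValueError("route requires an active tenant")
    if tenant_id:
        scope += f"#tenant/{tenant_id}"
        # Assumption: routes that are not project-scoped require tenant-wide access.
        if project_id and auth.get("allowProjectScoped"):
            scope += f"/project/{project_id}"
    return scope


annotation = {
    "resource": "sbom",
    "verb": "read",
    "requireTenant": True,
    "allowProjectScoped": True,
    "requireMFA": False,
}
```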

6) Documentation changes

Create/update:

/docs/security/tenancy-overview.md Concepts, knowledge planes, tenant/project model, isolation layers.
/docs/security/scopes-and-roles.md Scope grammar, default roles, examples, custom role mapping.
/docs/security/authority-config.md How to connect to an OIDC provider, JWKS caching, audience, issuers, time skew, MFA signal.
/docs/operations/multi-tenancy.md Running multi‑tenant deployments: quotas, KMS per tenant, object store layout, message topics, backup/restore per tenant.
/docs/operations/rls-and-data-isolation.md Postgres RLS policy reference, migrations, troubleshooting leaks.
/docs/console/admin-tenants.md Tenant switcher, managing members, roles, audit viewer.
/docs/cli/authentication.md login, whoami, tenants list, --tenant, service tokens, delegation.
/docs/api/authentication.md Headers, error codes, sample requests, OpenAPI x-stella-auth annotations.
/docs/policy/examples/abac-overlays.md Optional policy snippets: MFA requirements, time windows, IP restrictions, quotas.
/docs/install/configuration-reference.md New STELLA_AUTH_*, STELLA_TENANCY_*, and per‑service flags.

Add at the top of each page:

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

7) Implementation plan

Middleware & libraries

Implement auth/middleware with:
- JWKS cache and signature verification (kid pinning, 10 min refresh).
- Scope parser and matcher with constraint evaluation.
- Tenant activator reading header/query and verifying membership.
- Policy hook for ABAC (feature flag).
- Decision audit emitter (non‑blocking).
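The JWKS cache behavior (10 minute refresh, multiple kids, tolerance of short Authority outages) can be sketched independently of any HTTP or JWT library. `JwksCache` is a hypothetical class; the fetch callable, which returns a mapping of kid to key, stands in for the real JWKS request. A production version would also rate-limit refreshes triggered by unknown kids.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    def __init__(
        self,
        fetch: Callable[[], Dict[str, dict]],
        ttl_s: float = 600.0,  # 10 min refresh, as in the plan above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_s
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        try:
            self._keys = self._fetch()
            self._fetched_at = self._clock()
        except Exception:
            if self._fetched_at is None:
                raise  # nothing cached yet, so the failure cannot be hidden
            # Otherwise keep serving the stale keys through a short outage.

    def get(self, kid: str) -> Optional[dict]:
        """Return the key for kid, refreshing when the cache is stale or the kid is unknown."""
        stale = self._fetched_at is None or self._clock() - self._fetched_at >= self._ttl
        if stale or kid not in self._keys:
            self._refresh()
        return self._keys.get(kid)
```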

Services

API: wrap all handlers with middleware; declare route protection via decorators/annotations; enforce project constraints.
DB access layer: set stella.tenant_id at the start of every transaction on a checked‑out connection; forbid raw SQL that bypasses the session GUC.
Orchestrator/Task Runner: include {tenant_id, project_id} in job spec; enforce before any IO.
Export/Notify/AI: stamp tenant in outbound payloads and logs; include it in idempotency keys.
Conseiller/Excitator: keep global ingestion; ensure linking jobs run with tenant context only.

Console

Add tenant switcher and “whoami” panel.
Show role badges; display “why denied?” with scope/policy explanation from 403 payload.
Tenant Admin screens: members, roles, service tokens, audit.

CLI

stella login (device/code or local browser) and tenant selection.
Persist tokens per profile; --tenant override; whoami.
Commands fail with clear errors on scope violation.

Storage

Prefix object store paths by tenant/project.
Optional per‑tenant KMS key integration.

Migrations

Add tenant_id and backfill.
Enable RLS with policies and tests.
Seed default tenant for Quickstart.

8) Engineering tasks

Auth & middleware

Implement JWKS retrieval and caching with rotation tests.
Implement scope parser and matcher with constraint support.
Build HTTP/gRPC interceptors and integrate across services.
Add ABAC policy hook and sample policies.
Emit structured decision audits.

Data & storage

Add tenant_id columns and indices; backfill in migration.
Enable Postgres RLS policies for all tenant‑scoped tables.
Update ORM/queries to rely on RLS; remove any “WHERE tenant_id = ...” duplication.
Tenant‑prefixed object store paths; optional per‑tenant KMS keys.

Services

Annotate all routes with x-stella-auth or equivalent decorator.
Propagate tenant context through orchestrator and workers.
Update Conseiller/Excitator linkers to require tenant context.

Console

Implement tenant switcher, role display, and “whoami.”
Add Tenant Admin screens (members, projects, service accounts).
Implement “Why denied?” modal reading 403 details.

CLI

login, whoami, tenants list, tenant flag and persistence.
Service token minting for tenant admins.
Delegate token creation for robot use.

Audit

Create audit_decisions table and producer.
Add search API and Console viewer.

Docs

Author the ten docs listed in §6 with examples and diagrams.
Add troubleshooting: common 401/403 causes and fixes.
Add migration guide from single‑tenant to multi‑tenant.

Tests

Unit tests for scope matching and token validation.
RLS tests verifying cross‑tenant reads/writes fail.
E2E tests: multi‑tenant users, robot tokens, ABAC overlays, orchestrator runs.
Fuzz tests on header handling to prevent tenant confusion bugs.

9) Feature changes required

All services: must expose 403 payloads with machine‑ and human‑readable denial reasons and the missing scope string.
Export Center: requires tenant in all manifests; deny cross‑tenant exports by default; allow explicit cross‑tenant mirror via signed bundle.
Notifications: destinations and templates are tenant‑scoped; sending pipeline stamps tenant.
Advisory AI Assistant: restricts training context to the active tenant’s data; the global knowledge plane may be referenced, but tenant data must never be logged into the global plane.
Findings Ledger: partition by tenant; queries must be tenant‑filtered or rely solely on RLS.
Policy Engine: support condition keys for tenant and actor; ship example policies.
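For the 403 requirement, a payload that serves both machines and humans could be as small as the sketch below. Every field name here is an assumption; the text only requires a denial reason and the exact missing scope string.

```python
from typing import Optional


def denial_payload(
    missing_scope: str, request_id: str, policy_rule_id: Optional[str] = None
) -> dict:
    """Build a 403 body: a stable code for clients, a sentence for people."""
    return {
        "status": 403,
        "code": "missing_scope",                                 # machine-readable
        "message": f"missing required scope {missing_scope}",   # human-readable
        "missing_scope": missing_scope,
        "policy_rule_id": policy_rule_id,                        # set when a policy overlay denied
        "request_id": request_id,                                # correlates with the audit record
    }
```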

10) Acceptance criteria

Requests lacking X-Stella-Tenant are rejected in multi‑tenant mode, subject to the single‑tenant‑token default in §3.2; the header is optional in single‑tenant Quickstart.
RLS prevents cross‑tenant leakage proven by tests that attempt blind selects/inserts.
CLI can log in, list tenants, select tenant, and perform a job limited to that tenant.
Console shows tenant switcher; admin can invite a member and assign roles.
Service token can be minted with narrow scopes and expires as configured.
Every 403 returns a clear “missing required scope …” with the exact scope string.
Conseiller/Excitator continue aggregation‑only behavior; linking jobs run strictly under tenant context.
Audit stream captures all permit/deny decisions with correlation IDs.

11) Risks & mitigations

RLS misconfiguration. Write tests that run with and without RLS; block migrations unless policies are present. Provide a canary query per service on boot to verify isolation.
Scope explosion. Keep a minimal, stable scope set; use constraints for specificity; document patterns.
JWKS outages. Cache keys with TTL, support multiple kids, and tolerate short network failures.
Privilege creep in robots. Short TTLs by default, clear UI to rotate/revoke, and audit for usage anomalies.
Tenant confusion bugs. Require tenant header, validate against token memberships, and pin tenant context into DB session and job payloads, never thread‑locals only.

12) Philosophy

Isolation by default. Tenancy isn’t a UI filter; it’s enforced where the data lives.
Least privilege wins. Humans and robots get only what they need for as long as they need it.
Explain denials. If the platform can’t explain “why no,” it’s broken.
Global vs tenant plane. Public knowledge is shared; customer data is not, ever.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
