wip: doctor/cli/docs/api to vector db consolidation; api hardening for descriptions, tenant, and scopes; migrations and conversions of all DALs to EF v10

This commit is contained in:
master
2026-02-23 15:30:50 +02:00
parent bd8fee6ed8
commit e746577380
1424 changed files with 81225 additions and 25251 deletions


@@ -1,19 +1,25 @@
# component_architecture_authority.md - **Stella Ops Authority** (2025Q4)
> **Current tenant-selection ADR:** `docs/architecture/decisions/ADR-002-multi-tenant-same-api-key-selection.md`
> **Service impact ledger:** `docs/technical/architecture/multi-tenant-service-impact-ledger.md`
> **Flow sequences:** `docs/technical/architecture/multi-tenant-flow-sequences.md`
> **Rollout policy:** `docs/operations/multi-tenant-rollout-and-compatibility.md`
> **QA matrix:** `docs/qa/feature-checks/multi-tenant-acceptance-matrix.md`
> Consolidates identity and tenancy requirements documented across the AOC, Policy, and Platform guides, along with the dedicated Authority implementation plan.
> **Scope.** Implementation-ready architecture for **Stella Ops Authority**: the on-prem **OIDC/OAuth2** service that issues **short-lived, sender-constrained operational tokens (OpToks)** to first-party services and tools. Covers protocols (DPoP & mTLS binding), token shapes, endpoints, storage, rotation, HA, RBAC, audit, and testing. This component is the trust anchor for *who* is calling inside a Stella Ops installation. (Entitlement is proven separately by **PoE** from the cloud Licensing Service; Authority does not issue PoE.)
---
## 0) Mission & boundaries
**Mission.** Provide **fast, local, verifiable** authentication for Stella Ops microservices and tools by minting **very short-lived** OAuth2/OIDC tokens that are **sender-constrained** (DPoP or mTLS-bound). Support RBAC scopes, multi-tenant claims, and deterministic validation for APIs (Scanner, Signer, Attestor, Excititor, Concelier, UI, CLI, Zastava).
**Boundaries.**
* Authority **does not** validate entitlements/licensing. That’s enforced by **Signer** using **PoE** with the cloud Licensing Service.
* Authority tokens are **operational only** (2–5 min TTL) and must not be embedded in long-lived artifacts or stored in SBOMs.
* Authority is **stateless for validation** (JWT) and offers **optional introspection** for services that prefer online checks.
---
@@ -23,16 +29,16 @@
* **OIDC Discovery**: `/.well-known/openid-configuration`
* **OAuth2** grant types:
* **Client Credentials** (service↔service, with mTLS or private_key_jwt)
* **Device Code** (CLI login on headless agents; optional)
* **Authorization Code + PKCE** (browser login for UI; optional)
* **Sender constraint options** (choose per caller or per audience):
* **DPoP** (Demonstrating Proof of Possession): proof JWT on each HTTP request, bound to the access token via `cnf.jkt`.
* **OAuth 2.0 mTLS** (certificate-bound tokens): token bound to client certificate thumbprint via `cnf.x5t#S256`.
* **Signing algorithms**: **EdDSA (Ed25519)** preferred; fallback **ES256 (P-256)**. Rotation is supported via **kid** in JWKS.
* **Token format**: **JWT** access tokens (compact), optionally opaque reference tokens for services that insist on introspection.
* **Clock skew tolerance**: ±60 s; issue `nbf`, `iat`, `exp` accordingly.
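
The `cnf.jkt` binding mentioned above is a JWK SHA-256 thumbprint (RFC 7638). A minimal sketch of how a client or resource server might compute it for the two key types this document names; the function name `jwk_thumbprint` and the sample key are illustrative assumptions, not Authority's actual code:

```python
import base64
import hashlib
import json

# Members that RFC 7638 (EC) and RFC 8037 (OKP) require in the thumbprint input.
_REQUIRED_MEMBERS = {
    "OKP": ("crv", "kty", "x"),      # e.g. Ed25519
    "EC": ("crv", "kty", "x", "y"),  # e.g. P-256
}


def jwk_thumbprint(jwk: dict) -> str:
    """Return the base64url (unpadded) SHA-256 thumbprint of a public JWK."""
    members = _REQUIRED_MEMBERS[jwk["kty"]]
    # Canonical form: required members only, sorted keys, no whitespace.
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Hypothetical Ed25519 public key, used only for illustration.
example_key = {
    "kty": "OKP",
    "crv": "Ed25519",
    "x": "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo",
}
print(jwk_thumbprint(example_key))
```

Because only the required members are hashed, extra fields such as `kid` or `use` do not change the thumbprint.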
---
@@ -41,7 +47,7 @@
* **Incident mode tokens** require the `obs:incident` scope, a human-supplied `incident_reason`, and remain valid only while `auth_time` stays within a five-minute freshness window. Resource servers enforce the same window and persist `incident.reason`, `incident.auth_time`, and the fresh-auth verdict in `authority.resource.authorize` events. Authority exposes `/authority/audit/incident` so auditors can review recent activations.
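
The incident-mode rule can be sketched as a resource-server check. This is an illustration under assumptions: the claim layout (`scope` as a space-separated string, `incident_reason`, `auth_time` in epoch seconds) follows the names in the text, and the helper `incident_token_allowed` is hypothetical. It applies no clock-skew allowance.

```python
import time
from typing import Optional

FRESH_AUTH_WINDOW_S = 300  # the five-minute freshness window


def incident_token_allowed(claims: dict, now: Optional[float] = None) -> bool:
    """Return True only if the token may be used for incident-mode access."""
    now = time.time() if now is None else now
    if "obs:incident" not in claims.get("scope", "").split():
        return False
    if not claims.get("incident_reason"):
        return False  # a human-supplied reason is mandatory
    auth_time = claims.get("auth_time")
    if auth_time is None:
        return False
    age = now - auth_time
    return 0 <= age <= FRESH_AUTH_WINDOW_S
```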
### 2.1 Access token (OpTok) - short-lived (120–300 s)
**Registered claims**
@@ -56,7 +62,7 @@ jti = <uuid>
scope = "scanner.scan scanner.export signer.sign ..."
```
**Sender-constraint (`cnf`)**
* **DPoP**:
@@ -78,11 +84,11 @@ roles = [ "svc.scanner", "svc.signer", "ui.admin", ... ]
plan? = <plan name> // optional hint for UIs; not used for enforcement
```
> **Note**: Do **not** copy PoE claims into OpTok; OpTok ≠ entitlement. Only **Signer** checks PoE.
### 2.2 Refresh tokens (optional)
* Default **disabled**. If enabled (for UI interactive logins), pair with **DPoP-bound** refresh tokens or **mTLS** client sessions; short TTL (≤ 8 h), rotating on use (replay-safe).
### 2.3 ID tokens (optional)
@@ -94,8 +100,8 @@ plan? = <plan name> // optional hint for UIs; not used for e
### 3.1 OIDC discovery & keys
* `GET /.well-known/openid-configuration` → endpoints, algs, jwks_uri
* `GET /jwks` → JSON Web Key Set (rotating, at least 2 active keys during transition)
> **KMS-backed keys.** When the signing provider is `kms`, Authority fetches only the public coordinates (`Qx`, `Qy`) and version identifiers from the backing KMS. Private scalars never leave the provider; JWKS entries are produced by re-exporting the public material via the `kms.version` metadata attached to each key. Retired keys keep the same `kms.version` metadata so audits can trace which cloud KMS version produced a token.
@@ -105,12 +111,12 @@ plan? = <plan name> // optional hint for UIs; not used for e
> Legacy aliases under `/oauth/token` are deprecated as of 1 November 2025 and now emit `Deprecation/Sunset/Warning` headers. See [`docs/api/authority-legacy-auth-endpoints.md`](../../api/authority-legacy-auth-endpoints.md) for timelines and migration guidance.
* **Client Credentials** (service→service):
* **mTLS**: mutual TLS + `client_id` → bound token (`cnf.x5t#S256`)
* `security.senderConstraints.mtls.enforceForAudiences` forces the mTLS path when requested `aud`/`resource` values intersect high-value audiences (defaults include `signer`). Authority rejects clients attempting to use DPoP/basic secrets for these audiences.
* Stored `certificateBindings` are authoritative: thumbprint, subject, issuer, serial number, and SAN values are matched against the presented certificate, with rotation grace applied to activation windows. Failures surface deterministic error codes (e.g. `certificate_binding_subject_mismatch`).
* **private_key_jwt**: JWT-based client auth + **DPoP** header (preferred for tools and CLI)
* **Device Code** (CLI): `POST /oauth/device/code` + `POST /oauth/token` poll
* **Authorization Code + PKCE** (UI): standard
@@ -128,7 +134,7 @@ plan? = <plan name> // optional hint for UIs; not used for e
signed with the DPoP private key; header carries JWK.
3. Authority validates proof; issues access token with `cnf.jkt=<thumbprint(JWK)>`.
4. Client uses the same DPoP key to sign **every subsequent API request** to services (Signer, Scanner, …).
**mTLS flow**
@@ -136,11 +142,11 @@ plan? = <plan name> // optional hint for UIs; not used for e
### 3.3 Introspection & revocation (optional)
* `POST /introspect` → `{ active, sub, scope, aud, exp, cnf, ... }`
* `POST /revoke` → revokes refresh tokens or opaque access tokens.
> Requests targeting the legacy `/oauth/{introspect|revoke}` paths receive deprecation headers and are scheduled for removal after 1 May 2026.
* **Replay prevention**: maintain **DPoP `jti` cache** (TTL ≤ 10 min) to reject duplicate proofs when services supply DPoP nonces (Signer requires nonce for high-value operations).
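
A DPoP `jti` replay cache can be sketched in a few lines. This in-memory version only illustrates the idea; the document places the real cache in Valkey, and the class name `JtiReplayCache` and its interface are assumptions:

```python
import time
from typing import Dict, Optional


class JtiReplayCache:
    """Remembers DPoP proof `jti` values for a bounded time to reject replays."""

    def __init__(self, ttl_seconds: float = 600.0) -> None:
        self._ttl = ttl_seconds            # 10-minute upper bound from the text
        self._seen: Dict[str, float] = {}  # jti -> expiry timestamp

    def check_and_store(self, jti: str, now: Optional[float] = None) -> bool:
        """Return True if the proof is new (and record it), False on replay."""
        now = time.monotonic() if now is None else now
        # Drop expired entries so the cache stays bounded.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if jti in self._seen:
            return False
        self._seen[jti] = now + self._ttl
        return True
```

With a shared store, the same check is typically one atomic "set if absent with expiry" operation, so replicas cannot both accept the same proof.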
### 3.4 UserInfo (optional for UI)
@@ -150,19 +156,19 @@ plan? = <plan name> // optional hint for UIs; not used for e
### 3.5 Vuln Explorer workflow safeguards
* **Anti-forgery flow** - Vuln Explorer’s mutation verbs call
* `POST /vuln/workflow/anti-forgery/issue`
* `POST /vuln/workflow/anti-forgery/verify`
Callers must hold `vuln:operate` scopes. Issued tokens embed the actor, tenant, whitelisted actions, ABAC selectors (environment/owner/business tier), and optional context key/value pairs. Tokens are EdDSA/ES256 signed via the primary Authority signing key and default to a 10-minute TTL (cap: 30 minutes). Verification enforces nonce reuse prevention, tenant match, and action membership before forwarding the request to Vuln Explorer.
* **Attachment access** - Evidence bundles and attachments reference a ledger hash. Vuln Explorer obtains a scoped download token through:
* `POST /vuln/attachments/tokens/issue`
* `POST /vuln/attachments/tokens/verify`
These tokens bind the ledger event hash, attachment identifier, optional finding/content metadata, and the actor. They default to a 30-minute TTL (cap: 4 hours) and require `vuln:investigate`.
* **Audit trail** - Both flows emit `vuln.workflow.csrf.*` and `vuln.attachment.token.*` audit records with tenant, actor, ledger hash, nonce, and filtered context metadata so Offline Kit operators can reconcile actions against ledger entries.
* **Configuration**
@@ -194,7 +200,7 @@ plan? = <plan name> // optional hint for UIs; not used for e
### 4.1 Audiences
* `signer` - only the **Signer** service should accept tokens with `aud=signer`.
* `attestor`, `scanner`, `concelier`, `excititor`, `ui`, `zastava` similarly.
Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their policy.
@@ -221,13 +227,13 @@ Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their p
| `authority:branding.read` / `authority:branding.write` | Authority | Branding admin |
| `zastava.emit` / `zastava.enforce` | Scanner/Zastava | Runtime events / admission |
**Roles → scopes mapping** is configured centrally (Authority policy) and pushed during token issuance.
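
As a rough illustration of a central roles → scopes mapping applied at issuance, the sketch below expands a client's roles and grants only the requested scopes that the roles cover. The role and scope names are taken from this document, but the specific pairing in `ROLE_SCOPES` and the helper `scopes_for` are assumptions:

```python
from typing import Dict, Iterable, Set

# Hypothetical policy table; the real mapping lives in Authority policy.
ROLE_SCOPES: Dict[str, Set[str]] = {
    "svc.scanner": {"scanner.scan", "scanner.export"},
    "svc.signer": {"signer.sign"},
}


def scopes_for(roles: Iterable[str], requested: Iterable[str]) -> str:
    """Return the space-separated `scope` claim granted for this request."""
    allowed: Set[str] = set()
    for role in roles:
        allowed |= ROLE_SCOPES.get(role, set())  # unknown roles grant nothing
    return " ".join(sorted(allowed & set(requested)))
```

Intersecting with the requested set keeps tokens minimal: a client never receives a scope it did not ask for, even when its roles would allow it.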
---
## 5) Storage & state
* **Configuration DB** (PostgreSQL/MySQL): clients, audiences, role→scope maps, tenant/installation registry, device code grants, persistent consents (if any).
* **Cache** (Valkey):
* DPoP **jti** replay cache (short TTL)
@@ -241,20 +247,20 @@ Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their p
* Maintain **at least 2 signing keys** active during rotation; tokens carry `kid`.
* Prefer **Ed25519** for compact tokens; maintain **ES256** fallback for FIPS contexts.
* Rotation cadence: 30–90 days; emergency rotation supported.
* Publish new JWKS **before** issuing tokens with the new `kid` to avoid cold-start validation misses.
* Keep **old keys** available **at least** for max token TTL + 5 minutes.
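
The retention rule for retired keys is simple arithmetic. A small sketch, assuming the 300 s maximum OpTok TTL stated earlier and a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone


def earliest_key_removal(
    rotated_at: datetime,
    max_token_ttl_s: int = 300,  # longest OpTok lifetime (120-300 s)
    grace_s: int = 300,          # the extra 5 minutes from the rule above
) -> datetime:
    """Earliest time a retired signing key may be dropped from the JWKS."""
    return rotated_at + timedelta(seconds=max_token_ttl_s + grace_s)


rotated = datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)
print(earliest_key_removal(rotated).isoformat())  # 2026-01-01T00:10:00+00:00
```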
---
## 7) HA & performance
* **Stateless issuance** (except device codes/refresh) → scale horizontally behind a load-balancer.
* **DB** only for client metadata and optional flows; token checks are JWT-local; introspection endpoints hit cache/DB minimally.
* **Targets**:
* Token issuance P95 ≤ **20 ms** under warm cache.
* DPoP proof validation ≤ **1 ms** extra per request at resource servers (Signer/Scanner).
* 99.9% uptime; HPA on CPU/latency.
---
@@ -263,7 +269,7 @@ Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their p
* **Strict TLS** (1.3 preferred); HSTS; modern cipher suites.
* **mTLS** enabled where required (Signer/Attestor paths).
* **Replay protection**: DPoP `jti` cache, nonce support for **Signer** (add `DPoP-Nonce` header on 401; clients re-sign).
* **Rate limits** per client & per IP; exponential backoff on failures.
* **Secrets**: clients use **private_key_jwt** or **mTLS**; never basic secrets over the wire.
* **CSP/CSRF** hardening on UI flows; `SameSite=Lax` cookies; PKCE enforced.
@@ -271,10 +277,10 @@ Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their p
---
## 9) Multi-tenancy & installations
* **Tenant (`tid`)** and **Installation (`inst`)** registries define which audiences/scopes a client can request.
* Cross-tenant isolation enforced at issuance (disallow rogue `aud`), and resource servers **must** check that `tid` matches their configured tenant.
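
The two enforcement points above, issuance and resource server, can be sketched as follows. The registry contents, tenant identifiers, and function names are hypothetical; only the `tid` and `aud` claim names come from the document:

```python
from typing import Dict, Iterable, Set

# Hypothetical registry: tenant id -> audiences its clients may request.
TENANT_AUDIENCES: Dict[str, Set[str]] = {
    "tenant-a": {"scanner", "signer"},
    "tenant-b": {"scanner"},
}


def may_issue(tid: str, requested_aud: Iterable[str]) -> bool:
    """Issuance-side check: every requested audience must be registered."""
    requested = set(requested_aud)
    return bool(requested) and requested <= TENANT_AUDIENCES.get(tid, set())


def accepts_token(claims: dict, configured_tid: str) -> bool:
    """Resource-server check: the token's `tid` must match the local tenant."""
    return claims.get("tid") == configured_tid
```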
---
@@ -287,7 +293,7 @@ Authority exposes two admin tiers:
```
POST /admin/clients # create/update client (confidential/public)
POST /admin/audiences # register audience resource URIs
POST /admin/roles # define role→scope mappings
POST /admin/tenants # create tenant/install entries
POST /admin/keys/rotate # rotate signing key (zero-downtime)
GET /admin/metrics # Prometheus exposition (token issue rates, errors)
@@ -300,10 +306,10 @@ Declared client `audiences` flow through to the issued JWT `aud` claim and the t
## 11) Integration hard lines (what resource servers must enforce)
Every Stella Ops service that consumes Authority tokens **must**:
1. Verify JWT signature (`kid` in JWKS), `iss`, `aud`, `exp`, `nbf`.
2. Enforce **sender-constraint**:
* **DPoP**: validate DPoP proof (`htu`, `htm`, `iat`, `jti`) and match `cnf.jkt`; cache `jti` for replay defense; honor nonce challenges.
* **mTLS**: match presented client cert thumbprint to token `cnf.x5t#S256`.
@@ -316,8 +322,8 @@ Every StellaOps service that consumes Authority tokens **must**:
## 12) Error surfaces & UX
* Token endpoint errors follow OAuth2 (`invalid_client`, `invalid_grant`, `invalid_scope`, `unauthorized_client`).
* Resource servers use RFC 6750 style (`WWW-Authenticate: DPoP error="invalid_token", error_description="…", dpop_nonce="…"`).
* For DPoP nonce challenges, clients retry with the server-supplied nonce once.
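
A client-side sketch of the single-retry nonce rule. It looks for the nonce in the `DPoP-Nonce` header first and falls back to the `dpop_nonce` parameter shown above. The `send` callable, which re-signs the DPoP proof with the given nonce and returns `(status, headers)`, is a placeholder assumption rather than a real client API:

```python
import re
from typing import Callable, Dict, Optional, Tuple

_NONCE_RE = re.compile(r'dpop_nonce="([^"]*)"')

Response = Tuple[int, Dict[str, str]]


def extract_dpop_nonce(headers: Dict[str, str]) -> Optional[str]:
    """Find a server-supplied nonce in the response headers, if any."""
    if headers.get("DPoP-Nonce"):
        return headers["DPoP-Nonce"]
    match = _NONCE_RE.search(headers.get("WWW-Authenticate", ""))
    return match.group(1) if match else None


def call_with_nonce_retry(send: Callable[[Optional[str]], Response]) -> Response:
    """Send once; on a 401 that carries a nonce, re-sign and retry exactly once."""
    status, headers = send(None)
    if status == 401:
        nonce = extract_dpop_nonce(headers)
        if nonce:
            status, headers = send(nonce)  # single retry, never a loop
    return status, headers
```

Capping the retry at one attempt keeps a misbehaving server from driving the client into an endless challenge loop.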
---
@@ -425,8 +431,8 @@ authority:
* **JWT validation**: wrong `aud`, expired `exp`, skewed `nbf`, stale `kid`.
* **DPoP**: invalid `htu`/`htm`, replayed `jti`, stale `iat`, wrong `jkt`, nonce dance.
* **mTLS**: wrong client cert, wrong CA, thumbprint mismatch.
* **RBAC**: scope enforcement per audience; over-privileged client denied.
* **Rotation**: JWKS rotation while load-testing; zero-downtime verification.
* **HA**: kill one Authority instance; verify issuance continues; JWKS served by peers.
* **Performance**: 1k token issuance/sec on 2 cores with Valkey enabled for jti caching.
@@ -436,18 +442,18 @@ authority:
| Threat | Vector | Mitigation |
| ------------------- | ---------------- | ------------------------------------------------------------------------------------------ |
| Token theft | Copy of JWT | **Short TTL**, **sender-constraint** (DPoP/mTLS); replay blocked by `jti` cache and nonces |
| Replay across hosts | Reuse DPoP proof | Enforce `htu`/`htm`, `iat` freshness, `jti` uniqueness; services may require **nonce** |
| Impersonation | Fake client | mTLS or `private_key_jwt` with pinned JWK; client registration & rotation |
| Key compromise | Signing key leak | HSM/KMS storage, key rotation, audit; emergency key revoke path; narrow token TTL |
| Cross-tenant abuse | Scope elevation | Enforce `aud`, `tid`, `inst` at issuance and resource servers |
| Downgrade to bearer | Strip DPoP | Resource servers require DPoP/mTLS based on `aud`; reject bearer without `cnf` |
---
## 17) Deployment & HA
* **Stateless** microservice, containerized; run ≥ 2 replicas behind LB.
* **DB**: HA Postgres (or MySQL) for clients/roles; **Valkey** for device codes, DPoP nonces/jtis.
* **Secrets**: mount client JWKs via K8s Secrets/HashiCorp Vault; signing keys via KMS.
* **Backups**: DB daily; Valkey not critical (ephemeral).
@@ -464,7 +470,7 @@ authority:
---
## 19) Quick reference - wire examples
**Access token (payload excerpt)**
@@ -501,7 +507,7 @@ Signer validates that `hash(JWK)` in the proof matches `cnf.jkt` in the token.
## 20) Rollout plan
1. **MVP**: Client Credentials (private_key_jwt + DPoP), JWKS, short OpToks, per-audience scopes.
2. **Add**: mTLS-bound tokens for Signer/Attestor; device code for CLI; optional introspection.
3. **Hardening**: DPoP nonce support; full audit pipeline; HA tuning.
4. **UX**: Tenant/installation admin UI; role→scope editors; client bootstrap wizards.