Vladimir Moushkov f4d7a15a00
2025-10-23 18:53:18 +03:00

# component_architecture_authority.md — **StellaOps Authority** (2025Q4)
> **Scope.** Implementation-ready architecture for **StellaOps Authority**: the on-prem **OIDC/OAuth2** service that issues **short-lived, sender-constrained operational tokens (OpToks)** to first-party services and tools. Covers protocols (DPoP & mTLS binding), token shapes, endpoints, storage, rotation, HA, RBAC, audit, and testing. This component is the trust anchor for *who* is calling inside a StellaOps installation. (Entitlement is proven separately by **PoE** from the cloud Licensing Service; Authority does not issue PoE.)
---
## 0) Mission & boundaries
**Mission.** Provide **fast, local, verifiable** authentication for StellaOps microservices and tools by minting **very short-lived** OAuth2/OIDC tokens that are **sender-constrained** (DPoP or mTLS-bound). Support RBAC scopes, multi-tenant claims, and deterministic validation for APIs (Scanner, Signer, Attestor, Excititor, Concelier, UI, CLI, Zastava).
**Boundaries.**
* Authority **does not** validate entitlements/licensing. That's enforced by **Signer** using **PoE** with the cloud Licensing Service.
* Authority tokens are **operational only** (2–5 min TTL) and must not be embedded in long-lived artifacts or stored in SBOMs.
* Authority is **stateless for validation** (JWT); **introspection is optional**, for services that prefer online checks.
---
## 1) Protocols & cryptography
* **OIDC Discovery**: `/.well-known/openid-configuration`
* **OAuth2** grant types:
  * **Client Credentials** (service↔service, with mTLS or private_key_jwt)
  * **Device Code** (CLI login on headless agents; optional)
  * **Authorization Code + PKCE** (browser login for UI; optional)
* **Sender constraint options** (choose per caller or per audience):
  * **DPoP** (Demonstrating Proof of Possession): proof JWT on each HTTP request, bound to the access token via `cnf.jkt`.
  * **OAuth 2.0 mTLS** (certificate-bound tokens): token bound to client certificate thumbprint via `cnf.x5t#S256`.
* **Signing algorithms**: **EdDSA (Ed25519)** preferred; fallback **ES256 (P-256)**. Rotation is supported via **kid** in JWKS.
* **Token format**: **JWT** access tokens (compact), optionally opaque reference tokens for services that insist on introspection.
* **Clock skew tolerance**: ±60s; issue `nbf`, `iat`, `exp` accordingly.
---
## 2) Token model
### 2.1 Access token (OpTok) — short-lived (120–300 s)
**Registered claims**
```
iss = https://authority.<domain>
sub = <client_id or user_id>
aud = <service audience: signer|scanner|attestor|concelier|excititor|ui|zastava>
exp = <unix ts> (<= 300 s from iat)
iat = <unix ts>
nbf = iat - 30
jti = <uuid>
scope = "scanner.scan scanner.export signer.sign ..."
```
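The timing rules in the claim table can be sketched as a small helper. This is a minimal illustration, not Authority's implementation: the function name `build_optok_claims`, its parameters, and the 180 s default TTL are assumptions; only the claim names, the `nbf = iat - 30` rule, and the 300 s ceiling come from the table above.

```python
import time
import uuid

MAX_TTL_SECONDS = 300      # upper bound from the claim table (exp <= iat + 300)
NBF_BACKDATE_SECONDS = 30  # nbf = iat - 30


def build_optok_claims(issuer, subject, audience, scopes, ttl_seconds=180, now=None):
    """Return the registered OpTok claims as a plain dict (hypothetical helper)."""
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("ttl_seconds must be between 1 and 300")
    iat = int(time.time()) if now is None else int(now)
    return {
        "iss": issuer,
        "sub": subject,
        "aud": audience,
        "iat": iat,
        "nbf": iat - NBF_BACKDATE_SECONDS,
        "exp": iat + ttl_seconds,
        "jti": str(uuid.uuid4()),
        "scope": " ".join(scopes),  # space-delimited, as in the table
    }
```

Passing `now` explicitly keeps the helper deterministic in tests; signing the resulting claims is out of scope for this sketch.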
**Sender-constraint (`cnf`)**
* **DPoP**:
  ```json
  "cnf": { "jkt": "<base64url(SHA-256(JWK))>" }
  ```
* **mTLS**:
  ```json
  "cnf": { "x5t#S256": "<base64url(SHA-256(client_cert_der))>" }
  ```
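Both `cnf` values are unpadded base64url SHA-256 digests. The sketch below shows one way to compute them with the Python standard library, assuming the JWK thumbprint follows RFC 7638 (required members only, sorted, no whitespace). The function names are illustrative and not part of any Authority SDK.

```python
import base64
import hashlib
import json

# Required members per key type (RFC 7638; OKP keys per RFC 8037).
_REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "OKP": ("crv", "kty", "x"),
    "RSA": ("e", "kty", "n"),
}


def b64url(data: bytes) -> str:
    """Base64url without '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """Value for cnf.jkt: hash of the canonical JSON of the required members."""
    members = _REQUIRED_MEMBERS[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def cert_thumbprint(cert_der: bytes) -> str:
    """Value for cnf.x5t#S256: hash of the DER-encoded client certificate."""
    return b64url(hashlib.sha256(cert_der).digest())
```

Because only the required members are hashed, optional fields such as `kid` or `use` do not change the thumbprint.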
**Install/tenant context (custom claims)**
```
tid = <tenant id> // multi-tenant
inst = <installation id> // unique installation
roles = [ "svc.scanner", "svc.signer", "ui.admin", ... ]
plan? = <plan name> // optional hint for UIs; not used for enforcement
```
> **Note**: Do **not** copy PoE claims into OpTok; OpTok ≠ entitlement. Only **Signer** checks PoE.
### 2.2 Refresh tokens (optional)
* Default **disabled**. If enabled (for UI interactive logins), pair with **DPoP-bound** refresh tokens or **mTLS** client sessions; short TTL (≤ 8h), rotating on use (replay-safe).
### 2.3 ID tokens (optional)
* Issued for UI/browser OIDC flows (Authorization Code + PKCE); not used for service auth.
---
## 3) Endpoints & flows
### 3.1 OIDC discovery & keys
* `GET /.well-known/openid-configuration` → endpoints, algs, jwks_uri
* `GET /jwks` → JSON Web Key Set (rotating, at least 2 active keys during transition)
### 3.2 Token issuance
* `POST /oauth/token`
  * **Client Credentials** (service→service):
    * **mTLS**: mutual TLS + `client_id` → bound token (`cnf.x5t#S256`)
      * `security.senderConstraints.mtls.enforceForAudiences` forces the mTLS path when requested `aud`/`resource` values intersect high-value audiences (defaults include `signer`). Authority rejects clients attempting to use DPoP/basic secrets for these audiences.
      * Stored `certificateBindings` are authoritative: thumbprint, subject, issuer, serial number, and SAN values are matched against the presented certificate, with rotation grace applied to activation windows. Failures surface deterministic error codes (e.g. `certificate_binding_subject_mismatch`).
    * **private_key_jwt**: JWT-based client auth + **DPoP** header (preferred for tools and CLI)
  * **Device Code** (CLI): `POST /oauth/device/code` + `POST /oauth/token` poll
  * **Authorization Code + PKCE** (UI): standard
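The `enforceForAudiences` rule amounts to a set intersection at the token endpoint. A minimal sketch follows; the function name and the `invalid_client` error code are assumptions, since the source does not say which code Authority returns for this case.

```python
def sender_constraint_error(client_constraint, requested_audiences,
                            enforce_mtls_for=frozenset({"signer"})):
    """Return None when the request may proceed, else an OAuth2-style error code.

    client_constraint: "mtls" or "dpop", as declared for the client.
    enforce_mtls_for:  audiences that force the mTLS path (default mirrors
                       the documented default, which includes "signer").
    """
    high_value = set(requested_audiences) & set(enforce_mtls_for)
    if high_value and client_constraint != "mtls":
        # Assumed error code; the real service may use a more specific one.
        return "invalid_client"
    return None
```

For example, a DPoP client requesting `aud=signer` would be rejected, while the same client requesting `aud=scanner` would pass this check.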
**DPoP handshake (example)**
1. Client prepares **JWK** (ephemeral keypair).
2. Client sends **DPoP proof** header with fields:
   ```
   htm=POST
   htu=https://authority.../oauth/token
   iat=<now>
   jti=<uuid>
   ```
   signed with the DPoP private key; header carries JWK.
3. Authority validates proof; issues access token with `cnf.jkt=<thumbprint(JWK)>`.
4. Client uses the same DPoP key to sign **every subsequent API request** to services (Signer, Scanner, …).
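Steps 1 and 2 of the handshake can be sketched as follows. This is an illustration under stated assumptions: the `sign` callable is a hypothetical hook for the client's Ed25519 or ES256 signer (the Python standard library has neither), and the `typ`, `nonce`, and `ath` handling follows RFC 9449 rather than anything this document specifies.

```python
import base64
import hashlib
import json
import time
import uuid
from urllib.parse import urlsplit, urlunsplit


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_dpop_proof(public_jwk, sign, alg, htm, htu,
                     nonce=None, access_token=None, now=None):
    """Assemble a compact DPoP proof JWT.

    sign: callable(bytes) -> bytes, supplied by the caller (hypothetical hook).
    """
    parts = urlsplit(htu)
    header = {"typ": "dpop+jwt", "alg": alg, "jwk": public_jwk}
    payload = {
        "htm": htm.upper(),
        # htu excludes query and fragment
        "htu": urlunsplit((parts.scheme, parts.netloc, parts.path, "", "")),
        "iat": int(time.time() if now is None else now),
        "jti": str(uuid.uuid4()),
    }
    if nonce is not None:
        payload["nonce"] = nonce  # echo of a server-supplied nonce
    if access_token is not None:
        # Hash of the access token, used when calling resource servers
        payload["ath"] = _b64url(hashlib.sha256(access_token.encode("ascii")).digest())
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(sign(signing_input.encode("ascii")))
```

A fresh `jti` per call is what lets the receiving side detect replays; reusing a proof across requests would defeat that check.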
**mTLS flow**
* Mutual TLS at the connection; Authority extracts client cert, validates chain; token carries `cnf.x5t#S256`.
### 3.3 Introspection & revocation (optional)
* `POST /oauth/introspect` → `{ active, sub, scope, aud, exp, cnf, ... }`
* `POST /oauth/revoke` → revokes refresh tokens or opaque access tokens.
* **Replay prevention**: maintain a **DPoP `jti` cache** (TTL ≤ 10 min) to reject duplicate proofs; services may additionally supply DPoP nonces (Signer requires a nonce for high-value operations).
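The `jti` cache behaves like a set whose entries expire. Below is an in-memory sketch for illustration only; the class name and the injectable clock are assumptions, and a deployment using the Redis cache described later would more likely rely on an atomic set-if-absent with expiry.

```python
import time


class JtiReplayCache:
    """Remembers each DPoP jti until its TTL elapses (single-process sketch)."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry_by_jti = {}

    def check_and_store(self, jti):
        """Return True if the jti is new (accept), False if replayed (reject)."""
        now = self._clock()
        # Drop entries whose TTL has passed so the dict does not grow forever.
        expired = [key for key, expiry in self._expiry_by_jti.items() if expiry <= now]
        for key in expired:
            del self._expiry_by_jti[key]
        if jti in self._expiry_by_jti:
            return False
        self._expiry_by_jti[jti] = now + self._ttl
        return True
```

The TTL should be at least as long as the window in which a proof's `iat` is still accepted; otherwise a proof could be replayed after its cache entry expires.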
### 3.4 UserInfo (optional for UI)
* `GET /userinfo` (ID token context).
---
## 4) Audiences, scopes & RBAC
### 4.1 Audiences
* `signer` — only the **Signer** service should accept tokens with `aud=signer`.
* `attestor`, `scanner`, `concelier`, `excititor`, `ui`, `zastava` similarly.
Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their policy.
### 4.2 Core scopes
| Scope | Service | Operation |
| ---------------------------------- | ------------------ | -------------------------- |
| `signer.sign` | Signer | Request DSSE signing |
| `attestor.write` | Attestor | Submit Rekor entries |
| `scanner.scan` | Scanner.WebService | Submit scan jobs |
| `scanner.export` | Scanner.WebService | Export SBOMs |
| `scanner.read` | Scanner.WebService | Read catalog/SBOMs |
| `vex.read` / `vex.admin` | Excititor | Query/operate |
| `concelier.read` / `concelier.export` | Concelier | Query/exports |
| `ui.read` / `ui.admin` | UI | View/admin |
| `zastava.emit` / `zastava.enforce` | Scanner/Zastava | Runtime events / admission |
**Roles → scopes mapping** is configured centrally (Authority policy) and pushed during token issuance.
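A role-to-scope resolution at issuance time might look like the sketch below. The `ROLE_SCOPES` table is a hypothetical example built from the role and scope names in this document; the actual mapping lives in Authority policy, and the choice to reject (rather than silently narrow) an over-broad request is an assumption.

```python
# Hypothetical mapping; real values come from Authority policy.
ROLE_SCOPES = {
    "svc.scanner": {"scanner.scan", "scanner.export", "scanner.read"},
    "svc.signer": {"signer.sign"},
    "ui.admin": {"ui.read", "ui.admin"},
}


def grant_scopes(roles, requested, role_scopes=ROLE_SCOPES):
    """Return the requested scopes if every one is covered by the caller's roles."""
    allowed = set().union(*(role_scopes.get(role, set()) for role in roles))
    denied = set(requested) - allowed
    if denied:
        raise PermissionError("invalid_scope: " + " ".join(sorted(denied)))
    return sorted(set(requested))
```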
---
## 5) Storage & state
* **Configuration DB** (PostgreSQL/MySQL): clients, audiences, role→scope maps, tenant/installation registry, device code grants, persistent consents (if any).
* **Cache** (Redis):
  * DPoP **jti** replay cache (short TTL)
  * **Nonce** store (per resource server, if they demand nonce)
  * Device code pollers, rate limiting buckets
* **JWKS**: key material in HSM/KMS or encrypted at rest; JWKS served from memory.
---
## 6) Key management & rotation
* Maintain **at least 2 signing keys** active during rotation; tokens carry `kid`.
* Prefer **Ed25519** for compact tokens; maintain **ES256** fallback for FIPS contexts.
* Rotation cadence: 30–90 days; emergency rotation supported.
* Publish new JWKS **before** issuing tokens with the new `kid` to avoid cold-start validation misses.
* Keep **old keys** available **at least** for max token TTL + 5 minutes.
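The retention rule for old keys is simple arithmetic. As a worked sketch, assuming the 300 s maximum OpTok TTL from section 2.1 (the function name is illustrative):

```python
from datetime import datetime, timedelta, timezone

MAX_TOKEN_TTL = timedelta(seconds=300)  # longest OpTok lifetime (section 2.1)
RETIRE_MARGIN = timedelta(minutes=5)    # extra margin from the rule above


def earliest_jwks_removal(last_issued_with_old_key: datetime) -> datetime:
    """Earliest time the old public key may be dropped from the JWKS."""
    return last_issued_with_old_key + MAX_TOKEN_TTL + RETIRE_MARGIN


switch = datetime(2025, 10, 23, 12, 0, tzinfo=timezone.utc)
removal = earliest_jwks_removal(switch)  # 10 minutes after the switch
```

If the last token signed with the old key was issued at 12:00 UTC, the key has to stay published until at least 12:10 UTC.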
---
## 7) HA & performance
* **Stateless issuance** (except device codes/refresh) → scale horizontally behind a load balancer.
* **DB** only for client metadata and optional flows; token checks are JWT-local; introspection endpoints hit cache/DB minimally.
* **Targets**:
  * Token issuance P95 ≤ **20 ms** under warm cache.
  * DPoP proof validation ≤ **1 ms** extra per request at resource servers (Signer/Scanner).
  * 99.9% uptime; HPA on CPU/latency.
---
## 8) Security posture
* **Strict TLS** (1.3 preferred); HSTS; modern cipher suites.
* **mTLS** enabled where required (Signer/Attestor paths).
* **Replay protection**: DPoP `jti` cache, nonce support for **Signer** (add `DPoP-Nonce` header on 401; clients re-sign).
* **Rate limits** per client & per IP; exponential backoff on failures.
* **Secrets**: clients use **private_key_jwt** or **mTLS**; never basic secrets over the wire.
* **CSP/CSRF** hardening on UI flows; `SameSite=Lax` cookies; PKCE enforced.
* **Logs** redact `Authorization` and DPoP proofs; store `sub`, `aud`, `scopes`, `inst`, `tid`, `cnf` thumbprints, not full keys.
---
## 9) Multi-tenancy & installations
* **Tenant (`tid`)** and **Installation (`inst`)** registries define which audiences/scopes a client can request.
* Cross-tenant isolation enforced at issuance (disallow rogue `aud`), and resource servers **must** check that `tid` matches their configured tenant.
---
## 10) Admin & operations APIs
All under `/admin` (mTLS + `authority.admin` scope).
```
POST /admin/clients # create/update client (confidential/public)
POST /admin/audiences # register audience resource URIs
POST /admin/roles # define role→scope mappings
POST /admin/tenants # create tenant/install entries
POST /admin/keys/rotate # rotate signing key (zero-downtime)
GET /admin/metrics # Prometheus exposition (token issue rates, errors)
GET /admin/healthz|readyz # health/readiness
```
Declared client `audiences` flow through to the issued JWT `aud` claim and the token request's `resource` indicators. Authority relies on this metadata to enforce DPoP nonce challenges for `signer`, `attestor`, and other high-value services without requiring clients to repeat the audience parameter on every request.
---
## 11) Integration hard lines (what resource servers must enforce)
Every StellaOps service that consumes Authority tokens **must**:
1. Verify JWT signature (`kid` in JWKS), `iss`, `aud`, `exp`, `nbf`.
2. Enforce **sender-constraint**:
   * **DPoP**: validate DPoP proof (`htu`, `htm`, `iat`, `jti`) and match `cnf.jkt`; cache `jti` for replay defense; honor nonce challenges.
   * **mTLS**: match presented client cert thumbprint to token `cnf.x5t#S256`.
3. Check **scopes**; optionally map to internal roles.
4. Check **tenant** (`tid`) and **installation** (`inst`) as appropriate.
5. For **Signer** only: require **both** OpTok and **PoE** in the request (enforced by Signer, not Authority).
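The claim-level part of this checklist (items 1, 3 and 4, plus the presence of `cnf`) can be sketched as below. It assumes the JWT signature has already been verified against the JWKS and that DPoP proof or certificate matching happens separately. The function name, exception type, and the 60 s skew default (taken from the clock-skew tolerance in section 1) are illustrative choices.

```python
import time


class TokenRejected(Exception):
    """Raised with the name of the claim that failed validation."""


def validate_claims(claims, *, issuer, audience, required_scope, tenant,
                    skew_seconds=60, now=None):
    """Check an already signature-verified OpTok payload; return it if valid."""
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise TokenRejected("iss")
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audience not in audiences:
        raise TokenRejected("aud")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp + skew_seconds:
        raise TokenRejected("exp")
    nbf = claims.get("nbf")
    if isinstance(nbf, (int, float)) and now < nbf - skew_seconds:
        raise TokenRejected("nbf")
    cnf = claims.get("cnf") or {}
    if "jkt" not in cnf and "x5t#S256" not in cnf:
        raise TokenRejected("cnf")  # plain bearer token: no sender constraint
    if required_scope not in str(claims.get("scope", "")).split():
        raise TokenRejected("scope")
    if claims.get("tid") != tenant:
        raise TokenRejected("tid")
    return claims
```

Raising on the first failed claim keeps error reporting deterministic, which matches the stated goal of deterministic validation across services.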
---
## 12) Error surfaces & UX
* Token endpoint errors follow OAuth2 (`invalid_client`, `invalid_grant`, `invalid_scope`, `unauthorized_client`).
* Resource servers use RFC 6750 style (`WWW-Authenticate: DPoP error="invalid_token", error_description="…", dpop_nonce="…"`).
* For DPoP nonce challenges, clients retry once with the server-supplied nonce.
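The retry-once rule can be sketched independently of any HTTP library. Here `send` is a hypothetical callable that builds a fresh DPoP proof (including the nonce when one is given), performs the request, and returns `(status, headers, body)`. The sketch reads the nonce from the `DPoP-Nonce` response header mentioned in the security posture section and assumes `headers` is a plain dict with that exact key casing.

```python
def call_with_nonce_retry(send):
    """Call send(nonce) and retry at most once on a DPoP nonce challenge."""
    status, headers, body = send(None)
    nonce = headers.get("DPoP-Nonce")
    if status == 401 and nonce:
        # Exactly one retry: a second 401 is returned to the caller as-is.
        status, headers, body = send(nonce)
    return status, headers, body
```

Capping the retry at one attempt avoids a loop when the server keeps rejecting the proof for a reason other than a missing nonce.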
---
## 13) Observability & audit
* **Metrics**:
  * `authority.tokens_issued_total{grant,aud}`
  * `authority.dpop_validations_total{result}`
  * `authority.mtls_bindings_total{result}`
  * `authority.jwks_rotations_total`
  * `authority.errors_total{type}`
* **Audit log** (immutable sink): token issuance (`sub`, `aud`, `scopes`, `tid`, `inst`, `cnf thumbprint`, `jti`), revocations, admin changes.
* **Tracing**: token flows, DB reads, JWKS cache.
---
## 14) Configuration (YAML)
```yaml
authority:
  issuer: "https://authority.internal"
  signing:
    enabled: true
    activeKeyId: "authority-signing-2025"
    keyPath: "../certificates/authority-signing-2025.pem"
    algorithm: "ES256"
    keySource: "file"
  security:
    rateLimiting:
      token:
        enabled: true
        permitLimit: 30
        window: "00:01:00"
        queueLimit: 0
      authorize:
        enabled: true
        permitLimit: 60
        window: "00:01:00"
        queueLimit: 10
      internal:
        enabled: false
        permitLimit: 5
        window: "00:01:00"
        queueLimit: 0
    senderConstraints:
      dpop:
        enabled: true
        allowedAlgorithms: [ "ES256", "ES384" ]
        proofLifetime: "00:02:00"
        allowedClockSkew: "00:00:30"
        replayWindow: "00:05:00"
        nonce:
          enabled: true
          ttl: "00:10:00"
          maxIssuancePerMinute: 120
          store: "redis"
          redisConnectionString: "redis://authority-redis:6379?ssl=false"
          requiredAudiences:
            - "signer"
            - "attestor"
      mtls:
        enabled: true
        requireChainValidation: true
        rotationGrace: "00:15:00"
        enforceForAudiences:
          - "signer"
        allowedSanTypes:
          - "dns"
          - "uri"
        allowedCertificateAuthorities:
          - "/etc/ssl/mtls/clients-ca.pem"
  clients:
    - clientId: scanner-web
      grantTypes: [ "client_credentials" ]
      audiences: [ "scanner" ]
      auth: { type: "private_key_jwt", jwkFile: "/secrets/scanner-web.jwk" }
      senderConstraint: "dpop"
      scopes: [ "scanner.scan", "scanner.export", "scanner.read" ]
    - clientId: signer
      grantTypes: [ "client_credentials" ]
      audiences: [ "signer" ]
      auth: { type: "mtls" }
      senderConstraint: "mtls"
      scopes: [ "signer.sign" ]
    - clientId: notify-web-dev
      grantTypes: [ "client_credentials" ]
      audiences: [ "notify.dev" ]
      auth: { type: "client_secret", secretFile: "/secrets/notify-web-dev.secret" }
      senderConstraint: "dpop"
      scopes: [ "notify.read", "notify.admin" ]
    - clientId: notify-web
      grantTypes: [ "client_credentials" ]
      audiences: [ "notify" ]
      auth: { type: "client_secret", secretFile: "/secrets/notify-web.secret" }
      senderConstraint: "dpop"
      scopes: [ "notify.read", "notify.admin" ]
```
---
## 15) Testing matrix
* **JWT validation**: wrong `aud`, expired `exp`, skewed `nbf`, stale `kid`.
* **DPoP**: invalid `htu`/`htm`, replayed `jti`, stale `iat`, wrong `jkt`, nonce dance.
* **mTLS**: wrong client cert, wrong CA, thumbprint mismatch.
* **RBAC**: scope enforcement per audience; over-privileged client denied.
* **Rotation**: JWKS rotation while load-testing; zero-downtime verification.
* **HA**: kill one Authority instance; verify issuance continues; JWKS served by peers.
* **Performance**: 1k token issuance/sec on 2 cores with Redis enabled for jti caching.
---
## 16) Threat model & mitigations (summary)
| Threat              | Vector           | Mitigation                                                                                 |
| ------------------- | ---------------- | ------------------------------------------------------------------------------------------ |
| Token theft         | Copy of JWT      | **Short TTL**, **sender-constraint** (DPoP/mTLS); replay blocked by `jti` cache and nonces |
| Replay across hosts | Reuse DPoP proof | Enforce `htu`/`htm`, `iat` freshness, `jti` uniqueness; services may require **nonce**     |
| Impersonation       | Fake client      | mTLS or `private_key_jwt` with pinned JWK; client registration & rotation                  |
| Key compromise      | Signing key leak | HSM/KMS storage, key rotation, audit; emergency key revoke path; narrow token TTL          |
| Cross-tenant abuse  | Scope elevation  | Enforce `aud`, `tid`, `inst` at issuance and resource servers                              |
| Downgrade to bearer | Strip DPoP       | Resource servers require DPoP/mTLS based on `aud`; reject bearer without `cnf`             |
---
## 17) Deployment & HA
* **Stateless** microservice, containerized; run ≥ 2 replicas behind LB.
* **DB**: HA Postgres (or MySQL) for clients/roles; **Redis** for device codes, DPoP nonces/jtis.
* **Secrets**: mount client JWKs via K8s Secrets/HashiCorp Vault; signing keys via KMS.
* **Backups**: DB daily; Redis not critical (ephemeral).
* **Disaster recovery**: export/import of client registry; JWKS rehydrate from KMS.
* **Compliance**: TLS audit; penetration testing for OIDC flows.
---
## 18) Implementation notes
* Reference stack: **.NET 10** + **OpenIddict 6** (or IdentityServer if licensed) with custom DPoP validator and mTLS binding middleware.
* Keep the DPoP/JTI cache pluggable; allow Redis/Memcached.
* Provide **client SDKs** for C# and Go: DPoP key mgmt, proof generation, nonce handling, token refresh helper.
---
## 19) Quick reference — wire examples
**Access token (payload excerpt)**
```json
{
  "iss": "https://authority.internal",
  "sub": "scanner-web",
  "aud": "signer",
  "exp": 1760668800,
  "iat": 1760668620,
  "nbf": 1760668590,
  "jti": "9d9c3f01-6e1a-49f1-8f77-9b7e6f7e3c50",
  "scope": "signer.sign",
  "tid": "tenant-01",
  "inst": "install-7A2B",
  "cnf": { "jkt": "KcVb2V...base64url..." }
}
```
**DPoP proof header fields (for POST /sign/dsse)**
```json
{
  "htu": "https://signer.internal/sign/dsse",
  "htm": "POST",
  "iat": 1760668620,
  "jti": "4b1c9b3c-8a95-4c58-8a92-9c6cfb4a6a0b"
}
```
Signer validates that `hash(JWK)` in the proof matches `cnf.jkt` in the token.
---
## 20) Rollout plan
1. **MVP**: Client Credentials (private_key_jwt + DPoP), JWKS, short OpToks, per-audience scopes.
2. **Add**: mTLS-bound tokens for Signer/Attestor; device code for CLI; optional introspection.
3. **Hardening**: DPoP nonce support; full audit pipeline; HA tuning.
4. **UX**: Tenant/installation admin UI; role→scope editors; client bootstrap wizards.