component_architecture_authority.md — Stella Ops Authority (2025Q4)
Consolidates identity and tenancy requirements documented across the AOC, Policy, and Platform guides, along with the dedicated Authority implementation plan.
Scope. Implementation‑ready architecture for Stella Ops Authority: the on‑prem OIDC/OAuth2 service that issues short‑lived, sender‑constrained operational tokens (OpToks) to first‑party services and tools. Covers protocols (DPoP & mTLS binding), token shapes, endpoints, storage, rotation, HA, RBAC, audit, and testing. This component is the trust anchor for who is calling inside a Stella Ops installation. (Entitlement is proven separately by PoE from the cloud Licensing Service; Authority does not issue PoE.)
0) Mission & boundaries
Mission. Provide fast, local, verifiable authentication for Stella Ops microservices and tools by minting very short‑lived OAuth2/OIDC tokens that are sender‑constrained (DPoP or mTLS‑bound). Support RBAC scopes, multi‑tenant claims, and deterministic validation for APIs (Scanner, Signer, Attestor, Excititor, Concelier, UI, CLI, Zastava).
Boundaries.
- Authority does not validate entitlements/licensing. That’s enforced by Signer using PoE with the cloud Licensing Service.
- Authority tokens are operational only (2–5 min TTL) and must not be embedded in long‑lived artifacts or stored in SBOMs.
- Authority is stateless for validation (JWT) and offers optional introspection for services that prefer online checks.
1) Protocols & cryptography
- OIDC Discovery: `/.well-known/openid-configuration`
- OAuth2 grant types:
  - Client Credentials (service↔service, with mTLS or private_key_jwt)
  - Device Code (CLI login on headless agents; optional)
  - Authorization Code + PKCE (browser login for UI; optional)
- Sender constraint options (choose per caller or per audience):
  - DPoP (Demonstration of Proof‑of‑Possession): proof JWT on each HTTP request, bound to the access token via `cnf.jkt`.
  - OAuth 2.0 mTLS (certificate‑bound tokens): token bound to the client certificate thumbprint via `cnf.x5t#S256`.
- Signing algorithms: EdDSA (Ed25519) preferred; fallback ES256 (P‑256). Rotation is supported via `kid` in JWKS.
- Token format: JWT access tokens (compact), optionally opaque reference tokens for services that insist on introspection.
- Clock skew tolerance: ±60 s; issue `nbf`, `iat`, `exp` accordingly.
2) Token model
2.1 Access token (OpTok) — short‑lived (120–300 s)
Registered claims
```
iss = https://authority.<domain>
sub = <client_id or user_id>
aud = <service audience: signer|scanner|attestor|concelier|excititor|ui|zastava>
exp = <unix ts> (<= 300 s from iat)
iat = <unix ts>
nbf = iat - 30
jti = <uuid>
scope = "scanner.scan scanner.export signer.sign ..."
```
Sender‑constraint (cnf)
- DPoP: `"cnf": { "jkt": "<base64url(SHA-256(JWK))>" }`, i.e. the RFC 7638 JWK SHA-256 thumbprint of the client's DPoP public key.
- mTLS: `"cnf": { "x5t#S256": "<base64url(SHA-256(client_cert_der))>" }`
Install/tenant context (custom claims)
```
tid = <tenant id> // multi-tenant
inst = <installation id> // unique installation
roles = [ "svc.scanner", "svc.signer", "ui.admin", ... ]
plan? = <plan name> // optional hint for UIs; not used for enforcement
```
Note: Do not copy PoE claims into OpTok; OpTok ≠ entitlement. Only Signer checks PoE.
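To make the claim rules concrete, here is a minimal Python sketch that assembles an OpTok payload under the constraints above: a TTL of 120–300 s, `nbf = iat - 30`, a space-delimited `scope`, exactly one `cnf` binding, and no PoE claims. The helper name `build_optok_claims` and its parameters are hypothetical. Signing the payload (EdDSA/ES256 with a `kid` header) is assumed to be done by a JOSE library and is not shown.

```python
import time
import uuid
from typing import Optional


def build_optok_claims(
    issuer: str,
    subject: str,
    audience: str,
    scopes: list,
    tenant: str,
    installation: str,
    cnf: dict,
    ttl_seconds: int = 180,
    now: Optional[int] = None,
) -> dict:
    """Assemble an OpTok payload (hypothetical helper; signing not shown)."""
    # OpToks are short-lived: 120-300 s per the token model.
    if not 120 <= ttl_seconds <= 300:
        raise ValueError("OpTok TTL must be between 120 and 300 seconds")
    # Every OpTok is sender-constrained: DPoP (jkt) or mTLS (x5t#S256).
    if set(cnf) not in ({"jkt"}, {"x5t#S256"}):
        raise ValueError("cnf must carry exactly one of 'jkt' or 'x5t#S256'")
    iat = int(time.time()) if now is None else now
    return {
        "iss": issuer,
        "sub": subject,
        "aud": audience,
        "exp": iat + ttl_seconds,
        "iat": iat,
        "nbf": iat - 30,  # small backdating window for clock skew
        "jti": str(uuid.uuid4()),
        "scope": " ".join(scopes),
        "tid": tenant,
        "inst": installation,
        "cnf": cnf,
        # Deliberately no PoE/entitlement claims: OpTok != entitlement.
    }
```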
2.2 Refresh tokens (optional)
- Default disabled. If enabled (for UI interactive logins), pair with DPoP‑bound refresh tokens or mTLS client sessions; short TTL (≤ 8 h), rotating on use (replay‑safe).
2.3 ID tokens (optional)
- Issued for UI/browser OIDC flows (Authorization Code + PKCE); not used for service auth.
3) Endpoints & flows
3.1 OIDC discovery & keys
- `GET /.well-known/openid-configuration` → endpoints, algs, `jwks_uri`
- `GET /jwks` → JSON Web Key Set (rotating; at least 2 active keys during transition)
3.2 Token issuance
- `POST /oauth/token`
- Client Credentials (service→service):
  - mTLS: mutual TLS + `client_id` → bound token (`cnf.x5t#S256`).
    - `security.senderConstraints.mtls.enforceForAudiences` forces the mTLS path when the requested `aud`/`resource` values intersect high-value audiences (defaults include `signer`). Authority rejects clients attempting to use DPoP or basic secrets for these audiences.
    - Stored `certificateBindings` are authoritative: thumbprint, subject, issuer, serial number, and SAN values are matched against the presented certificate, with rotation grace applied to activation windows. Failures surface deterministic error codes (e.g. `certificate_binding_subject_mismatch`).
  - private_key_jwt: JWT‑based client auth + DPoP header (preferred for tools and CLI)
- Device Code (CLI): `POST /oauth/device/code` + `POST /oauth/token` poll
- Authorization Code + PKCE (UI): standard
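The `enforceForAudiences` rule can be read as a set intersection at the token endpoint, followed by a field-by-field comparison against the stored binding. The Python sketch below shows that decision under explicit assumptions:

- All names are hypothetical: `check_client_auth`, the `auth_method` strings, and the binding dict keys.
- Returning `unauthorized_client` for a non-mTLS attempt is an assumption.
- Only `certificate_binding_subject_mismatch` is a documented error code. The per-field pattern is an extrapolation from it.
- SAN matching and rotation-grace activation windows are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PresentedCert:
    """Fields extracted from the client's TLS certificate (illustrative)."""
    thumbprint: str  # base64url SHA-256 of the DER certificate
    subject: str
    issuer: str
    serial_number: str


def requires_mtls(requested_audiences: set, enforce_for: set) -> bool:
    """mTLS is mandatory when any requested aud/resource is high-value."""
    return bool(requested_audiences & enforce_for)


def check_client_auth(
    requested_audiences: set,
    enforce_for: set,
    auth_method: str,
    binding: Optional[dict] = None,
    cert: Optional[PresentedCert] = None,
) -> Optional[str]:
    """Return None when acceptable, else a (hypothetical) error code."""
    if not requires_mtls(requested_audiences, enforce_for):
        return None
    if auth_method != "mtls" or cert is None or binding is None:
        # DPoP-only or basic-secret clients are rejected for these audiences.
        return "unauthorized_client"
    # The stored binding is authoritative: every recorded field must match.
    for field in ("thumbprint", "subject", "issuer", "serial_number"):
        expected = binding.get(field)
        if expected is not None and expected != getattr(cert, field):
            # Assumed pattern, extrapolated from certificate_binding_subject_mismatch.
            return f"certificate_binding_{field}_mismatch"
    return None
```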
DPoP handshake (example)
1. Client prepares a JWK (ephemeral keypair).
2. Client sends a DPoP proof header with fields `htm=POST`, `htu=https://authority.../oauth/token`, `iat=<now>`, `jti=<uuid>`, signed with the DPoP private key; the header carries the JWK.
3. Authority validates the proof and issues an access token with `cnf.jkt=<thumbprint(JWK)>`.
4. Client uses the same DPoP key to sign every subsequent API request to services (Signer, Scanner, …).
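Two computations in the handshake need no private key: the RFC 7638 thumbprint that becomes `cnf.jkt`, and the proof's payload claims. The sketch below shows both, assuming an EC P‑256 public JWK. The helper names are hypothetical. The ES256 signature itself is left to a JOSE library, because the Python standard library has no ECDSA signer.

Two details follow RFC 9449:

- `htu` excludes the query and fragment.
- Proofs sent to resource servers together with an access token also carry `ath`, the hash of that token.

```python
import base64
import hashlib
import json
import time
import uuid
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used throughout JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 SHA-256 thumbprint of an EC public JWK (the cnf.jkt value)."""
    # Only the required EC members, keys sorted, no whitespace.
    required = {k: jwk[k] for k in ("crv", "kty", "x", "y")}
    canonical = json.dumps(required, separators=(",", ":"), sort_keys=True)
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def dpop_proof_claims(
    method: str, url: str, access_token: Optional[str] = None, nonce: Optional[str] = None
) -> dict:
    """Payload of a DPoP proof JWT (its header: typ=dpop+jwt, alg, jwk)."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    claims = {
        "jti": str(uuid.uuid4()),  # unique per proof; feeds replay caches
        "htm": method.upper(),
        "htu": urlunsplit((scheme, netloc, path, "", "")),  # no query/fragment
        "iat": int(time.time()),
    }
    if access_token is not None:
        # ath = base64url(SHA-256(ASCII access token)) for resource requests.
        claims["ath"] = b64url(hashlib.sha256(access_token.encode("ascii")).digest())
    if nonce is not None:
        claims["nonce"] = nonce  # echo of a server-supplied DPoP-Nonce
    return claims
```

Because the thumbprint is computed over a canonical form, extra JWK members such as `kid` or `use` do not change `cnf.jkt`.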
mTLS flow
- Mutual TLS at the connection; Authority extracts the client cert and validates the chain; the token carries `cnf.x5t#S256`.
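The certificate binding is a plain hash, so both the issuing side and the checking side fit in a few lines. This sketch assumes the caller already holds the peer certificate in DER form; in Python's `ssl` module, `getpeercert(binary_form=True)` returns it. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import ssl


def x5t_s256(cert_der: bytes) -> str:
    """cnf.x5t#S256: unpadded base64url SHA-256 of the DER certificate."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def x5t_s256_from_pem(cert_pem: str) -> str:
    """Convenience wrapper for PEM-encoded input."""
    return x5t_s256(ssl.PEM_cert_to_DER_cert(cert_pem))


def cert_matches_token(cert_der: bytes, claims: dict) -> bool:
    """Resource-server check: the presented cert must match the token's cnf."""
    expected = claims.get("cnf", {}).get("x5t#S256")
    if not isinstance(expected, str):
        return False  # token is not certificate-bound
    # Constant-time comparison of the two thumbprints.
    return hmac.compare_digest(x5t_s256(cert_der), expected)
```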
3.3 Introspection & revocation (optional)
- `POST /oauth/introspect` → `{ active, sub, scope, aud, exp, cnf, ... }`
- `POST /oauth/revoke` → revokes refresh tokens or opaque access tokens.
- Replay prevention: maintain a DPoP `jti` cache (TTL ≤ 10 min) to reject duplicate proofs when services supply DPoP nonces (Signer requires a nonce for high‑value operations).
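A DPoP `jti` replay cache only needs to remember each proof identifier for the TTL above. The in-memory class below illustrates the idea. A multi-replica deployment would instead use the shared Redis cache from section 5, for example an atomic `SET key value NX EX ttl`. The class name and the injectable clock are illustrative assumptions.

```python
import time
from typing import Callable, Dict


class JtiReplayCache:
    """In-memory DPoP jti replay cache (sketch; not shared across replicas)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen: Dict[str, float] = {}  # jti -> expiry time

    def check_and_store(self, jti: str) -> bool:
        """Return True if jti is fresh (and remember it), False on replay."""
        now = self._clock()
        # Sweep expired entries so memory stays bounded by the TTL window.
        for key in [k for k, exp in self._seen.items() if exp <= now]:
            del self._seen[key]
        if jti in self._seen:
            return False
        self._seen[jti] = now + self._ttl
        return True
```

The linear sweep keeps the sketch short. A real store would rely on per-key expiry instead.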
3.4 UserInfo (optional for UI)
- `GET /userinfo` (ID token context).
4) Audiences, scopes & RBAC
4.1 Audiences
- `signer`: only the Signer service should accept tokens with `aud=signer`.
- `attestor`, `scanner`, `concelier`, `excititor`, `ui`, `zastava`: similarly, each accepted only by its own service.
Services must verify aud and sender constraint (DPoP/mTLS) per their policy.
4.2 Core scopes
| Scope | Service | Operation |
|---|---|---|
| `signer.sign` | Signer | Request DSSE signing |
| `attestor.write` | Attestor | Submit Rekor entries |
| `scanner.scan` | Scanner.WebService | Submit scan jobs |
| `scanner.export` | Scanner.WebService | Export SBOMs |
| `scanner.read` | Scanner.WebService | Read catalog/SBOMs |
| `vex.read` / `vex.admin` | Excititor | Query/operate |
| `concelier.read` / `concelier.export` | Concelier | Query/exports |
| `ui.read` / `ui.admin` | UI | View/admin |
| `zastava.emit` / `zastava.enforce` | Scanner/Zastava | Runtime events / admission |
Roles → scopes mapping is configured centrally (Authority policy) and pushed during token issuance.
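Issuance-time scope resolution can be viewed as three steps:

1. Expand the client's roles into scopes.
2. Keep only the scopes the client requested.
3. Require each granted scope to belong to the requested audience.

The sketch below transcribes the scope-to-audience pairs from the core scopes table. The role contents used with it, and the choice to deny an over-broad request rather than trim it, are assumptions; denial matches the "over‑privileged client denied" test case. `grant_scopes` is a hypothetical name.

```python
from typing import Dict, Iterable, Set

# Scope -> audience, transcribed from the core scopes table.
SCOPE_AUDIENCE: Dict[str, str] = {
    "signer.sign": "signer",
    "attestor.write": "attestor",
    "scanner.scan": "scanner",
    "scanner.export": "scanner",
    "scanner.read": "scanner",
    "vex.read": "excititor",
    "vex.admin": "excititor",
    "concelier.read": "concelier",
    "concelier.export": "concelier",
    "ui.read": "ui",
    "ui.admin": "ui",
    "zastava.emit": "zastava",
    "zastava.enforce": "zastava",
}


def grant_scopes(
    roles: Iterable[str],
    role_scopes: Dict[str, Set[str]],
    requested: Set[str],
    audience: str,
) -> Set[str]:
    """Scopes to embed in an OpTok: role-derived, requested, audience-bound."""
    allowed: Set[str] = set()
    for role in roles:
        allowed |= role_scopes.get(role, set())  # unknown roles grant nothing
    granted = {s for s in requested & allowed if SCOPE_AUDIENCE.get(s) == audience}
    if granted != requested:
        # Over-privileged or cross-audience requests are denied, not trimmed.
        raise PermissionError(f"invalid_scope: {sorted(requested - granted)}")
    return granted
```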
5) Storage & state
- Configuration DB (PostgreSQL/MySQL): clients, audiences, role→scope maps, tenant/installation registry, device code grants, persistent consents (if any).
- Cache (Redis):
  - DPoP jti replay cache (short TTL)
  - Nonce store (per resource server, for servers that demand nonces)
  - Device code pollers, rate limiting buckets
- JWKS: key material in HSM/KMS or encrypted at rest; JWKS served from memory.
6) Key management & rotation
- Maintain at least 2 signing keys active during rotation; tokens carry `kid`.
- Prefer Ed25519 for compact tokens; maintain ES256 fallback for FIPS contexts.
- Rotation cadence: 30–90 days; emergency rotation supported.
- Publish the new key in JWKS before issuing tokens with the new `kid` to avoid cold‑start validation misses.
- Keep old keys published for at least the maximum token TTL + 5 minutes after they stop signing.
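The rotation rules reduce to one scheduling question: which keys must `/jwks` serve at a given moment. The sketch below answers it under these assumptions:

- Each key record carries hypothetical `publish_at`, `sign_from` and `sign_until` timestamps.
- A key must be published no later than it starts signing.
- A key stays published until `sign_until` + the 300 s maximum OpTok TTL + 5 minutes.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_TOKEN_TTL = 300  # seconds; upper bound of the OpTok lifetime
RETENTION_MARGIN = 5 * 60  # extra 5 minutes after the last token expires


@dataclass(frozen=True)
class SigningKey:
    kid: str
    publish_at: int  # when the public key first appears in JWKS
    sign_from: int  # when Authority starts signing with it
    sign_until: Optional[int] = None  # None while still the active key

    def __post_init__(self) -> None:
        # Publish before signing to avoid cold-start validation misses.
        if self.publish_at > self.sign_from:
            raise ValueError(f"{self.kid}: must be published before it signs")


def jwks_kids(keys: List[SigningKey], now: int) -> List[str]:
    """kids that must be served from /jwks at time `now`."""
    published = []
    for key in keys:
        retire_at = (
            None if key.sign_until is None
            else key.sign_until + MAX_TOKEN_TTL + RETENTION_MARGIN
        )
        if key.publish_at <= now and (retire_at is None or now < retire_at):
            published.append(key.kid)
    return published
```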
7) HA & performance
- Stateless issuance (except device codes/refresh) → scale horizontally behind a load‑balancer.
- DB only for client metadata and optional flows; token checks are JWT‑local; introspection endpoints hit cache/DB minimally.
- Targets:
  - Token issuance P95 ≤ 20 ms under warm cache.
  - DPoP proof validation ≤ 1 ms extra per request at resource servers (Signer/Scanner).
  - 99.9% uptime; HPA on CPU/latency.
8) Security posture
- Strict TLS (1.3 preferred); HSTS; modern cipher suites.
- mTLS enabled where required (Signer/Attestor paths).
- Replay protection: DPoP `jti` cache, nonce support for Signer (add `DPoP-Nonce` header on 401; clients re‑sign).
- Rate limits per client & per IP; exponential backoff on failures.
- Secrets: clients use private_key_jwt or mTLS; never basic secrets over the wire.
- CSP/CSRF hardening on UI flows; `SameSite=Lax` cookies; PKCE enforced.
- Logs redact `Authorization` and DPoP proofs; store `sub`, `aud`, `scopes`, `inst`, `tid`, `cnf` thumbprints, not full keys.
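Per-client and per-IP limits can be modeled with the `permitLimit`-per-`window` fields of the `rateLimiting` configuration block. The fixed-window sketch below keeps one bucket per client and one per IP, and admits a request only if both have a permit left. The class name is hypothetical, and queueing (`queueLimit`) and exponential backoff are omitted.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Fixed-window limiter applied per client and per IP (sketch)."""

    def __init__(self, permit_limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._limit = permit_limit
        self._window = window_seconds
        self._clock = clock
        # bucket key -> (window start, permits used in that window)
        self._buckets: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def _used(self, key: Tuple[str, str], now: float) -> int:
        start, used = self._buckets.get(key, (now, 0))
        if now - start >= self._window:
            self._buckets[key] = (now, 0)  # window elapsed: start a new one
            return 0
        return used

    def allow(self, client_id: str, ip: str) -> bool:
        now = self._clock()
        keys = [("client", client_id), ("ip", ip)]
        # Both the client bucket and the IP bucket must have a permit left.
        if any(self._used(k, now) >= self._limit for k in keys):
            return False
        for k in keys:
            start, used = self._buckets.get(k, (now, 0))
            self._buckets[k] = (start, used + 1)
        return True
```

Consuming from both buckets only after both checks pass means a request rejected by one limit does not use up a permit in the other.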
9) Multi‑tenancy & installations
- Tenant (`tid`) and Installation (`inst`) registries define which audiences/scopes a client can request.
- Cross‑tenant isolation is enforced at issuance (disallow rogue `aud`), and resource servers must check that `tid` matches their configured tenant.
10) Admin & operations APIs
All under /admin (mTLS + authority.admin scope).
```
POST /admin/clients # create/update client (confidential/public)
POST /admin/audiences # register audience resource URIs
POST /admin/roles # define role→scope mappings
POST /admin/tenants # create tenant/install entries
POST /admin/keys/rotate # rotate signing key (zero-downtime)
GET /admin/metrics # Prometheus exposition (token issue rates, errors)
GET /admin/healthz|readyz # health/readiness
```
Declared client audiences flow through to the issued JWT aud claim and the token request's resource indicators. Authority relies on this metadata to enforce DPoP nonce challenges for signer, attestor, and other high-value services without requiring clients to repeat the audience parameter on every request.
11) Integration hard lines (what resource servers must enforce)
Every Stella Ops service that consumes Authority tokens must:
- Verify the JWT signature (`kid` in JWKS), `iss`, `aud`, `exp`, `nbf`.
- Enforce the sender constraint:
  - DPoP: validate the DPoP proof (`htu`, `htm`, `iat`, `jti`) and match `cnf.jkt`; cache `jti` for replay defense; honor nonce challenges.
  - mTLS: match the presented client cert thumbprint to the token's `cnf.x5t#S256`.
- Check scopes; optionally map to internal roles.
- Check tenant (`tid`) and installation (`inst`) as appropriate.
- For Signer only: require both OpTok and PoE in the request (enforced by Signer, not Authority).
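Taken together, these checks form a fixed pipeline. The sketch below covers the claim-level steps for a token whose signature has already been verified against JWKS by a JOSE library. The caller supplies one of two values:

- `dpop_jkt`: the thumbprint of the JWK in an already-validated DPoP proof.
- `mtls_x5t`: the thumbprint of the presented client certificate.

`check_optok` and its parameters are hypothetical. The 60 s leeway mirrors the clock-skew tolerance in section 1. Installation (`inst`) checks and PoE are left out.

```python
import time
from typing import Optional


def check_optok(
    claims: dict,
    *,
    issuer: str,
    audience: str,
    required_scope: str,
    tenant: str,
    dpop_jkt: Optional[str] = None,
    mtls_x5t: Optional[str] = None,
    leeway: int = 60,
    now: Optional[int] = None,
) -> Optional[str]:
    """Return None if the token passes, else an RFC 6750 error code."""
    t = int(time.time()) if now is None else now
    if claims.get("iss") != issuer:
        return "invalid_token"
    aud = claims.get("aud")
    audiences = {aud} if isinstance(aud, str) else set(aud or [])
    if audience not in audiences:
        return "invalid_token"
    # exp/nbf with clock-skew leeway; a missing exp counts as expired.
    if t >= claims.get("exp", 0) + leeway or t + leeway < claims.get("nbf", 0):
        return "invalid_token"
    # Sender constraint: caller key/cert must match cnf; plain bearer fails.
    cnf = claims.get("cnf") or {}
    if dpop_jkt is not None:
        if cnf.get("jkt") != dpop_jkt:
            return "invalid_token"
    elif mtls_x5t is not None:
        if cnf.get("x5t#S256") != mtls_x5t:
            return "invalid_token"
    else:
        return "invalid_token"
    if required_scope not in claims.get("scope", "").split():
        return "insufficient_scope"
    if claims.get("tid") != tenant:
        return "invalid_token"
    return None
```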
12) Error surfaces & UX
- Token endpoint errors follow OAuth2 (`invalid_client`, `invalid_grant`, `invalid_scope`, `unauthorized_client`).
- Resource servers use RFC 6750 style (`WWW-Authenticate: DPoP error="invalid_token", error_description="…", dpop_nonce="…"`).
- For DPoP nonce challenges, clients retry with the server‑supplied nonce once.
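The nonce round-trip has two halves: the resource server's 401 challenge, and a client that re-signs and retries exactly once. The sketch follows the RFC 9449 shape: `error="use_dpop_nonce"` in `WWW-Authenticate` plus a `DPoP-Nonce` response header, the header named in section 8. The other names are assumptions:

- `send` and `make_proof` are hypothetical callables standing in for the HTTP client and the proof signer.
- The `(status, headers)` tuple is a deliberate simplification of an HTTP response.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str]]  # (status, headers), simplified


def nonce_challenge(nonce: str) -> Response:
    """Resource-server 401 asking the client to re-sign with a nonce."""
    return 401, {
        "WWW-Authenticate": 'DPoP error="use_dpop_nonce", '
                            'error_description="nonce required in DPoP proof"',
        "DPoP-Nonce": nonce,
    }


def call_with_nonce_retry(
    send: Callable[[str], Response],
    make_proof: Callable[[Optional[str]], str],
) -> Response:
    """Send once; on a nonce challenge, re-sign with the nonce and retry once."""
    status, headers = send(make_proof(None))
    nonce = headers.get("DPoP-Nonce")
    if status == 401 and nonce and "use_dpop_nonce" in headers.get("WWW-Authenticate", ""):
        status, headers = send(make_proof(nonce))  # exactly one retry
    return status, headers
```

Capping the client at a single retry means a misbehaving server cannot drive it into a signing loop.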
13) Observability & audit
- Metrics:
  - `authority.tokens_issued_total{grant,aud}`
  - `authority.dpop_validations_total{result}`
  - `authority.mtls_bindings_total{result}`
  - `authority.jwks_rotations_total`
  - `authority.errors_total{type}`
- Audit log (immutable sink): token issuance (`sub`, `aud`, `scopes`, `tid`, `inst`, `cnf` thumbprint, `jti`), revocations, admin changes.
- Tracing: token flows, DB reads, JWKS cache.
14) Configuration (YAML)
```yaml
authority:
  issuer: "https://authority.internal"
  signing:
    enabled: true
    activeKeyId: "authority-signing-2025"
    keyPath: "../certificates/authority-signing-2025.pem"
    algorithm: "ES256"
    keySource: "file"
  security:
    rateLimiting:
      token:
        enabled: true
        permitLimit: 30
        window: "00:01:00"
        queueLimit: 0
      authorize:
        enabled: true
        permitLimit: 60
        window: "00:01:00"
        queueLimit: 10
      internal:
        enabled: false
        permitLimit: 5
        window: "00:01:00"
        queueLimit: 0
    senderConstraints:
      dpop:
        enabled: true
        allowedAlgorithms: [ "ES256", "ES384" ]
        proofLifetime: "00:02:00"
        allowedClockSkew: "00:00:30"
        replayWindow: "00:05:00"
        nonce:
          enabled: true
          ttl: "00:10:00"
          maxIssuancePerMinute: 120
          store: "redis"
          redisConnectionString: "redis://authority-redis:6379?ssl=false"
          requiredAudiences:
            - "signer"
            - "attestor"
      mtls:
        enabled: true
        requireChainValidation: true
        rotationGrace: "00:15:00"
        enforceForAudiences:
          - "signer"
        allowedSanTypes:
          - "dns"
          - "uri"
        allowedCertificateAuthorities:
          - "/etc/ssl/mtls/clients-ca.pem"
  clients:
    - clientId: scanner-web
      grantTypes: [ "client_credentials" ]
      audiences: [ "scanner" ]
      auth: { type: "private_key_jwt", jwkFile: "/secrets/scanner-web.jwk" }
      senderConstraint: "dpop"
      scopes: [ "scanner.scan", "scanner.export", "scanner.read" ]
    - clientId: signer
      grantTypes: [ "client_credentials" ]
      audiences: [ "signer" ]
      auth: { type: "mtls" }
      senderConstraint: "mtls"
      scopes: [ "signer.sign" ]
    - clientId: notify-web-dev
      grantTypes: [ "client_credentials" ]
      audiences: [ "notify.dev" ]
      auth: { type: "client_secret", secretFile: "/secrets/notify-web-dev.secret" }
      senderConstraint: "dpop"
      scopes: [ "notify.read", "notify.admin" ]
    - clientId: notify-web
      grantTypes: [ "client_credentials" ]
      audiences: [ "notify" ]
      auth: { type: "client_secret", secretFile: "/secrets/notify-web.secret" }
      senderConstraint: "dpop"
      scopes: [ "notify.read", "notify.admin" ]
```
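Several DPoP durations in this file interact. A proof's `jti` must stay in the replay cache for as long as the proof could still be accepted. If acceptance spans `[iat - skew, iat + proofLifetime + skew]`, then `replayWindow` must be at least `proofLifetime + 2 × allowedClockSkew`. That two-sided reading of `allowedClockSkew` is an assumption. The values above satisfy the bound (2 min + 2 × 30 s = 3 min ≤ 5 min).

The hypothetical check below parses only the plain `hh:mm:ss` form used in this file and tests that relation.

```python
from datetime import timedelta


def parse_timespan(value: str) -> timedelta:
    """Parse the plain 'hh:mm:ss' durations used in the Authority YAML."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def replay_window_covers_proofs(dpop: dict) -> bool:
    """True if jti entries outlive every moment a proof could be accepted.

    Assumes acceptance spans [iat - skew, iat + proofLifetime + skew], so a
    replay-cache entry must live for proofLifetime + 2 * skew.
    """
    lifetime = parse_timespan(dpop["proofLifetime"])
    skew = parse_timespan(dpop["allowedClockSkew"])
    window = parse_timespan(dpop["replayWindow"])
    return window >= lifetime + 2 * skew
```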
15) Testing matrix
- JWT validation: wrong `aud`, expired `exp`, skewed `nbf`, stale `kid`.
- DPoP: invalid `htu`/`htm`, replayed `jti`, stale `iat`, wrong `jkt`, nonce dance.
- mTLS: wrong client cert, wrong CA, thumbprint mismatch.
- RBAC: scope enforcement per audience; over‑privileged client denied.
- Rotation: JWKS rotation while load‑testing; zero‑downtime verification.
- HA: kill one Authority instance; verify issuance continues; JWKS served by peers.
- Performance: 1k token issuance/sec on 2 cores with Redis enabled for jti caching.
16) Threat model & mitigations (summary)
| Threat | Vector | Mitigation |
|---|---|---|
| Token theft | Copy of JWT | Short TTL, sender‑constraint (DPoP/mTLS); replay blocked by jti cache and nonces |
| Replay across hosts | Reuse DPoP proof | Enforce htu/htm, iat freshness, jti uniqueness; services may require nonce |
| Impersonation | Fake client | mTLS or private_key_jwt with pinned JWK; client registration & rotation |
| Key compromise | Signing key leak | HSM/KMS storage, key rotation, audit; emergency key revoke path; narrow token TTL |
| Cross‑tenant abuse | Scope elevation | Enforce aud, tid, inst at issuance and resource servers |
| Downgrade to bearer | Strip DPoP | Resource servers require DPoP/mTLS based on aud; reject bearer without cnf |
17) Deployment & HA
- Stateless microservice, containerized; run ≥ 2 replicas behind LB.
- DB: HA Postgres (or MySQL) for clients/roles; Redis for device codes, DPoP nonces/jtis.
- Secrets: mount client JWKs via K8s Secrets/HashiCorp Vault; signing keys via KMS.
- Backups: DB daily; Redis not critical (ephemeral).
- Disaster recovery: export/import of client registry; JWKS rehydrate from KMS.
- Compliance: TLS audit; penetration testing for OIDC flows.
18) Implementation notes
- Reference stack: .NET 10 + OpenIddict 6 (or IdentityServer if licensed) with custom DPoP validator and mTLS binding middleware.
- Keep the DPoP/JTI cache pluggable; allow Redis/Memcached.
- Provide client SDKs for C# and Go: DPoP key mgmt, proof generation, nonce handling, token refresh helper.
19) Quick reference — wire examples
Access token (payload excerpt)
```json
{
  "iss": "https://authority.internal",
  "sub": "scanner-web",
  "aud": "signer",
  "exp": 1760668800,
  "iat": 1760668620,
  "nbf": 1760668590,
  "jti": "9d9c3f01-6e1a-49f1-8f77-9b7e6f7e3c50",
  "scope": "signer.sign",
  "tid": "tenant-01",
  "inst": "install-7A2B",
  "cnf": { "jkt": "KcVb2V...base64url..." }
}
```
DPoP proof payload claims (for `POST /sign/dsse`)
```json
{
  "htu": "https://signer.internal/sign/dsse",
  "htm": "POST",
  "iat": 1760668620,
  "jti": "4b1c9b3c-8a95-4c58-8a92-9c6cfb4a6a0b"
}
```
Signer validates that the thumbprint of the JWK in the proof header matches `cnf.jkt` in the token.
20) Rollout plan
- MVP: Client Credentials (private_key_jwt + DPoP), JWKS, short OpToks, per‑audience scopes.
- Add: mTLS‑bound tokens for Signer/Attestor; device code for CLI; optional introspection.
- Hardening: DPoP nonce support; full audit pipeline; HA tuning.
- UX: Tenant/installation admin UI; role→scope editors; client bootstrap wizards.