2025-10-30 00:09:39 +02:00


component_architecture_authority.md — StellaOps Authority (2025Q4)

Consolidates identity and tenancy requirements documented across the AOC, Policy, and Platform guides, along with the dedicated Authority implementation plan.

Scope. Implementation-ready architecture for StellaOps Authority: the on-prem OIDC/OAuth2 service that issues short-lived, sender-constrained operational tokens (OpToks) to first-party services and tools. Covers protocols (DPoP & mTLS binding), token shapes, endpoints, storage, rotation, HA, RBAC, audit, and testing. This component is the trust anchor for who is calling inside a StellaOps installation. (Entitlement is proven separately by PoE from the cloud Licensing Service; Authority does not issue PoE.)


0) Mission & boundaries

Mission. Provide fast, local, verifiable authentication for StellaOps microservices and tools by minting very short-lived OAuth2/OIDC tokens that are sender-constrained (DPoP or mTLS-bound). Support RBAC scopes, multi-tenant claims, and deterministic validation for APIs (Scanner, Signer, Attestor, Excititor, Concelier, UI, CLI, Zastava).

Boundaries.

  • Authority does not validate entitlements/licensing. That's enforced by Signer using PoE with the cloud Licensing Service.
  • Authority tokens are operational only (2–5 min TTL) and must not be embedded in long-lived artifacts or stored in SBOMs.
  • Validation is stateless (resource servers verify the JWT locally); introspection is optional, for services that prefer online checks.

1) Protocols & cryptography

  • OIDC Discovery: /.well-known/openid-configuration

  • OAuth2 grant types:

    • Client Credentials (service↔service, with mTLS or private_key_jwt)
    • Device Code (CLI login on headless agents; optional)
    • Authorization Code + PKCE (browser login for UI; optional)
  • Sender constraint options (choose per caller or per audience):

    • DPoP (Demonstrating Proof of Possession): proof JWT on each HTTP request, bound to the access token via cnf.jkt.
    • OAuth 2.0 mTLS (certificate-bound tokens): token bound to client certificate thumbprint via cnf.x5t#S256.
  • Signing algorithms: EdDSA (Ed25519) preferred; fallback ES256 (P-256). Rotation is supported via kid in JWKS.

  • Token format: JWT access tokens (compact), optionally opaque reference tokens for services that insist on introspection.

  • Clock skew tolerance: ±60s; issue nbf, iat, exp accordingly.


2) Token model

2.1 Access token (OpTok): short-lived (120–300 s)

Registered claims

iss   = https://authority.<domain>
sub   = <client_id or user_id>
aud   = <service audience: signer|scanner|attestor|concelier|excititor|ui|zastava>
exp   = <unix ts>  (<= 300 s from iat)
iat   = <unix ts>
nbf   = iat - 30
jti   = <uuid>
scope = "scanner.scan scanner.export signer.sign ..."
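The time-related claims above can be derived from a single issuance timestamp. The following is a minimal sketch of that arithmetic, not Authority's actual issuance code: the function name `build_registered_claims` and its parameters are hypothetical, and signing is out of scope here.

```python
import time
import uuid

MAX_TTL_SECONDS = 300      # exp must be <= 300 s after iat (from the claim list above)
NBF_BACKDATE_SECONDS = 30  # nbf = iat - 30 (from the claim list above)


def build_registered_claims(issuer, subject, audience, scopes, ttl_seconds=120, now=None):
    """Assemble the registered claims for one OpTok (hypothetical helper; no signing)."""
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("ttl_seconds must be in (0, 300]")
    iat = int(time.time() if now is None else now)
    return {
        "iss": issuer,
        "sub": subject,
        "aud": audience,
        "iat": iat,
        "nbf": iat - NBF_BACKDATE_SECONDS,  # back-dated to absorb clock skew
        "exp": iat + ttl_seconds,
        "jti": str(uuid.uuid4()),
        "scope": " ".join(scopes),          # space-delimited, as in the claim list
    }
```

Passing `now` explicitly keeps the helper deterministic in tests; in production the default wall clock would be used.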

Sender constraint (cnf)

  • DPoP:

    "cnf": { "jkt": "<base64url(SHA-256(JWK))>" }
    
  • mTLS:

    "cnf": { "x5t#S256": "<base64url(SHA-256(client_cert_der))>" }
    

Install/tenant context (custom claims)

tid          = <tenant id>               // multi-tenant
inst         = <installation id>        // unique installation
roles        = [ "svc.scanner", "svc.signer", "ui.admin", ... ]
plan?        = <plan name>              // optional hint for UIs; not used for enforcement

Note: Do not copy PoE claims into OpTok; OpTok ≠ entitlement. Only Signer checks PoE.

2.2 Refresh tokens (optional)

  • Default disabled. If enabled (for UI interactive logins), pair with DPoP-bound refresh tokens or mTLS client sessions; short TTL (≤ 8 h), rotating on use (replay-safe).

2.3 ID tokens (optional)

  • Issued for UI/browser OIDC flows (Authorization Code + PKCE); not used for service auth.

3) Endpoints & flows

3.1 OIDC discovery & keys

  • GET /.well-known/openid-configuration → endpoints, algs, jwks_uri
  • GET /jwks → JSON Web Key Set (rotating, at least 2 active keys during transition)

3.2 Token issuance

  • POST /oauth/token

    • Client Credentials (service→service):

      • mTLS: mutual TLS + client_id → bound token (cnf.x5t#S256)
        • security.senderConstraints.mtls.enforceForAudiences forces the mTLS path when requested aud/resource values intersect high-value audiences (defaults include signer). Authority rejects clients attempting to use DPoP/basic secrets for these audiences.
        • Stored certificateBindings are authoritative: thumbprint, subject, issuer, serial number, and SAN values are matched against the presented certificate, with rotation grace applied to activation windows. Failures surface deterministic error codes (e.g. certificate_binding_subject_mismatch).
      • private_key_jwt: JWT-based client auth + DPoP header (preferred for tools and CLI)
    • Device Code (CLI): POST /oauth/device/code + POST /oauth/token poll

    • Authorization Code + PKCE (UI): standard
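The enforceForAudiences rule described above amounts to a set intersection between the requested audiences and the configured high-value audiences. The sketch below shows one way that decision could look; the function name and the string values "mtls"/"dpop" are illustrative assumptions, and the default tuple mirrors the documented default of signer.

```python
def required_sender_constraint(requested_audiences, client_constraint, enforce_mtls_for=("signer",)):
    """Return the sender constraint to apply, or raise if mTLS is mandatory but not used.

    Hypothetical helper: `client_constraint` is the mechanism the client is
    attempting ("mtls" or "dpop"); `enforce_mtls_for` stands in for
    security.senderConstraints.mtls.enforceForAudiences.
    """
    high_value = set(requested_audiences) & set(enforce_mtls_for)
    if high_value and client_constraint != "mtls":
        raise PermissionError("mTLS required for audiences: " + ", ".join(sorted(high_value)))
    return client_constraint
```

A request for `["scanner"]` with DPoP passes through unchanged, while a request that includes `signer` is rejected unless the client authenticated over mTLS.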

DPoP handshake (example)

  1. Client prepares JWK (ephemeral keypair).

  2. Client sends DPoP proof header with fields:

    htm=POST
    htu=https://authority.../oauth/token
    iat=<now>
    jti=<uuid>
    

    signed with the DPoP private key; header carries JWK.

  3. Authority validates proof; issues access token with cnf.jkt=<thumbprint(JWK)>.

  4. Client uses the same DPoP key to sign every subsequent API request to services (Signer, Scanner, …).
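Step 3 relies on the JWK thumbprint defined in RFC 7638: only the required members of the key are kept, serialized in lexicographic order without whitespace, hashed with SHA-256, and base64url-encoded without padding. A minimal sketch follows; the helper name `jwk_thumbprint` is an assumption, and the OKP member list follows RFC 8037.

```python
import base64
import hashlib
import json

# Required members per key type (RFC 7638 section 3.2; OKP per RFC 8037).
_REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "OKP": ("crv", "kty", "x"),
    "RSA": ("e", "kty", "n"),
}


def jwk_thumbprint(jwk):
    """Return base64url(SHA-256(canonical JWK)) without padding, i.e. the cnf.jkt value."""
    members = {name: jwk[name] for name in _REQUIRED_MEMBERS[jwk["kty"]]}
    # sort_keys + compact separators give the canonical form the RFC requires.
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

Optional members such as `kid` or `use` are ignored, so the same key always yields the same thumbprint regardless of how the client serializes it.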

mTLS flow

  • Mutual TLS at the connection; Authority extracts client cert, validates chain; token carries cnf.x5t#S256.
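The certificate binding is a SHA-256 hash over the DER encoding of the client certificate, base64url-encoded without padding. A short sketch of computing and comparing it is shown below; both function names are hypothetical, and obtaining the DER bytes from the TLS layer is assumed to happen elsewhere.

```python
import base64
import hashlib
import hmac


def cert_thumbprint_s256(cert_der):
    """Return the x5t#S256 value for a DER-encoded certificate."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def cert_matches_token(cert_der, token_claims):
    """True when the presented certificate matches the token's cnf.x5t#S256 claim."""
    expected = token_claims.get("cnf", {}).get("x5t#S256", "")
    # Constant-time comparison; a token without the claim never matches.
    return hmac.compare_digest(cert_thumbprint_s256(cert_der), expected)
```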

3.3 Introspection & revocation (optional)

  • POST /oauth/introspect → { active, sub, scope, aud, exp, cnf, ... }
  • POST /oauth/revoke → revokes refresh tokens or opaque access tokens.
  • Replay prevention: maintain DPoP jti cache (TTL ≤ 10 min) to reject duplicate proofs when services supply DPoP nonces (Signer requires nonce for high-value operations).
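The jti cache behaves like a set with per-entry expiry: the first sighting of a jti is accepted and remembered, and any repeat within the TTL is rejected. The class below is an in-memory stand-in to illustrate the semantics only; the class name is hypothetical, and a real deployment would need a store shared across instances (the document names Redis).

```python
import time


class JtiReplayCache:
    """In-memory sketch of a jti replay cache; not shared across processes."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self._ttl = ttl_seconds   # 600 s matches the "TTL <= 10 min" bound above
        self._clock = clock
        self._seen = {}           # jti -> expiry timestamp

    def check_and_store(self, jti):
        """Return True if the jti is fresh (and record it), False if it is a replay."""
        now = self._clock()
        # Evict expired entries so the map does not grow without bound.
        self._seen = {j: exp for j, exp in self._seen.items() if exp > now}
        if jti in self._seen:
            return False
        self._seen[jti] = now + self._ttl
        return True
```

With Redis, the same check-and-store step is usually a single atomic "set if absent with expiry" operation, which avoids a race between two instances seeing the same proof.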

3.4 UserInfo (optional for UI)

  • GET /userinfo (ID token context).

4) Audiences, scopes & RBAC

4.1 Audiences

  • signer — only the Signer service should accept tokens with aud=signer.
  • attestor, scanner, concelier, excititor, ui, zastava similarly.

Services must verify aud and sender constraint (DPoP/mTLS) per their policy.

4.2 Core scopes

| Scope | Service | Operation |
| --- | --- | --- |
| signer.sign | Signer | Request DSSE signing |
| attestor.write | Attestor | Submit Rekor entries |
| scanner.scan | Scanner.WebService | Submit scan jobs |
| scanner.export | Scanner.WebService | Export SBOMs |
| scanner.read | Scanner.WebService | Read catalog/SBOMs |
| vex.read / vex.admin | Excititor | Query/operate |
| concelier.read / concelier.export | Concelier | Query/exports |
| ui.read / ui.admin | UI | View/admin |
| zastava.emit / zastava.enforce | Scanner/Zastava | Runtime events / admission |

Roles → scopes mapping is configured centrally (Authority policy) and pushed during token issuance.
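As an illustration of the central role→scope mapping, the sketch below expands a client's roles into the scopes it may hold and rejects any requested scope outside that set. The `ROLE_SCOPES` table is a made-up example built from role and scope names that appear in this document; the real mapping lives in Authority policy.

```python
# Hypothetical role -> scope table; the actual mapping is Authority policy.
ROLE_SCOPES = {
    "svc.scanner": {"scanner.scan", "scanner.export", "scanner.read"},
    "svc.signer": {"signer.sign"},
    "ui.admin": {"ui.read", "ui.admin"},
}


def granted_scopes(roles, requested_scopes):
    """Return the space-delimited scope claim, or raise if a scope is not covered by any role."""
    allowed = set().union(*(ROLE_SCOPES.get(role, set()) for role in roles))
    denied = set(requested_scopes) - allowed
    if denied:
        raise PermissionError("invalid_scope: " + " ".join(sorted(denied)))
    return " ".join(sorted(set(requested_scopes)))
```

Rejecting the whole request, rather than silently dropping the extra scopes, is one possible policy; it makes an over-privileged client visible at issuance time.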


5) Storage & state

  • Configuration DB (PostgreSQL/MySQL): clients, audiences, role→scope maps, tenant/installation registry, device code grants, persistent consents (if any).

  • Cache (Redis):

    • DPoP jti replay cache (short TTL)
    • Nonce store (per resource server, if they demand nonce)
    • Device code pollers, rate limiting buckets
  • JWKS: key material in HSM/KMS or encrypted at rest; JWKS served from memory.


6) Key management & rotation

  • Maintain at least 2 signing keys active during rotation; tokens carry kid.
  • Prefer Ed25519 for compact tokens; maintain ES256 fallback for FIPS contexts.
  • Rotation cadence: 30–90 days; emergency rotation supported.
  • Publish new JWKS before issuing tokens with the new kid to avoid cold-start validation misses.
  • Keep old keys available at least for max token TTL + 5 minutes.
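The retention rule in the last bullet is simple arithmetic: an old key may leave the JWKS only once every token signed with it has expired, plus a safety margin. A small worked sketch, assuming the 300 s maximum OpTok TTL from section 2.1 (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

MAX_TOKEN_TTL = timedelta(seconds=300)   # upper bound on OpTok lifetime (section 2.1)
RETIREMENT_MARGIN = timedelta(minutes=5)  # the "+ 5 minutes" from the rule above


def earliest_jwks_removal(last_issued_with_old_kid):
    """Earliest time the old key can be dropped from the JWKS."""
    return last_issued_with_old_kid + MAX_TOKEN_TTL + RETIREMENT_MARGIN


switch_over = datetime(2025, 10, 30, 0, 0, 0, tzinfo=timezone.utc)
print(earliest_jwks_removal(switch_over).isoformat())  # 2025-10-30T00:10:00+00:00
```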

7) HA & performance

  • Stateless issuance (except device codes/refresh) → scale horizontally behind a load balancer.

  • DB only for client metadata and optional flows; token checks are JWT-local; introspection endpoints hit cache/DB minimally.

  • Targets:

    • Token issuance P95 ≤ 20ms under warm cache.
    • DPoP proof validation ≤ 1ms extra per request at resource servers (Signer/Scanner).
    • 99.9% uptime; HPA on CPU/latency.

8) Security posture

  • Strict TLS (1.3 preferred); HSTS; modern cipher suites.
  • mTLS enabled where required (Signer/Attestor paths).
  • Replay protection: DPoP jti cache, nonce support for Signer (add DPoP-Nonce header on 401; clients re-sign).
  • Rate limits per client & per IP; exponential backoff on failures.
  • Secrets: clients use private_key_jwt or mTLS; never basic secrets over the wire.
  • CSP/CSRF hardening on UI flows; SameSite=Lax cookies; PKCE enforced.
  • Logs redact Authorization and DPoP proofs; store sub, aud, scopes, inst, tid, cnf thumbprints, not full keys.

9) Multi-tenancy & installations

  • Tenant (tid) and Installation (inst) registries define which audiences/scopes a client can request.
  • Cross-tenant isolation enforced at issuance (disallow rogue aud), and resource servers must check that tid matches their configured tenant.

10) Admin & operations APIs

All under /admin (mTLS + authority.admin scope).

POST /admin/clients                 # create/update client (confidential/public)
POST /admin/audiences               # register audience resource URIs
POST /admin/roles                   # define role→scope mappings
POST /admin/tenants                 # create tenant/install entries
POST /admin/keys/rotate             # rotate signing key (zero-downtime)
GET  /admin/metrics                 # Prometheus exposition (token issue rates, errors)
GET  /admin/healthz|readyz          # health/readiness

Declared client audiences flow through to the issued JWT aud claim and the token request's resource indicators. Authority relies on this metadata to enforce DPoP nonce challenges for signer, attestor, and other high-value services without requiring clients to repeat the audience parameter on every request.


11) Integration hard lines (what resource servers must enforce)

Every StellaOps service that consumes Authority tokens must:

  1. Verify JWT signature (kid in JWKS), iss, aud, exp, nbf.

  2. Enforce sender constraint:

    • DPoP: validate DPoP proof (htu, htm, iat, jti) and match cnf.jkt; cache jti for replay defense; honor nonce challenges.
    • mTLS: match presented client cert thumbprint to token cnf.x5t#S256.
  3. Check scopes; optionally map to internal roles.

  4. Check tenant (tid) and installation (inst) as appropriate.

  5. For Signer only: require both OpTok and PoE in the request (enforced by Signer, not Authority).
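Steps 1, 3 and 4 reduce to a handful of claim comparisons once the signature has been verified. The sketch below covers only those comparisons and assumes signature verification against the JWKS and the sender-constraint check happen separately; the function name, the returned reason strings, and the single-string `aud` are illustrative assumptions.

```python
import time


def check_optok_claims(claims, *, issuer, audience, required_scope, tenant, now=None, skew=60):
    """Return None if the claims pass, otherwise a short reason string.

    Assumes the JWT signature was already verified. `skew` uses the
    document's +/-60 s clock tolerance.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        return "wrong issuer"
    if claims.get("aud") != audience:
        return "wrong audience"
    if now > claims.get("exp", 0) + skew:
        return "expired"
    if now < claims.get("nbf", 0) - skew:
        return "not yet valid"
    if required_scope not in claims.get("scope", "").split():
        return "missing scope"
    if claims.get("tid") != tenant:
        return "wrong tenant"
    if "cnf" not in claims:
        return "bearer token without cnf"  # plain bearer tokens are refused
    return None
```

Returning a reason string rather than raising keeps the sketch easy to map onto the error_description field of a WWW-Authenticate response.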


12) Error surfaces & UX

  • Token endpoint errors follow OAuth2 (invalid_client, invalid_grant, invalid_scope, unauthorized_client).
  • Resource servers use RFC 6750 style (WWW-Authenticate: DPoP error="invalid_token", error_description="…", dpop_nonce="…" ).
  • For DPoP nonce challenges, clients retry with the server-supplied nonce once.
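The retry-once rule can be expressed independently of any HTTP library. In the sketch below, `send` is a hypothetical callable that builds a fresh DPoP proof (including the nonce, when given) and performs the request; reading the nonce from a `DPoP-Nonce` response header follows section 8 and is otherwise an assumption about the wire format.

```python
def call_with_nonce_retry(send):
    """Call send(nonce) once, and at most once more if the server issues a nonce challenge.

    `send(nonce)` must return (status, headers, body), where headers is a dict.
    """
    status, headers, body = send(None)
    nonce = headers.get("DPoP-Nonce")
    if status == 401 and nonce:
        # Exactly one retry: a second 401 is returned to the caller as-is.
        status, headers, body = send(nonce)
    return status, headers, body
```

A real client would also need case-insensitive header lookup, which most HTTP libraries provide through their own header types.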

13) Observability & audit

  • Metrics:

    • authority.tokens_issued_total{grant,aud}
    • authority.dpop_validations_total{result}
    • authority.mtls_bindings_total{result}
    • authority.jwks_rotations_total
    • authority.errors_total{type}
  • Audit log (immutable sink): token issuance (sub, aud, scopes, tid, inst, cnf thumbprint, jti), revocations, admin changes.

  • Tracing: token flows, DB reads, JWKS cache.


14) Configuration (YAML)

authority:
  issuer: "https://authority.internal"
  signing:
    enabled: true
    activeKeyId: "authority-signing-2025"
    keyPath: "../certificates/authority-signing-2025.pem"
    algorithm: "ES256"
    keySource: "file"
  security:
    rateLimiting:
      token:
        enabled: true
        permitLimit: 30
        window: "00:01:00"
        queueLimit: 0
      authorize:
        enabled: true
        permitLimit: 60
        window: "00:01:00"
        queueLimit: 10
      internal:
        enabled: false
        permitLimit: 5
        window: "00:01:00"
        queueLimit: 0
    senderConstraints:
      dpop:
        enabled: true
        allowedAlgorithms: [ "ES256", "ES384" ]
        proofLifetime: "00:02:00"
        allowedClockSkew: "00:00:30"
        replayWindow: "00:05:00"
        nonce:
          enabled: true
          ttl: "00:10:00"
          maxIssuancePerMinute: 120
          store: "redis"
          redisConnectionString: "redis://authority-redis:6379?ssl=false"
          requiredAudiences:
            - "signer"
            - "attestor"
      mtls:
        enabled: true
        requireChainValidation: true
        rotationGrace: "00:15:00"
        enforceForAudiences:
          - "signer"
        allowedSanTypes:
          - "dns"
          - "uri"
        allowedCertificateAuthorities:
          - "/etc/ssl/mtls/clients-ca.pem"
  clients:
    - clientId: scanner-web
      grantTypes: [ "client_credentials" ]
      audiences: [ "scanner" ]
      auth: { type: "private_key_jwt", jwkFile: "/secrets/scanner-web.jwk" }
      senderConstraint: "dpop"
      scopes: [ "scanner.scan", "scanner.export", "scanner.read" ]
    - clientId: signer
      grantTypes: [ "client_credentials" ]
      audiences: [ "signer" ]
      auth: { type: "mtls" }
      senderConstraint: "mtls"
      scopes: [ "signer.sign" ]
    - clientId: notify-web-dev
      grantTypes: [ "client_credentials" ]
      audiences: [ "notify.dev" ]
      auth: { type: "client_secret", secretFile: "/secrets/notify-web-dev.secret" }
      senderConstraint: "dpop"
      scopes: [ "notify.read", "notify.admin" ]
    - clientId: notify-web
      grantTypes: [ "client_credentials" ]
      audiences: [ "notify" ]
      auth: { type: "client_secret", secretFile: "/secrets/notify-web.secret" }
      senderConstraint: "dpop"
      scopes: [ "notify.read", "notify.admin" ]
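The DPoP durations in this configuration are related: a proof stays acceptable for roughly proofLifetime plus allowedClockSkew, so the jti replay window should be at least that long, otherwise a proof could be replayed after its jti has been evicted. That relationship is an inference from the settings rather than a documented rule. The sketch below parses the hh:mm:ss strings and checks it; both helper names are hypothetical.

```python
from datetime import timedelta


def parse_hms(value):
    """Parse an "hh:mm:ss" duration string as used in the YAML above."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def replay_window_covers_proofs(dpop):
    """True when replayWindow >= proofLifetime + allowedClockSkew."""
    accepted_for = parse_hms(dpop["proofLifetime"]) + parse_hms(dpop["allowedClockSkew"])
    return parse_hms(dpop["replayWindow"]) >= accepted_for


dpop_settings = {"proofLifetime": "00:02:00", "allowedClockSkew": "00:00:30", "replayWindow": "00:05:00"}
print(replay_window_covers_proofs(dpop_settings))  # True: 5 min >= 2 min 30 s
```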

15) Testing matrix

  • JWT validation: wrong aud, expired exp, skewed nbf, stale kid.
  • DPoP: invalid htu/htm, replayed jti, stale iat, wrong jkt, nonce dance.
  • mTLS: wrong client cert, wrong CA, thumbprint mismatch.
  • RBAC: scope enforcement per audience; over-privileged client denied.
  • Rotation: JWKS rotation while load-testing; zero-downtime verification.
  • HA: kill one Authority instance; verify issuance continues; JWKS served by peers.
  • Performance: 1k token issuance/sec on 2 cores with Redis enabled for jti caching.

16) Threat model & mitigations (summary)

| Threat | Vector | Mitigation |
| --- | --- | --- |
| Token theft | Copy of JWT | Short TTL, sender constraint (DPoP/mTLS); replay blocked by jti cache and nonces |
| Replay across hosts | Reuse DPoP proof | Enforce htu/htm, iat freshness, jti uniqueness; services may require nonce |
| Impersonation | Fake client | mTLS or private_key_jwt with pinned JWK; client registration & rotation |
| Key compromise | Signing key leak | HSM/KMS storage, key rotation, audit; emergency key revoke path; narrow token TTL |
| Cross-tenant abuse | Scope elevation | Enforce aud, tid, inst at issuance and resource servers |
| Downgrade to bearer | Strip DPoP | Resource servers require DPoP/mTLS based on aud; reject bearer without cnf |

17) Deployment & HA

  • Stateless microservice, containerized; run ≥ 2 replicas behind LB.
  • DB: HA Postgres (or MySQL) for clients/roles; Redis for device codes, DPoP nonces/jtis.
  • Secrets: mount client JWKs via K8s Secrets/HashiCorp Vault; signing keys via KMS.
  • Backups: DB daily; Redis not critical (ephemeral).
  • Disaster recovery: export/import of client registry; JWKS rehydrate from KMS.
  • Compliance: TLS audit; penetration testing for OIDC flows.

18) Implementation notes

  • Reference stack: .NET 10 + OpenIddict 6 (or IdentityServer if licensed) with custom DPoP validator and mTLS binding middleware.
  • Keep the DPoP/JTI cache pluggable; allow Redis/Memcached.
  • Provide client SDKs for C# and Go: DPoP key mgmt, proof generation, nonce handling, token refresh helper.

19) Quick reference — wire examples

Access token (payload excerpt)

{
  "iss": "https://authority.internal",
  "sub": "scanner-web",
  "aud": "signer",
  "exp": 1760668800,
  "iat": 1760668620,
  "nbf": 1760668590,
  "jti": "9d9c3f01-6e1a-49f1-8f77-9b7e6f7e3c50",
  "scope": "signer.sign",
  "tid": "tenant-01",
  "inst": "install-7A2B",
  "cnf": { "jkt": "KcVb2V...base64url..." }
}

DPoP proof payload claims (for POST /sign/dsse)

{
  "htu": "https://signer.internal/sign/dsse",
  "htm": "POST",
  "iat": 1760668620,
  "jti": "4b1c9b3c-8a95-4c58-8a92-9c6cfb4a6a0b"
}

Signer validates that hash(JWK) in the proof matches cnf.jkt in the token.
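Beyond the thumbprint match, a resource server also compares the proof's htm, htu, iat and jti against the incoming request. The following sketch shows those comparisons under stated assumptions: the proof signature is already verified, `proof_jwk_thumbprint` was computed from the JWK in the proof's JOSE header, and the lifetime and skew defaults are taken from the example configuration in section 14. The function name is hypothetical, and replay detection on jti is left to a separate cache.

```python
def check_dpop_proof(proof_claims, proof_jwk_thumbprint, token_claims, *,
                     method, url, now, lifetime=120, skew=30):
    """Return None if the proof fits the request and token, otherwise a reason string."""
    if proof_claims.get("htm") != method:
        return "htm mismatch"
    # Simplification: exact string match; RFC 9449 compares htu without query and fragment.
    if proof_claims.get("htu") != url:
        return "htu mismatch"
    iat = proof_claims.get("iat", 0)
    if not (now - lifetime - skew <= iat <= now + skew):
        return "stale or future iat"
    if not proof_claims.get("jti"):
        return "missing jti"
    if token_claims.get("cnf", {}).get("jkt") != proof_jwk_thumbprint:
        return "jkt mismatch"
    return None
```

Checking htm and htu first means a proof captured for one endpoint cannot be reused against another, even within its lifetime.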


20) Rollout plan

  1. MVP: Client Credentials (private_key_jwt + DPoP), JWKS, short OpToks, per-audience scopes.
  2. Add: mTLS-bound tokens for Signer/Attestor; device code for CLI; optional introspection.
  3. Hardening: DPoP nonce support; full audit pipeline; HA tuning.
  4. UX: Tenant/installation admin UI; role→scope editors; client bootstrap wizards.