Files
git.stella-ops.org/docs/11_AUTHORITY.md
master 2eb6852d34
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Add unit tests for SBOM ingestion and transformation
- Implement `SbomIngestServiceCollectionExtensionsTests` to verify the SBOM ingestion pipeline exports snapshots correctly.
- Create `SbomIngestTransformerTests` to ensure the transformation produces expected nodes and edges, including deduplication of license nodes and normalization of timestamps.
- Add `SbomSnapshotExporterTests` to test the export functionality for manifest, adjacency, nodes, and edges.
- Introduce `VexOverlayTransformerTests` to validate the transformation of VEX nodes and edges.
- Set up project file for the test project with necessary dependencies and configurations.
- Include JSON fixture files for testing purposes.
2025-11-04 07:49:39 +02:00

558 lines
51 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# StellaOps Authority Service
> **Status:** Drafted 2025-10-12 (CORE5B.DOC / DOC1.AUTH) aligns with Authority revocation store, JWKS rotation, and bootstrap endpoints delivered in Sprint1.
## 1. Purpose
The **StellaOps Authority** service issues OAuth2/OIDC tokens for every StellaOps module (Concelier, Backend, Agent, Zastava) and exposes the policy controls required in sovereign/offline environments. Authority is built as a minimal ASP.NET host that:
- brokers password, client-credentials, and device-code flows through pluggable identity providers;
- persists access/refresh/device tokens in MongoDB with deterministic schemas for replay analysis and air-gapped audit copies;
- distributes revocation bundles and JWKS material so downstream services can enforce lockouts without direct database access;
- offers bootstrap APIs for first-run provisioning and key rotation without redeploying binaries.
Authority is deployed alongside Concelier in air-gapped environments and never requires outbound internet access. All trusted metadata (OpenIddict discovery, JWKS, revocation bundles) is cacheable, signed, and reproducible.
## 2. Component Architecture
Authority is composed of five cooperating subsystems:
1. **Minimal API host** configures OpenIddict endpoints (`/token`, `/authorize`, `/revoke`, `/jwks`), publishes the OpenAPI contract at `/.well-known/openapi`, and enables structured logging/telemetry. Rate limiting hooks (`AuthorityRateLimiter`) wrap every request.
2. **Plugin host** loads `StellaOps.Authority.Plugin.*.dll` assemblies, applies capability metadata, and exposes password/client provisioning surfaces through dependency injection.
3. **Mongo storage** persists tokens, revocations, bootstrap invites, and plugin state in deterministic collections indexed for offline sync (`authority_tokens`, `authority_revocations`, etc.).
4. **Cryptography layer** `StellaOps.Cryptography` abstractions manage password hashing, signing keys, JWKS export, and detached JWS generation.
5. **Offline ops APIs** internal endpoints under `/internal/*` provide administrative flows (bootstrap users/clients, revocation export) guarded by API keys and deterministic audit events.
A high-level sequence for password logins:
```
Client -> /token (password grant)
-> Rate limiter & audit hooks
-> Plugin credential store (Argon2id verification)
-> Token persistence (Mongo authority_tokens)
-> Response (access/refresh tokens + deterministic claims)
```
## 3. Token Lifecycle & Persistence
Authority persists every issued token in MongoDB so operators can audit or revoke without scanning distributed caches.
- **Collection:** `authority_tokens`
- **Key fields:**
- `tokenId`, `type` (`access_token`, `refresh_token`, `device_code`, `authorization_code`)
- `subjectId`, `clientId`, ordered `scope` array
- `tenant` (lower-cased tenant hint from the issuing client, omitted for global clients)
### Console OIDC client
- **Client ID**: `console-web`
- **Grants**: `authorization_code` (PKCE required), `refresh_token`
- **Audience**: `console`
- **Scopes**: `openid`, `profile`, `email`, `advisory:read`, `advisory-ai:view`, `vex:read`, `aoc:verify`, `findings:read`, `orch:read`, `vuln:view`, `vuln:investigate`, `vuln:operate`, `vuln:audit`
- **Redirect URIs** (defaults): `https://console.stella-ops.local/oidc/callback`
- **Post-logout redirect**: `https://console.stella-ops.local/`
- **Tokens**: Access tokens inherit the global 2 minute lifetime; refresh tokens remain short-lived (30 days) and can be exchanged silently via `/token`.
- **Roles**: Assign Authority role `Orch.Viewer` (exposed to tenants as `role/orch-viewer`) when operators need read-only access to Orchestrator telemetry via Console dashboards. Policy Studio ships dedicated roles (`role/policy-author`, `role/policy-reviewer`, `role/policy-approver`, `role/policy-operator`, `role/policy-auditor`) plus the new attestation verbs (`policy:publish`, `policy:promote`) that align with the `policy:*` scope family; issue them per tenant so audit trails remain scoped and interactive attestations stay attributable.
Configuration sample (`etc/authority.yaml.sample`) seeds the client with a confidential secret so Console can negotiate the code exchange on the backend while browsers execute the PKCE dance.
### Policy Studio scopes & signing workflow
- **Role bundles:** Issue the dedicated Policy Studio roles per tenant (`role/policy-author`, `role/policy-reviewer`, `role/policy-approver`, `role/policy-operator`, `role/policy-auditor`). Each maps to the `policy:*` scopes described in [Policy Lifecycle & Approvals](policy/lifecycle.md#2-roles--authority-scopes).
- **Publish/promote scopes:** `policy:publish` and `policy:promote` are interactive-only. Authority rejects client-credential tokens; operators must log in via `stella auth login` (DPoP) and stay within the five-minute fresh-auth window.
- **Required metadata:** Publishing attaches `policy_reason`, `policy_ticket`, and `policy_digest` headers. The CLI surface (`stella policy publish --reason --ticket --sign`) maps flags to these fields automatically. Missing metadata returns `422 policy_attestation_metadata_missing`.
- **Attestations:** `stella policy publish --sign` produces a DSSE envelope stored in Policy Engine (`policy_attestations`) and on disk for Offline Kit evidence. Promotions (`stella policy promote --environment prod`) emit `policy.promoted` audit events referencing the attestation digest.
- **Compliance checklist:** Before activation, verify each item in [§10 Compliance Checklist](policy/lifecycle.md#10--compliance-checklist) — role mapping, simulation evidence, approval note, attestation signature, promotion note, activation health, offline parity.
### Advisory AI scopes & remote inference
- `advisory-ai:view` — read Advisory AI artefacts (summaries, remediation packs, cached outputs).
- `advisory-ai:operate` — submit Advisory AI inference jobs and remediation requests.
- `advisory-ai:admin` — administer Advisory AI configuration, profile selection, and remote execution controls.
Authority publishes the trio in OpenID discovery (`stellaops_advisory_ai_scopes_supported`) so clients can self-discover capability. Remote/cloud inference is disabled by default; set `advisoryAi.remoteInference.enabled: true` and provide an explicit `allowedProfiles` whitelist (for example `cloud-openai`) when an installation opts in. The `requireTenantConsent` toggle (default `true`) enforces per-tenant opt-in before remote profiles are invoked, mirroring regulatory expectations for sovereign or air-gapped deployments.
### Console Authority endpoints
- `/console/tenants` — Requires `authority:tenants.read`; returns the tenant catalogue for the authenticated principal. Requests lacking the `X-Stella-Tenant` header are rejected (`tenant_header_missing`) and logged.
- `/console/profile` — Requires `ui.read`; exposes subject metadata (roles, scopes, audiences) and indicates whether the session is within the five-minute fresh-auth window.
- `/console/token/introspect` — Requires `ui.read`; introspects the active access token so the SPA can prompt for re-authentication before privileged actions.
All endpoints demand DPoP-bound tokens and propagate structured audit events (`authority.console.*`). Gateways must forward the `X-Stella-Tenant` header derived from the access token; downstream services rely on the same value for isolation. Keep Console access tokens short-lived (default 15minutes) and enforce the fresh-auth window for admin actions (`ui.admin`, `authority:*`, `policy:activate`, `exceptions:approve`).
- `status` (`valid`, `revoked`, `expired`), `createdAt`, optional `expiresAt`
- `revokedAt`, machine-readable `revokedReason`, optional `revokedReasonDescription`
- `revokedMetadata` (string dictionary for plugin-specific context)
- **Persistence flow:** `PersistTokensHandler` stamps missing JWT IDs, normalises scopes, and stores every principal emitted by OpenIddict.
- **Revocation flow:** `AuthorityTokenStore.UpdateStatusAsync` flips status, records the reason metadata, and is invoked by token revocation handlers and plugin provisioning events (e.g., disabling a user).
- **Expiry maintenance:** `AuthorityTokenStore.DeleteExpiredAsync` prunes non-revoked tokens past their `expiresAt` timestamp. Operators should schedule this in maintenance windows if large volumes of tokens are issued.
### Expectations for resource servers
Resource servers (Concelier WebService, Backend, Agent) **must not** assume in-memory caches are authoritative. They should:
- cache `/jwks` and `/revocations/export` responses within configured lifetimes;
- honour `revokedReason` metadata when shaping audit trails;
- treat `status != "valid"` or missing tokens as immediate denial conditions.
- propagate the `tenant` claim (`X-Stella-Tenant` header in REST calls) and reject requests when the tenant supplied by Authority does not match the resource server's scope; Concelier and Excititor guard endpoints refuse cross-tenant tokens.
### Tenant propagation
- Client provisioning (bootstrap or plug-in) accepts a `tenant` hint. Authority normalises the value (`trim().ToLowerInvariant()`) and persists it alongside the registration. Clients without an explicit tenant remain global.
- Issued principals include the `stellaops:tenant` claim. `PersistTokensHandler` mirrors this claim into `authority_tokens.tenant`, enabling per-tenant revocation and reporting.
- Rate limiter metadata now tags requests with `authority.tenant`, unlocking per-tenant throughput metrics and diagnostic filters. Audit events (`authority.client_credentials.grant`, `authority.password.grant`, bootstrap flows) surface the tenant and login attempt documents index on `{tenant, occurredAt}` for quick queries.
- Client credentials that request `advisory:ingest`, `advisory:read`, `advisory-ai:view`, `advisory-ai:operate`, `advisory-ai:admin`, `vex:ingest`, `vex:read`, `signals:read`, `signals:write`, `signals:admin`, or `aoc:verify` now fail fast when the client registration lacks a tenant hint. Issued tokens are re-validated against persisted tenant metadata, and Authority rejects any cross-tenant replay (`invalid_client`/`invalid_token`), ensuring aggregation-only workloads remain tenant-scoped.
- Client credentials that request `export.viewer`, `export.operator`, or `export.admin` must provide a tenant hint. Requests for `export.admin` also need accompanying `export_reason` and `export_ticket` parameters; Authority returns `invalid_request` when either value is missing and records the denial in token audit events.
- Client credentials that request `notify.viewer`, `notify.operator`, or `notify.admin` must provide a tenant hint. Authority records scope violations when tenancy is missing and emits `authority.notify.scope_violation` audit metadata so operators can trace denied requests.
- Policy Studio scopes (`policy:author`, `policy:review`, `policy:approve`, `policy:operate`, `policy:publish`, `policy:promote`, `policy:audit`, `policy:simulate`, `policy:run`, `policy:activate`) require a tenant assignment; Authority rejects tokens missing the hint with `invalid_client` and records `scope.invalid` metadata for auditing. The `policy:publish`/`policy:promote` scopes are interactive-only and demand additional metadata (see “Policy attestation metadata” below).
- Policy attestation tokens must include three parameters: `policy_reason` (≤512 chars describing why the attestation is being produced), `policy_ticket` (≤128 chars change/request reference), and `policy_digest` (32128 char hex digest of the policy package). Authority rejects requests missing any value, over the limits, or providing a non-hex digest. Password-grant issuance stamps these values into the resulting token/audit trail and enforces a five-minute fresh-auth window via the `auth_time` claim.
- Task Pack scopes (`packs.read`, `packs.write`, `packs.run`, `packs.approve`) require a tenant assignment; Authority rejects tokens missing the hint with `invalid_client` and logs `authority.pack_scope_violation` metadata for audit correlation.
- **AOC pairing guardrails** Tokens that request `advisory:read`, `advisory-ai:view`, `advisory-ai:operate`, `advisory-ai:admin`, `vex:read`, or any `signals:*` scope must also request `aoc:verify`. Authority rejects mismatches with `invalid_scope` (e.g., `Scope 'aoc:verify' is required when requesting advisory/advisory-ai/vex read scopes.` or `Scope 'aoc:verify' is required when requesting signals scopes.`) so automation surfaces deterministic errors.
- **Signals ingestion guardrails** Sensors and services requesting `signals:write`/`signals:admin` must also request `aoc:verify`; Authority records the `authority.aoc_scope_violation` tag when the pairing is missing so operators can trace failing sensors immediately.
- Password grant flows reuse the client registration's tenant and enforce the configured scope allow-list. Requested scopes outside that list (or mismatched tenants) trigger `invalid_scope`/`invalid_client` failures, ensuring cross-tenant access is denied before token issuance.
### Default service scopes
| Client ID | Purpose | Scopes granted | Sender constraint | Tenant |
|----------------------|---------------------------------------|--------------------------------------|-------------------|-----------------|
| `concelier-ingest` | Concelier raw advisory ingestion | `advisory:ingest`, `advisory:read` | `dpop` | `tenant-default` |
| `excitor-ingest` | Excititor raw VEX ingestion | `vex:ingest`, `vex:read` | `dpop` | `tenant-default` |
| `aoc-verifier` | Aggregation-only contract verification | `aoc:verify`, `advisory:read`, `vex:read` | `dpop` | `tenant-default` |
| `cartographer-service` | Graph snapshot construction | `graph:write`, `graph:read` | `dpop` | `tenant-default` |
| `graph-api` | Graph Explorer gateway/API | `graph:read`, `graph:export`, `graph:simulate` | `dpop` | `tenant-default` |
| `export-center-operator` | Export Center operator automation | `export.viewer`, `export.operator` | `dpop` | `tenant-default` |
| `export-center-admin` | Export Center administrative automation | `export.viewer`, `export.operator`, `export.admin` | `dpop` | `tenant-default` |
| `notify-service` | Notify WebService API | `notify.viewer`, `notify.operator` | `dpop` | `tenant-default` |
| `notify-admin` | Notify administrative automation | `notify.viewer`, `notify.operator`, `notify.admin` | `dpop` | `tenant-default` |
| `vuln-explorer-ui` | Vuln Explorer UI/API | `vuln:view`, `vuln:investigate`, `vuln:operate`, `vuln:audit` | `dpop` | `tenant-default` |
| `signals-uploader` | Reachability sensor ingestion | `signals:write`, `signals:read`, `aoc:verify` | `dpop` | `tenant-default` |
> **Secret hygiene (20251027):** The repository includes a convenience `etc/authority.yaml` for compose/helm smoke tests. Every entrys `secretFile` points to `etc/secrets/*.secret`, which ship with `*-change-me` placeholders—replace them with strong values (and wire them through your vault/secret manager) before issuing tokens in CI, staging, or production.
For factory provisioning, issue sensors the **SignalsUploader** role template (`signals:write`, `signals:read`, `aoc:verify`). Authority rejects ingestion tokens that omit `aoc:verify`, preserving aggregation-only contract guarantees for reachability signals.
These registrations are provided as examples in `etc/authority.yaml.sample`. Clone them per tenant (for example `concelier-tenant-a`, `concelier-tenant-b`) so tokens remain tenant-scoped by construction.
### Policy attestation metadata
- **Interactive only.** `policy:publish` and `policy:promote` are restricted to password/device-code flows (Console, CLI) and are rejected when requested via client credentials or app secrets. Tokens inherit the 5-minute fresh-auth window; resource servers reject stale tokens and emit `authority.policy_attestation_validated=false`.
- **Mandatory parameters.** Requests must include:
- `policy_reason` (≤512 chars) — human-readable justification (e.g., “Promote tenant A baseline to production”).
- `policy_ticket` (≤128 chars) — change request / CAB identifier (e.g., `CR-2025-1102`).
- `policy_digest` — lowercase hex digest (32128 characters) of the policy bundle being published/promoted.
- **Audit surfaces.** On success, the metadata is copied into the access token (`stellaops:policy_reason`, `stellaops:policy_ticket`, `stellaops:policy_digest`, `stellaops:policy_operation`) and recorded in [`authority.password.grant`] audit events as `policy.*` properties.
- **Failure modes.** Missing/blank parameters, over-length values, or non-hex digests trigger `invalid_request` responses and `authority.policy_attestation_denied` audit tags. CLI/Console must bubble these errors to operators and provide retry UX.
- **CLI / Console UX.** The CLI stores attestation metadata in `stella.toml` (`authority.policy.publishReason`, `authority.policy.publishTicket`) or accepts `STELLA_POLICY_REASON` / `STELLA_POLICY_TICKET` / `STELLA_POLICY_DIGEST` environment variables. Console prompts operators for the same trio before issuing attestation tokens and refuses to cache values longer than the session.
- **Automation guidance.** CI workflows should compute the policy digest ahead of time (for example `sha256sum policy-package.tgz | cut -d' ' -f1`) and inject the reason/ticket/digest into CLI environment variables immediately before invoking `stella auth login --scope policy:publish`.
Graph Explorer introduces dedicated scopes: `graph:write` for Cartographer build jobs, `graph:read` for query/read operations, `graph:export` for long-running export downloads, and `graph:simulate` for what-if overlays. Assign only the scopes a client actually needs to preserve least privilege—UI-facing clients should typically request read/export access, while background services (Cartographer, Scheduler) require write privileges.
#### Least-privilege guidance for graph clients
- **Service identities** The Cartographer worker should request `graph:write` and `graph:read` only; grant `graph:simulate` exclusively to pipeline automation that invokes Policy Engine overlays on demand. Keep `graph:export` scoped to API gateway components responsible for streaming GraphML/JSONL artifacts. Authority enforces this by rejecting `graph:write` tokens that lack `properties.serviceIdentity: cartographer`.
- **Tenant propagation** Every client registration must pin a `tenant` hint. Authority normalises the value and stamps it into issued tokens (`stellaops:tenant`) so downstream services (Scheduler, Graph API, Console) can enforce tenant isolation without custom headers. Graph scopes (`graph:read`, `graph:write`, `graph:export`, `graph:simulate`) are denied if the tenant hint is missing.
- **SDK alignment** Use the generated `StellaOpsScopes` constants in service code to request graph scopes. Hard-coded strings risk falling out of sync as additional graph capabilities are added.
- **DPOP for automation** Maintain sender-constrained (`dpop`) flows for Cartographer and Scheduler to limit reuse of access tokens if a build host is compromised. For UI-facing tokens, pair `graph:read`/`graph:export` with short lifetimes and enforce refresh-token rotation at the gateway.
#### Export Center scope guardrails
- **Viewer vs operator** `export.viewer` grants read-only access to export profiles, manifests, and bundles. Automation that schedules or reruns exports should request `export.operator` (and typically `export.viewer`). Tenant hints remain mandatory; Authority refuses tokens without them.
- **Administrative mutations** Changes to retention policies, encryption key references, or schedule defaults require `export.admin`. When requesting tokens with this scope, clients must supply `export_reason` and `export_ticket` parameters; Authority persists the values for audit records and rejects missing metadata with `invalid_request`.
- **Operational hygiene** Rotate `export.admin` credentials infrequently and run them through fresh-auth workflows where possible. Prefer distributing verification tooling with `export.viewer` tokens for day-to-day bundle validation.
#### Notify scope guardrails
- **Viewer vs operator** `notify.viewer` grants read-only access to rules, channels, and delivery history. Automation that edits rules or triggers test notifications must request `notify.operator` (and usually `notify.viewer`). Tenant hints remain mandatory.
- **Administrative controls** Changes to channel secrets, quiet hours, or escalation policies require `notify.admin`. Authority logs these operations and surfaces `authority.notify.scope_violation` when tokens omit the scope or tenant.
- **Least privilege** Assign `notify.admin` sparingly (platform operations, DR automation). Day-to-day rule editing should rely on `notify.operator` scoped per tenant.
#### Vuln Explorer scopes, ABAC, and permalinks
- **Scopes** `vuln:view` unlocks read-only access and permalink issuance, `vuln:investigate` allows triage actions (assignment, comments, remediation notes), `vuln:operate` unlocks state transitions and workflow execution, and `vuln:audit` exposes immutable ledgers/exports. The legacy `vuln:read` scope is still emitted for backward compatibility but new clients should request the granular scopes.
- **ABAC attributes** Tenant roles can project attribute filters (`env`, `owner`, `business_tier`) via the `attributes` block in `authority.yaml` (see the sample `role/vuln-*` definitions). Authority now enforces the same filters on token issuance: client-credential requests must supply `vuln_env`, `vuln_owner`, and `vuln_business_tier` parameters when multiple values are configured, and the values must match the configured allow-list (or `*`). The accepted value pattern is `[a-z0-9:_-]{1,128}`. Issued tokens embed the resolved filters as `stellaops:vuln_env`, `stellaops:vuln_owner`, and `stellaops:vuln_business_tier` claims, and Authority persists the resulting actor chain plus service-account metadata in Mongo for auditability.
- **Service accounts** Delegated Vuln Explorer identities (`svc-vuln-*`) should include the attribute filters in their seed definition. Authority enforces the supplied `attributes` during issuance and stores the selected values on the delegation token, making downstream revocation/audit exports aware of the effective ABAC envelope.
- **Attachment tokens** Evidence downloads require scoped tokens issued by Authority. `POST /vuln/attachments/tokens/issue` accepts ledger hashes plus optional metadata, signs the response with the primary Authority key, and records audit trails (`vuln.attachment.token.*`). `POST /vuln/attachments/tokens/verify` validates incoming tokens server-side. See “Attachment signing tokens” below.
- **Token request parameters** Minimum metadata for Vuln Explorer service accounts:
- `service_account`: requested service-account id (still required).
- `vuln_env`: single value or `*` (required when multiple environments are configured).
- `vuln_owner`: owning team/code; optional only when the service account allows a single value.
- `vuln_business_tier`: business criticality tier (`tier-1`, `tier-2`, etc.).
Authority rejects missing parameters with `invalid_request` and records the violation via `authority.vuln_attr_*` audit properties.
- **Signed links** `POST /permalinks/vuln` requires `vuln:view`. The resulting JWT carries `vuln:view` plus the transitional `vuln:read` scope to preserve older consumers. Validation remains unchanged: verify the signature against `/jwks`, confirm tenant alignment, honour expiry, and enforce the scopes before honouring the permalink payload.
##### Attachment signing tokens
- **Issuance.** `POST /vuln/attachments/tokens/issue` (scope `vuln:investigate`) accepts an attachment identifier, the authoritative ledger hash, and optional metadata map (`metadata[]`). Authority returns a DSSE-style payload signed with the primary EdDSA key, capped by the configured TTL (`authority.vulnerabilityExplorer.attachments.defaultLifetime`, default 30 minutes).
- **Verification.** Downstream services call `POST /vuln/attachments/tokens/verify` before honouring downloads. The endpoint enforces tenant, scope, ABAC attributes, TTL, and ledger hash matching, and emits `vuln.attachment.token.verify` audit events with the resolved metadata.
- **Audit trail.** Every issuance logs `vuln.attachment.token.issue` with `delegation.service_account`, `ledger.hash`, and `attachment.id` properties so Offline Kit operators can reconcile evidence access. Tokens also embed the actor chain (`act`) so consuming services can trace automation pipelines.
- **Example.**
```bash
curl -u vuln-explorer-worker:s3cr3t \
-H "Content-Type: application/json" \
-d '{
"attachmentId": "finding-7d9d/evidence-2",
"ledgerHash": "sha256:4a5160...",
"metadata": { "download": "supporting-log.zip" }
}' \
https://authority.example.com/vuln/attachments/tokens/issue
```
##### Ledger verification workflow
1. Resolve the attachments ledger entry (`finding_history`, `triage_actions`) and note the recorded hash/signature.
2. Verify the issued attachment token via `/vuln/attachments/tokens/verify`; the response echoes the canonical hash and expiry.
3. When downloading artefacts from Vuln Explorer, recompute the hash locally and compare it to both the ledger entry and the verified token payload.
4. Cross-check Authority audit events (`vuln.attachment.token.*`) to confirm who issued and consumed the token; Offline Kit mirrors include the same audit feed.
##### Vuln Explorer security checklist
- [ ] Map tenant roles to the granular `vuln:*` scopes and ABAC filters in `etc/authority.yaml.sample`.
- [ ] Require `vuln_env`, `vuln_owner`, and `vuln_business_tier` parameters for every delegated service-account request.
- [ ] Exercise `/vuln/attachments/tokens/issue` and `/vuln/attachments/tokens/verify` in CI to confirm attachment signing is enforced.
- [ ] Mirror Authority audit events (`vuln.attachment.token.*`, `authority.vuln_attr.*`) into your SOC pipeline.
- [ ] Update Offline Kit runbooks so operators verify attachment hashes against both ledger entries and Authority-issued tokens before distribution.
## 4. Revocation Pipeline
Authority centralises revocation in `authority_revocations` with deterministic categories:
| Category | Meaning | Required fields |
| --- | --- | --- |
| `token` | Specific OAuth token revoked early. | `revocationId` (token id), `tokenType`, optional `clientId`, `subjectId` |
| `subject` | All tokens for a subject disabled. | `revocationId` (= subject id) |
| `client` | OAuth client registration revoked. | `revocationId` (= client id) |
| `key` | Signing/JWE key withdrawn. | `revocationId` (= key id) |
`RevocationBundleBuilder` flattens Mongo documents into canonical JSON, sorts entries by (`category`, `revocationId`, `revokedAt`), and signs exports using detached JWS (RFC7797) with cosign-compatible headers.
**Export surfaces** (deterministic output, suitable for Offline Kit):
- CLI: `stella auth revoke export --output ./out` writes `revocation-bundle.json`, `.jws`, `.sha256`.
- Verification: `stella auth revoke verify --bundle <path> --signature <path> --key <path>` validates detached JWS signatures before distribution, selecting the crypto provider advertised in the detached header (see `docs/security/revocation-bundle.md`).
- API: `GET /internal/revocations/export` (requires bootstrap API key) returns the same payload.
- Verification: `stella auth revoke verify` validates schema, digest, and detached JWS using cached JWKS or offline keys, automatically preferring the hinted provider (libsodium builds honour `provider=libsodium`; other builds fall back to the managed provider).
**Consumer guidance:**
1. Mirror `revocation-bundle.json*` alongside Concelier exports. Offline agents fetch both over the existing update channel.
2. Use bundle `sequence` and `bundleId` to detect replay or monotonicity regressions. Ignore bundles with older sequence numbers unless `bundleId` changes and `issuedAt` advances.
3. Treat `revokedReason` taxonomy as machine-friendly codes (`compromised`, `rotation`, `policy`, `lifecycle`). Translating to human-readable logs is the consumers responsibility.
## 5. Signing Keys & JWKS Rotation
Authority signs revocation bundles and publishes JWKS entries via the new signing manager:
- **Configuration (`authority.yaml`):**
```yaml
signing:
enabled: true
algorithm: ES256 # Defaults to ES256
keySource: file # Loader identifier (file, vault, etc.)
provider: default # Optional preferred crypto provider
activeKeyId: authority-signing-dev
keyPath: "../certificates/authority-signing-dev.pem"
additionalKeys:
- keyId: authority-signing-dev-2024
path: "../certificates/authority-signing-dev-2024.pem"
source: "file"
```
- **Sources:** The default loader supports PEM files relative to the content root; additional loaders can be registered via `IAuthoritySigningKeySource`.
- **Providers:** Keys are registered against the `ICryptoProviderRegistry`, so alternative implementations (HSM, libsodium) can be plugged in without changing host code.
- **OpenAPI discovery:** `GET /.well-known/openapi` returns the published authentication contract (JSON by default, YAML when requested). Responses include `X-StellaOps-Service`, `X-StellaOps-Api-Version`, `X-StellaOps-Build-Version`, plus grant and scope headers, and honour conditional requests via `ETag`/`If-None-Match`.
- **JWKS output:** `GET /jwks` lists every signing key with `status` metadata (`active`, `retired`). Old keys remain until operators remove them from configuration, allowing verification of historical bundles/tokens.
### Rotation SOP (no downtime)
1. Generate a new P-256 private key (PEM) on an offline workstation and place it where the Authority host can read it (e.g., `../certificates/authority-signing-2025.pem`).
2. Call the authenticated admin API:
```bash
curl -sS -X POST https://authority.example.com/internal/signing/rotate \
-H "x-stellaops-bootstrap-key: ${BOOTSTRAP_KEY}" \
-H "Content-Type: application/json" \
-d '{
"keyId": "authority-signing-2025",
"location": "../certificates/authority-signing-2025.pem",
"source": "file"
}'
```
3. Verify the response reports the previous key as retired and fetch `/jwks` to confirm the new `kid` appears with `status: "active"`.
4. Persist the old key path in `signing.additionalKeys` (the rotation API updates in-memory options; rewrite the YAML to match so restarts remain consistent).
5. If you prefer automation, trigger the `.gitea/workflows/authority-key-rotation.yml` workflow with the new `keyId`/`keyPath`; it wraps `ops/authority/key-rotation.sh` and reads environment-specific secrets. The older key will be marked `retired` and appended to `signing.additionalKeys`.
6. Re-run `stella auth revoke export` so revocation bundles are signed with the new key. Downstream caches should refresh JWKS within their configured lifetime (`StellaOpsAuthorityOptions.Signing` + client cache tolerance).
The rotation API leverages the same cryptography abstractions as revocation signing; no restart is required and the previous key is marked `retired` but kept available for verification.
## 6. Bootstrap & Administrative Endpoints
Administrative APIs live under `/internal/*` and require the bootstrap API key plus rate-limiter compliance.
| Endpoint | Method | Description |
| --- | --- | --- |
| `/internal/users` | `POST` | Provision initial administrative accounts through the registered password-capable plug-in. Emits structured audit events. |
| `/internal/clients` | `POST` | Provision OAuth clients (client credentials / device code). |
| `/internal/revocations/export` | `GET` | Export revocation bundle + detached JWS + digest. |
| `/internal/signing/rotate` | `POST` | Promote a new signing key (see SOP above). Request body accepts `keyId`, `location`, optional `source`, `algorithm`, `provider`, and metadata. |
All administrative calls emit `AuthEventRecord` entries enriched with correlation IDs, PII tags, and network metadata for offline SOC ingestion.
> **Tenant hint:** include a `tenant` entry inside `properties` when bootstrapping clients. Authority normalises the value, stores it on the registration, and stamps future tokens/audit events with the tenant.
### Bootstrap client example
```jsonc
POST /internal/clients
{
"clientId": "concelier",
"confidential": true,
"displayName": "Concelier Backend",
"allowedGrantTypes": ["client_credentials"],
"allowedScopes": ["concelier.jobs.trigger", "advisory:ingest", "advisory:read"],
"properties": {
"tenant": "tenant-default"
}
}
```
For environments with multiple tenants, repeat the call per tenant-specific client (e.g. `concelier-tenant-a`, `concelier-tenant-b`) or append suffixes to the client identifier.
### Aggregation-only verification tokens
- Issue a dedicated client (e.g. `aoc-verifier`) with the scopes `aoc:verify`, `advisory:read`, and `vex:read` for each tenant that runs guard checks. Authority refuses to mint tokens for these scopes unless the client registration provides a tenant hint.
- The CLI (`stella aoc verify --tenant <tenant>`) and Console verification panel both call `/aoc/verify` on Concelier and Excititor. Tokens that omit the tenant claim or present a tenant that does not match the stored registration are rejected with `invalid_client`/`invalid_token`.
- Audit: `authority.client_credentials.grant` entries record `scope.invalid="aoc:verify"` when requests are rejected because the tenant hint is missing or mismatched.
### Exception approvals & routing
- New scopes `exceptions:read`, `exceptions:write`, and `exceptions:approve` govern access to the exception lifecycle. Map these via tenant roles (`exceptions-service`, `exceptions-approver`) as described in `/docs/security/authority-scopes.md`.
- Configure approval routing in `authority.yaml` with declarative templates. Each template exposes an `authorityRouteId` for downstream services (Policy Engine, Console) and an optional `requireMfa` flag:
```yaml
exceptions:
routingTemplates:
- id: "secops"
authorityRouteId: "approvals/secops"
requireMfa: true
description: "Security Operations approval chain"
- id: "governance"
authorityRouteId: "approvals/governance"
requireMfa: false
description: "Non-production waiver review"
```
- Clients requesting exception scopes must include a tenant assignment. Authority rejects client-credential flows that request `exceptions:*` with `invalid_client` and logs `scope.invalid="exceptions:write"` (or the requested scope) in `authority.client_credentials.grant` audit events when the tenant hint is missing.
- When any configured routing template sets `requireMfa: true`, user-facing tokens that contain `exceptions:approve` must be acquired through an MFA-capable identity provider. Password/OIDC flows that lack MFA support are rejected with `authority.password.grant` audit events where `reason="Exception approval scope requires an MFA-capable identity provider."`
- Update interactive clients (Console) to request `exceptions:read` by default and elevate to `exceptions:approve` only inside fresh-auth workflows for approvers. Documented examples live in `etc/authority.yaml.sample`.
- Verification responses map guard failures to `ERR_AOC_00x` codes and Authority emits `authority.client_credentials.grant` + `authority.token.validate_access` audit records containing the tenant and scopes so operators can trace who executed a run.
- For air-gapped or offline replicas, pre-issue verification tokens per tenant and rotate them alongside ingest credentials; the guard endpoints never mutate data and remain safe to expose through the offline kit schedule.
## 7. Configuration Reference
| Section | Key | Description | Notes |
| --- | --- | --- | --- |
| Root | `issuer` | Absolute HTTPS issuer advertised to clients. | Required. Loopback HTTP allowed only for development. |
| Tokens | `accessTokenLifetime`, `refreshTokenLifetime`, etc. | Lifetimes for each grant (access, refresh, device, authorization code, identity). | Enforced during issuance; persisted on each token document. |
| Storage | `storage.connectionString` | MongoDB connection string. | Required even for tests; offline kits ship snapshots for seeding. |
| Signing | `signing.enabled` | Enable JWKS/revocation signing. | Disable only for development. |
| Signing | `signing.algorithm` | Signing algorithm identifier. | Currently ES256; additional curves can be wired through crypto providers. |
| Signing | `signing.keySource` | Loader identifier (`file`, `vault`, custom). | Determines which `IAuthoritySigningKeySource` resolves keys. |
| Signing | `signing.keyPath` | Relative/absolute path understood by the loader. | Stored as-is; rotation request should keep it in sync with filesystem layout. |
| Signing | `signing.activeKeyId` | Active JWKS / revocation signing key id. | Exposed as `kid` in JWKS and bundles. |
| Signing | `signing.additionalKeys[].keyId` | Retired key identifier retained for verification. | Manager updates this automatically after rotation; keep YAML aligned. |
| Signing | `signing.additionalKeys[].source` | Loader identifier per retired key. | Defaults to `signing.keySource` if omitted. |
| Security | `security.rateLimiting` | Fixed-window limits for `/token`, `/authorize`, `/internal/*`. | See `docs/security/rate-limits.md` for tuning. |
| Bootstrap | `bootstrap.apiKey` | Shared secret required for `/internal/*`. | Only required when `bootstrap.enabled` is true. |
### 7.1 Sender-constrained clients (DPoP & mTLS)
Authority now understands two flavours of sender-constrained OAuth clients:
- **DPoP proof-of-possession** clients sign a `DPoP` header for `/token` requests. Authority validates the JWK thumbprint, HTTP method/URI, and replay window, then stamps the resulting access token with `cnf.jkt` so downstream services can verify the same key is reused.
- Configure under `security.senderConstraints.dpop`. `allowedAlgorithms`, `proofLifetime`, and `replayWindow` are enforced at validation time.
- `security.senderConstraints.dpop.nonce.enabled` enables nonce challenges for high-value audiences (`requiredAudiences`, normalised to case-insensitive strings). When a nonce is required but missing or expired, `/token` replies with `WWW-Authenticate: DPoP error="use_dpop_nonce"` (and, when available, a fresh `DPoP-Nonce` header). Clients must retry with the issued nonce embedded in the proof.
- `security.senderConstraints.dpop.nonce.store` selects `memory` (default) or `redis`. When `redis` is configured, set `security.senderConstraints.dpop.nonce.redisConnectionString` so replicas share nonce issuance and high-value clients avoid replay gaps during failover.
- Example (enabling Redis-backed nonces; adjust audiences per deployment):
```yaml
security:
senderConstraints:
dpop:
enabled: true
proofLifetime: "00:02:00"
replayWindow: "00:05:00"
allowedAlgorithms: [ "ES256", "ES384" ]
nonce:
enabled: true
ttl: "00:10:00"
maxIssuancePerMinute: 120
store: "redis"
redisConnectionString: "redis://authority-redis:6379?ssl=false"
requiredAudiences:
- "signer"
- "attestor"
```
Operators can override any field via environment variables (e.g. `STELLAOPS_AUTHORITY__SECURITY__SENDERCONSTRAINTS__DPOP__NONCE__STORE=redis`).
- Declare client `audiences` in bootstrap manifests or plug-in provisioning metadata; Authority now defaults the token `aud` claim and `resource` indicator from this list, which is also used to trigger nonce enforcement for audiences such as `signer` and `attestor`.
- **Mutual TLS clients** client registrations may declare an mTLS binding (`senderConstraint: mtls`). When enabled via `security.senderConstraints.mtls`, Authority validates the presented client certificate against stored bindings (`certificateBindings[]`), optional chain verification, and timing windows. Successful requests embed `cnf.x5t#S256` into the access token (and introspection output) so resource servers can enforce the certificate thumbprint.
- `security.senderConstraints.mtls.enforceForAudiences` forces mTLS whenever the requested `aud`/`resource` (or the client's configured audiences) intersect the configured allow-list (default includes `signer`). Clients configured for different sender constraints are rejected early so operator policy remains consistent.
- Certificate bindings now act as an allow-list: Authority verifies thumbprint, subject, issuer, serial number, and any declared SAN values against the presented certificate, with rotation grace windows applied to `notBefore/notAfter`. Operators can enforce subject regexes, SAN type allow-lists (`dns`, `uri`, `ip`), trusted certificate authorities, and rotation grace via `security.senderConstraints.mtls.*`.
Both modes persist additional metadata in `authority_tokens`: `senderConstraint` records the enforced policy, while `senderKeyThumbprint` stores the DPoP JWK thumbprint or mTLS certificate hash captured at issuance. Downstream services can rely on these fields (and the corresponding `cnf` claim) when auditing offline copies of the token store.
### 7.2 Policy Engine clients & scopes
Policy Engine v2 introduces dedicated scopes and a service identity that materialises effective findings. Configure Authority as follows when provisioning policy clients:
| Client | Scopes | Notes |
| --- | --- | --- |
| `policy-engine` (service) | `policy:run`, `findings:read`, `effective:write` | Must include `properties.serviceIdentity: policy-engine` and a tenant. Authority rejects `effective:write` tokens without the marker or tenant. |
| `policy-cli` / automation | `policy:read`, `policy:author`, `policy:review`, `policy:simulate`, `findings:read` *(optionally add `policy:approve` / `policy:operate` / `policy:activate` for promotion pipelines)* | Keep scopes minimal; reroll CLI/CI tokens issued before 20251027 so they drop legacy scope names and adopt the new set. |
| UI/editor sessions | `policy:read`, `policy:author`, `policy:simulate` (+ reviewer/approver/operator scopes as appropriate) | Issue tenant-specific clients so audit and rate limits remain scoped. |
Sample YAML entry:
```yaml
- clientId: "policy-engine"
displayName: "Policy Engine Service"
grantTypes: [ "client_credentials" ]
audiences: [ "api://policy-engine" ]
scopes: [ "policy:run", "findings:read", "effective:write" ]
tenant: "tenant-default"
properties:
serviceIdentity: "policy-engine"
senderConstraint: "dpop"
auth:
type: "client_secret"
secretFile: "../secrets/policy-engine.secret"
```
Compliance checklist:
- [ ] `policy-engine` client includes `properties.serviceIdentity: policy-engine` and a tenant hint; logins missing either are rejected.
- [ ] Non-service clients omit `effective:write` and receive only the scopes required for their role (`policy:read`, `policy:author`, `policy:review`, `policy:approve`, `policy:operate`, `policy:simulate`, etc.).
- [ ] Legacy tokens using `policy:write`/`policy:submit`/`policy:edit` are rotated to the new scope set before Production change freeze (see release migration note below).
- [ ] Approval/activation workflows use identities distinct from authoring identities; tenants are provisioned per client to keep telemetry segregated.
- [ ] Operators document reviewer assignments and incident procedures alongside `/docs/security/policy-governance.md` and archive policy evidence bundles (`stella policy bundle export`) with each release.
### 7.3 Orchestrator roles & scopes
| Role / Client | Scopes | Notes |
| --- | --- | --- |
| `Orch.Viewer` role | `orch:read` | Read-only access to Orchestrator dashboards, queues, and telemetry. |
| `Orch.Operator` role | `orch:read`, `orch:operate` | Issue short-lived tokens for control actions (pause/resume, retry, sync). Token requests **must** include `operator_reason` (≤256 chars) and `operator_ticket` (≤128 chars); Authority rejects requests missing either value and records both in audit events. |
| `Orch.Admin` role | `orch:read`, `orch:operate`, `orch:quota`, `orch:backfill` | Manage tenant quotas, burst ceilings, and historical backfill allowances. Quota tokens **must** include `quota_reason` (≤256 chars) and may include `quota_ticket` (≤128 chars); backfill tokens **must** include both `backfill_reason` (≤256 chars) and `backfill_ticket` (≤128 chars). Authority records all values in audit trails. |
Token request example via client credentials:
```bash
curl -u orch-operator:s3cr3t! \
-d 'grant_type=client_credentials' \
-d 'scope=orch:operate' \
-d 'operator_reason=resume source after maintenance' \
-d 'operator_ticket=INC-2045' \
https://authority.example.com/token
```
Tokens lacking `operator_reason` or `operator_ticket` receive `invalid_request`; audit events (`authority.client_credentials.grant`) surface the supplied values under `request.reason` and `request.ticket` for downstream review.
CLI clients set these parameters via `Authority.OperatorReason` / `Authority.OperatorTicket` (environment variables `STELLAOPS_ORCH_REASON` and `STELLAOPS_ORCH_TICKET`).
Quota administration tokens follow the same pattern:
```bash
curl -u orch-admin:s3cr3t! \
-d 'grant_type=client_credentials' \
-d 'scope=orch:quota' \
-d 'quota_reason=temporary burst for release catch-up' \
-d 'quota_ticket=CHG-8821' \
https://authority.example.com/token
```
CLI automation should supply these values via `Authority.QuotaReason` / `Authority.QuotaTicket` (environment variables `STELLAOPS_ORCH_QUOTA_REASON` and `STELLAOPS_ORCH_QUOTA_TICKET`). Missing `quota_reason` yields `invalid_request`; when provided, both reason and ticket are captured in audit properties (`quota.reason`, `quota.ticket`).
Backfill run tokens extend the same pattern:
```bash
curl -u orch-admin:s3cr3t! \
-d 'grant_type=client_credentials' \
-d 'scope=orch:backfill' \
-d 'backfill_reason=rebuild historical findings for tenant-default' \
-d 'backfill_ticket=INC-9905' \
https://authority.example.com/token
```
CLI clients configure these values via `Authority.BackfillReason` / `Authority.BackfillTicket` (environment variables `STELLAOPS_ORCH_BACKFILL_REASON` and `STELLAOPS_ORCH_BACKFILL_TICKET`). Tokens missing either field are rejected with `invalid_request`; audit events store the supplied values as `backfill.reason` and `backfill.ticket`.
### 7.4 Delegated service accounts
StellaOps Authority issues short-lived delegated tokens for service accounts so automation can operate on behalf of a tenant without sharing the underlying client identity.
**Configuration summary**
```yaml
delegation:
quotas:
maxActiveTokens: 50
serviceAccounts:
- accountId: "svc-observer"
tenant: "tenant-default"
displayName: "Observability Exporter"
description: "Delegated identity used by Export Center to read findings."
enabled: true
allowedScopes: [ "jobs:read", "findings:read" ]
authorizedClients: [ "export-center-worker" ]
tenants:
- name: "tenant-default"
delegation:
maxActiveTokens: 25
```
* `delegation.quotas.maxActiveTokens` caps concurrent delegated tokens per tenant. Authority enforces both a tenant-wide ceiling and a per-account ceiling (the same value by default).
* `serviceAccounts[].allowedScopes` lists scopes that the delegate may request. Requests for scopes outside this set return `invalid_scope`.
* `serviceAccounts[].authorizedClients` restricts which OAuth clients may assume the delegate. Leave empty to allow any tenant client.
* `tenants[].delegation.maxActiveTokens` optionally overrides the quota for a specific tenant.
**Bootstrap administration**
Bootstrap operators can inspect and rotate delegated identities via the internal API group:
```text
GET /internal/service-accounts?tenant={tenantId}
GET /internal/service-accounts/{accountId}/tokens
POST /internal/service-accounts/{accountId}/revocations
```
Requests must include the bootstrap API key header (`X-StellaOps-Bootstrap-Key`). Listing returns the seeded accounts with their configuration; the token listing call shows currently active delegation tokens (status, client, scopes, actor chain) and the revocation endpoint supports bulk or targeted token revocation with audit logging.
Bootstrap seeding reuses the existing Mongo `_id`/`createdAt` values. When Authority restarts with updated configuration it upserts documents without mutating immutable fields, avoiding duplicate or conflicting service-account records.
**Requesting a delegated token**
```bash
curl -u export-center-worker:s3cr3t \
-d 'grant_type=client_credentials' \
-d 'scope=jobs:read findings:read' \
-d 'service_account=svc-observer' \
https://authority.example.com/token
```
Optional `delegation_actor` metadata appends an identity to the actor chain:
```bash
-d 'delegation_actor=pipeline://exporter/step/42'
```
**Token shape & observability**
* Access tokens include `stellaops:service_account` and an `act` claim describing the caller hierarchy (`client_id` ⇒ optional `delegation_actor`).
* `authority_tokens` records `tokenKind = "service_account"`, the `serviceAccountId`, and the normalized `actorChain[]`.
* Audit events (`authority.client_credentials.grant`) emit `delegation.service_account`, `delegation.actor`, and quota outcomes (`delegation.quota.exceeded = true/false`).
* When quota limits are exceeded Authority returns `invalid_request` (`Delegation token quota exceeded for tenant/service account`) and annotates the audit log.
Delegated tokens still honour scope validation, tenant enforcement, sender constraints (DPoP/mTLS), and fresh-auth checks.
## 8. Offline & Sovereign Operation
- **No outbound dependencies:** Authority only contacts MongoDB and local plugins. Discovery and JWKS are cached by clients with offline tolerances (`AllowOfflineCacheFallback`, `OfflineCacheTolerance`). Operators should mirror these responses for air-gapped use.
- **Structured logging:** Every revocation export, signing rotation, bootstrap action, and token issuance emits structured logs with `traceId`, `client_id`, `subjectId`, and `network.remoteIp` where applicable. Mirror logs to your SIEM to retain audit trails without central connectivity.
- **Determinism:** Sorting rules in token and revocation exports guarantee byte-for-byte identical artefacts given the same datastore state. Hashes and signatures remain stable across machines.
## 9. Operational Checklist
- [ ] Protect the bootstrap API key and disable bootstrap endpoints (`bootstrap.enabled: false`) once initial setup is complete.
- [ ] Schedule `stella auth revoke export` (or `/internal/revocations/export`) at the same cadence as Concelier exports so bundles remain in lockstep.
- [ ] Rotate signing keys before expiration; keep at least one retired key until all cached bundles/tokens signed with it have expired.
- [ ] Monitor `/health` and `/ready` plus rate-limiter metrics to detect plugin outages early.
- [ ] Ensure downstream services cache JWKS and revocation bundles within tolerances; stale caches risk accepting revoked tokens.
For plug-in specific requirements, refer to **[Authority Plug-in Developer Guide](dev/31_AUTHORITY_PLUGIN_DEVELOPER_GUIDE.md)**. For revocation bundle validation workflow, see **[Authority Revocation Bundle](security/revocation-bundle.md)**.