Add authority bootstrap flows and Concelier ops runbooks
		| @@ -15,7 +15,7 @@ Audit events share the `StellaOps.Cryptography.Audit.AuthEventRecord` contract. | ||||
| - `Client` — `AuthEventClient` with client identifier, display name, and originating provider/plugin. | ||||
| - `Scopes` — granted or requested OAuth scopes (sorted before emission). | ||||
| - `Network` — `AuthEventNetwork` with remote address, forwarded headers, and user agent string (all treated as PII). | ||||
| - `Properties` — additional `AuthEventProperty` entries for context-specific details (lockout durations, policy decisions, retries, etc.). | ||||
| - `Properties` — additional `AuthEventProperty` entries for context-specific details (lockout durations, policy decisions, retries, `request.tampered`/`request.unexpected_parameter`, `bootstrap.invite_token`, etc.). | ||||
|  | ||||
| ## Data Classifications | ||||
|  | ||||
| @@ -33,7 +33,13 @@ Event names follow dotted notation: | ||||
|  | ||||
| - `authority.password.grant` — password grant handled by OpenIddict. | ||||
| - `authority.client_credentials.grant` — client credential grant handling. | ||||
| - `authority.token.tamper` — suspicious `/token` request detected (unexpected parameters or manipulated payload). | ||||
| - `authority.bootstrap.user` and `authority.bootstrap.client` — bootstrap API operations. | ||||
| - `authority.bootstrap.invite.created` — operator created a bootstrap invite. | ||||
| - `authority.bootstrap.invite.consumed` — invite consumed during user/client provisioning. | ||||
| - `authority.bootstrap.invite.expired` — invite expired without being used. | ||||
| - `authority.bootstrap.invite.rejected` — invite was rejected (invalid, mismatched provider/target, or already consumed). | ||||
| - `authority.token.replay.suspected` — replay heuristics detected a token being used from a new device fingerprint. | ||||
| - Future additions should preserve the `authority.<surface>.<action>` pattern to keep filtering deterministic. | ||||
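A minimal sketch of how a consumer might check new event names against the `authority.<surface>.<action>` pattern and filter by surface. The regular expression is an assumed reading of the convention (lowercase segments, with the action allowed to contain further dots, as in `authority.bootstrap.invite.created`), and the helper names are hypothetical.

```python
import re

# Assumed interpretation of `authority.<surface>.<action>`: lowercase segments
# made of letters, digits, or underscores, with at least a surface and an action.
EVENT_NAME = re.compile(r"^authority\.[a-z0-9_]+(?:\.[a-z0-9_]+)+$")


def is_valid_event_name(name: str) -> bool:
    """Return True when the name follows the dotted naming convention."""
    return EVENT_NAME.fullmatch(name) is not None


def matches_surface(name: str, surface: str) -> bool:
    """Select events for one surface, e.g. surface='bootstrap'."""
    return is_valid_event_name(name) and name.split(".")[1] == surface
```

Because the surface is always the second segment, a filter such as `matches_surface(name, "bootstrap")` would select the user, client, and invite events with one rule.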
|  | ||||
| ## Persistence | ||||
|   | ||||
| @@ -82,9 +82,9 @@ flowchart LR | ||||
| | Threat | STRIDE Vector | Surface | Risk (L×I) | Existing Controls | Gaps / Actions | Owner | | ||||
| |--------|---------------|---------|------------|-------------------|----------------|-------| | ||||
| | Spoofed revocation bundle | Spoofing | TB5 — Authority ↔️ Agents | Med×High | Detached JWS signature (planned), offline kit checksums | Finalise signing key registry & verification script (SEC4.B/SEC4.HOST); add bundle freshness requirement | Security Guild (follow-up: **SEC5.B**) | | ||||
| | Parameter tampering on `/token` | Tampering | TB1 — Public ingress | Med×High | ASP.NET model validation, OpenIddict, rate limiter (CORE8.RL) | Add audit coverage for tampered inputs, align correlation IDs with SOC (SEC2.A/SEC2.B) | Security Guild + Authority Core (follow-up: **SEC5.C**) | | ||||
| | Bootstrap invite replay | Repudiation | TB4 — Operator CLI ↔️ Authority | Low×High | One-time bootstrap tokens, Argon2id hashing on creation | Enforce invite expiration + audit trail for unused invites | Security Guild (follow-up: **SEC5.D**) | | ||||
| | Token replay by stolen agent | Information Disclosure | TB5 | Med×High | Planned revocation bundles, optional mTLS | Require agent binding (device fingerprint) and enforce revocation grace window alerts | Security Guild + Zastava (follow-up: **SEC5.E**) | | ||||
| | Parameter tampering on `/token` | Tampering | TB1 — Public ingress | Med×High | ASP.NET model validation, OpenIddict, rate limiter (CORE8.RL) | Tampered requests emit `authority.token.tamper` audit events (`request.tampered`, unexpected parameter names) correlating with `/token` outcomes (SEC5.C) | Security Guild + Authority Core (follow-up: **SEC5.C**) | | ||||
| | Bootstrap invite replay | Repudiation | TB4 — Operator CLI ↔️ Authority | Low×High | One-time bootstrap tokens, Argon2id hashing on creation | Invites expire automatically and emit audit events on consumption/expiration (SEC5.D) | Security Guild | | ||||
| | Token replay by stolen agent | Information Disclosure | TB5 | Med×High | Signed revocation bundles, device fingerprint heuristics, optional mTLS | Monitor revocation acknowledgement latency via Zastava and tune replay alerting thresholds | Security Guild + Zastava (follow-up: **SEC5.E**) | | ||||
| | Privilege escalation via plug-in override | Elevation of Privilege | TB3 — Plug-in sandbox | Med×High | Signed plug-ins, restart-only loading, configuration validation | Add static analysis on manifest overrides + runtime warning when policy weaker than host | Security Guild + DevOps (follow-up: **SEC5.F**) | | ||||
| | Offline bundle tampering | Tampering | Distribution | Low×High | SHA256 manifest, signed bundles (planned) | Add supply-chain attestation for Offline Kit, publish verification CLI in docs | Security Guild + Ops (follow-up: **SEC5.G**) | | ||||
| | Failure to log denied tokens | Repudiation | TB2 — Authority ↔️ Mongo | Med×Med | Serilog structured events (partial), Mongo persistence path (planned) | Finalise audit schema (SEC2.A) and ensure `/token` denies include subject/client/IP fields | Security Guild + Authority Core (follow-up: **SEC5.H**) | | ||||
| @@ -98,7 +98,7 @@ Risk scoring uses qualitative scale (Low/Med/High) for likelihood × impact; mit | ||||
| | SEC5.B | Spoofed revocation bundle | Complete libsodium/Core signing integration and ship revocation verification script. | Security Guild + Authority Core | | ||||
| | SEC5.C | Parameter tampering on `/token` | Finalise audit contract (`SEC2.A`) and add request tamper logging. | Security Guild + Authority Core | | ||||
| | SEC5.D | Bootstrap invite replay | Implement expiry enforcement + audit coverage for unused bootstrap invites. | Security Guild | | ||||
| | SEC5.E | Token replay by stolen agent | Document device binding requirements and create detector for stale revocation acknowledgements. | Security Guild + Zastava | | ||||
| | SEC5.E | Token replay by stolen agent | Coordinate Zastava alerting with the new device fingerprint heuristics and surface stale revocation acknowledgements. | Security Guild + Zastava | | ||||
| | SEC5.F | Plug-in override escalation | Static analysis of plug-in manifests; warn on weaker password policy overrides. | Security Guild + DevOps | | ||||
| | SEC5.G | Offline bundle tampering | Extend Offline Kit build to include attested manifest + verification CLI sample. | Security Guild + Ops | | ||||
| | SEC5.H | Failure to log denied tokens | Ensure audit persistence for all `/token` denials with correlation IDs. | Security Guild + Authority Core | | ||||
|   | ||||
							
								
								
									
docs/security/rate-limits.md (normal file, 76 lines)
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,76 @@ | ||||
| # StellaOps Authority Rate Limit Guidance | ||||
|  | ||||
| StellaOps Authority applies fixed-window rate limiting to critical endpoints so that brute-force and burst traffic are throttled before they can exhaust downstream resources. This guide complements the lockout policy documentation and captures the recommended defaults, override scenarios, and monitoring practices for `/token`, `/authorize`, and `/internal/*` routes. | ||||
|  | ||||
| ## Configuration Overview | ||||
|  | ||||
| Rate limits live under `security.rateLimiting` in `authority.yaml` (and map to the same hierarchy for environment variables). Each endpoint exposes: | ||||
|  | ||||
| - `enabled` — toggles the limiter. | ||||
| - `permitLimit` — maximum requests per fixed window. | ||||
| - `window` — window duration expressed as an `hh:mm:ss` timespan (e.g., `00:01:00` for one minute). | ||||
| - `queueLimit` — number of requests allowed to queue when the window is exhausted. | ||||
|  | ||||
| ```yaml | ||||
| security: | ||||
|   rateLimiting: | ||||
|     token: | ||||
|       enabled: true | ||||
|       permitLimit: 30 | ||||
|       window: 00:01:00 | ||||
|       queueLimit: 0 | ||||
|     authorize: | ||||
|       enabled: true | ||||
|       permitLimit: 60 | ||||
|       window: 00:01:00 | ||||
|       queueLimit: 10 | ||||
|     internal: | ||||
|       enabled: false | ||||
|       permitLimit: 5 | ||||
|       window: 00:01:00 | ||||
|       queueLimit: 0 | ||||
| ``` | ||||
|  | ||||
| When limits trigger, middleware decorates responses with `Retry-After` headers and log tags (`authority.endpoint`, `authority.client_id`, `authority.remote_ip`) so operators can correlate events with clients and source IPs. | ||||
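To illustrate how `permitLimit` and `window` interact with the `Retry-After` value, here is a small fixed-window limiter sketch. It is not the Authority middleware: the class and method names are hypothetical, windows are aligned to multiples of the window length for simplicity, and queueing is left out (equivalent to `queueLimit: 0`).

```python
import math
from dataclasses import dataclass


@dataclass
class FixedWindowLimiter:
    permit_limit: int        # maximum requests per window (permitLimit)
    window_seconds: float    # window length in seconds (window)
    window_start: float = 0.0
    used: int = 0

    def try_acquire(self, now: float) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for a request arriving at `now`."""
        current_start = math.floor(now / self.window_seconds) * self.window_seconds
        if current_start != self.window_start:
            # A new window has begun, so the permit counter resets.
            self.window_start = current_start
            self.used = 0
        if self.used < self.permit_limit:
            self.used += 1
            return True, 0
        # Rejected: the caller should wait until the current window ends.
        retry_after = math.ceil(self.window_start + self.window_seconds - now)
        return False, max(retry_after, 1)
```

With `permit_limit=30` and `window_seconds=60`, the thirty-first request in a minute would be rejected with a retry hint equal to the seconds remaining in that window.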
|  | ||||
| Environment overrides follow the same hierarchy. For example: | ||||
|  | ||||
| ``` | ||||
| STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__TOKEN__PERMITLIMIT=60 | ||||
| STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__TOKEN__WINDOW=00:01:00 | ||||
| ``` | ||||
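The mapping from a YAML path to an environment variable name can be sketched as follows. The `STELLAOPS_AUTHORITY` prefix, the double-underscore separator, and the upper-casing are inferred from the two example variables above; the helper name is hypothetical.

```python
def env_var_name(config_path: str, prefix: str = "STELLAOPS_AUTHORITY") -> str:
    """Turn a dotted config path into the override variable name.

    Assumes each path segment is upper-cased and joined with '__',
    as the example variables in this guide suggest.
    """
    segments = [segment.upper() for segment in config_path.split(".")]
    return "__".join([prefix, *segments])
```

For instance, `env_var_name("security.rateLimiting.authorize.queueLimit")` would yield the variable to set when overriding the `/authorize` queue limit.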
|  | ||||
| ## Recommended Profiles | ||||
|  | ||||
| | Scenario | permitLimit | window | queueLimit | Notes | | ||||
| |----------|-------------|--------|------------|-------| | ||||
| | Default production | 30 | 60s | 0 | Balances anonymous quota (33 scans/day) with headroom for tenant bursts. | | ||||
| | High-trust clustered IPs | 60 | 60s | 5 | Requires a WAF allowlist and an alert confirming that rejections (`aspnetcore_rate_limiting_rejections_total{limiter="authority-token"}`) stay at or below 1% of requests on a sustained basis. | ||||
| | Air-gapped lab | 10 | 120s | 0 | Lower concurrency reduces noise when running from shared bastion hosts. | | ||||
| | Incident lockdown | 5 | 300s | 0 | Pair with credential lockout limit of 3 attempts and SOC paging for each denial. | | ||||
|  | ||||
| ### Lockout Interplay | ||||
|  | ||||
| - Rate limiting throttles by IP/client; lockout policies apply per subject. Keep both enabled. | ||||
| - During lockdown scenarios, reduce `security.lockout.maxFailures` alongside the rate limits above so that subjects face quicker escalation. | ||||
| - Map support playbooks to the observed `Retry-After` value: anything above 120 seconds should trigger manual investigation before re-enabling clients. | ||||
|  | ||||
| ## Monitoring and Alerts | ||||
|  | ||||
| 1. **Metrics** | ||||
|    - `aspnetcore_rate_limiting_rejections_total{limiter="authority-token"}` for `/token`. | ||||
|    - `aspnetcore_rate_limiting_rejections_total{limiter="authority-authorize"}` for `/authorize`. | ||||
|    - Custom counters derived from the structured log tags (`authority.remote_ip`, `authority.client_id`). | ||||
| 2. **Dashboards** | ||||
|    - Requests vs. rejections per endpoint. | ||||
|    - Top offending clients/IP ranges in the current window. | ||||
|    - Heatmap of retry-after durations to spot persistent throttling. | ||||
| 3. **Alerts** | ||||
|    - Notify SOC when 429 rates exceed 25% for five consecutive minutes on any limiter. | ||||
|    - Trigger client-specific alerts when a single `client_id` produces >100 throttle events/hour. | ||||
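The SOC alert rule above can be expressed as a small evaluation over per-minute samples. This is a sketch only: the function name and the input shape (a list of `(total_requests, rejected)` pairs, oldest first) are assumptions, and a real deployment would more likely encode the rule in the monitoring system's own query language.

```python
def should_notify_soc(per_minute, threshold=0.25, consecutive=5):
    """Return True when the 429 rate exceeds `threshold` for
    `consecutive` minutes in a row.

    per_minute: iterable of (total_requests, rejected_requests) tuples.
    """
    streak = 0
    for total, rejected in per_minute:
        rate = rejected / total if total else 0.0
        # Only a strictly higher rate counts, matching "exceed 25%".
        streak = streak + 1 if rate > threshold else 0
        if streak >= consecutive:
            return True
    return False
```

Resetting the streak on any minute at or below the threshold keeps short bursts from paging the SOC.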
|  | ||||
| ## Operational Checklist | ||||
|  | ||||
| - Validate updated limits in staging before production rollout; smoke-test with representative workload. | ||||
| - When raising limits, confirm audit events continue to capture `authority.client_id`, `authority.remote_ip`, and correlation IDs for throttle responses. | ||||
| - Document any overrides in the change log, including justification and expiry review date. | ||||
| @@ -43,6 +43,7 @@ Consumers MUST treat the combination of `schemaVersion` and `sequence` as a mono | ||||
|    { | ||||
|      "alg": "ES256", | ||||
|      "kid": "{signingKeyId}", | ||||
|      "provider": "{providerName}", | ||||
|      "typ": "application/vnd.stellaops.revocation-bundle+jws", | ||||
|      "b64": false, | ||||
|      "crit": ["b64"] | ||||
| @@ -54,8 +55,28 @@ Verification steps: | ||||
|  | ||||
| 1. Validate `revocation-bundle.json` against the schema. | ||||
| 2. Re-compute SHA-256 and compare with `.sha256` (if present). | ||||
| 3. Resolve the signing key from JWKS (`/.well-known/jwks.json`) or the offline key bundle. | ||||
| 4. Verify the detached JWS using the stored signing key (example tooling coming with `stella auth revoke verify`). | ||||
| 3. Resolve the signing key from JWKS (`/.well-known/jwks.json`) or the offline key bundle, preferring the provider declared in the JWS header (`provider` falls back to `default`). | ||||
| 4. Verify the detached JWS using the resolved provider. The CLI mirrors Authority resolution, so builds compiled with `StellaOpsCryptoSodium=true` automatically use the libsodium provider when advertised; otherwise verification downgrades to the managed fallback. | ||||
|  | ||||
| ### CLI verification workflow | ||||
|  | ||||
| Use the bundled CLI command before distributing a bundle: | ||||
|  | ||||
| ```bash | ||||
| stellaops auth revoke verify \ | ||||
|   --bundle artifacts/revocation-bundle.json \ | ||||
|   --signature artifacts/revocation-bundle.json.jws \ | ||||
|   --key etc/authority/signing/authority-public.pem \ | ||||
|   --verbose | ||||
| ``` | ||||
|  | ||||
| The verifier performs three checks: | ||||
|  | ||||
| 1. Prints the computed digest in `sha256:<hex>` format. Compare it with the exported `.sha256` artefact. | ||||
| 2. Confirms that the detached JWS header advertises `b64: false` and that the algorithm matches the Authority configuration (ES256 unless overridden), and captures the provider hint. | ||||
| 3. Registers the supplied PEM key with the crypto provider registry and validates the signature (falling back to the managed provider when the hinted provider is unavailable). | ||||
|  | ||||
| A zero exit code means the bundle is ready for mirroring/import. Non-zero codes signal missing arguments, malformed JWS payloads, or signature mismatches; regenerate or re-sign the bundle before distribution. | ||||
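The first two checks can be approximated offline with the standard library. This is a rough sketch, not the CLI's implementation: the function names are hypothetical, the signature file is assumed to be a compact detached JWS (`header..signature`), and signature validation itself is omitted because it needs a cryptography library.

```python
import base64
import hashlib
import json


def bundle_digest(bundle_bytes: bytes) -> str:
    """Digest in the `sha256:<hex>` format printed by the verifier."""
    return "sha256:" + hashlib.sha256(bundle_bytes).hexdigest()


def decode_protected_header(detached_jws: str) -> dict:
    """Decode the protected header of a compact JWS with an empty payload part."""
    parts = detached_jws.strip().split(".")
    if len(parts) != 3 or parts[1] != "":
        raise ValueError("expected detached compact JWS: header..signature")
    padded = parts[0] + "=" * (-len(parts[0]) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def check_header(header: dict, expected_alg: str = "ES256") -> str:
    """Validate the header fields and return the provider hint."""
    if header.get("b64") is not False:
        raise ValueError("header must advertise b64: false")
    if "b64" not in header.get("crit", []):
        # RFC 7797 requires "b64" to be listed in "crit" when it is used.
        raise ValueError("crit must list b64")
    if header.get("alg") != expected_alg:
        raise ValueError("unexpected algorithm: %r" % header.get("alg"))
    # The provider hint falls back to `default` when absent.
    return header.get("provider", "default")
```

Comparing `bundle_digest(...)` with the exported `.sha256` artefact before checking the signature catches a stale or truncated bundle early.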
|  | ||||
| ## Example | ||||
|  | ||||
| @@ -64,7 +85,7 @@ The repository contains an [example bundle](revocation-bundle-example.json) demo | ||||
| ## Operations Quick Reference | ||||
|  | ||||
| - `stella auth revoke export` emits a canonical JSON bundle, `.sha256` digest, and detached JWS signature in one command. Use `--output` to write into your mirror staging directory. | ||||
| - `stella auth revoke verify` validates a bundle using cached JWKS or an offline PEM key and reports digest mismatches before distribution. | ||||
| - `stella auth revoke verify` validates a bundle using cached JWKS or an offline PEM key, honours the `provider` metadata embedded in the signature, and reports digest mismatches before distribution. | ||||
| - `POST /internal/revocations/export` provides the same payload for orchestrators that already talk to the bootstrap API. | ||||
| - `POST /internal/signing/rotate` rotates JWKS material without downtime; always export a fresh bundle afterward so downstream mirrors receive signatures from the new `kid`. | ||||
| - Offline Kit automation should mirror `revocation-bundle.json*` alongside Feedser exports so agents ingest revocations during the same sync pass. | ||||
|   | ||||