up
docs/security/rate-limits.md
							| @@ -0,0 +1,76 @@ | ||||
| # StellaOps Authority Rate Limit Guidance | ||||
|  | ||||
| StellaOps Authority applies fixed-window rate limiting to critical endpoints so that brute-force and burst traffic are throttled before they can exhaust downstream resources. This guide complements the lockout policy documentation and captures the recommended defaults, override scenarios, and monitoring practices for `/token`, `/authorize`, and `/internal/*` routes. | ||||
|  | ||||
| ## Configuration Overview | ||||
|  | ||||
| Rate limits are configured under `security.rateLimiting` in `authority.yaml`; environment variables follow the same hierarchy. Each endpoint exposes: | ||||
|  | ||||
| - `enabled`: toggles the limiter. | ||||
| - `permitLimit`: maximum requests per fixed window. | ||||
| - `window`: window duration expressed as an `hh:mm:ss` timespan (e.g., `00:01:00` for one minute). | ||||
| - `queueLimit`: number of requests allowed to queue when the window is exhausted. | ||||
|  | ||||
| ```yaml | ||||
| security: | ||||
|   rateLimiting: | ||||
|     token: | ||||
|       enabled: true | ||||
|       permitLimit: 30 | ||||
|       window: 00:01:00 | ||||
|       queueLimit: 0 | ||||
|     authorize: | ||||
|       enabled: true | ||||
|       permitLimit: 60 | ||||
|       window: 00:01:00 | ||||
|       queueLimit: 10 | ||||
|     internal: | ||||
|       enabled: false | ||||
|       permitLimit: 5 | ||||
|       window: 00:01:00 | ||||
|       queueLimit: 0 | ||||
| ``` | ||||
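
To make the interaction between `permitLimit`, `window`, and `queueLimit` concrete, here is a minimal Python sketch of a fixed-window limiter. It is an illustration only, not the Authority implementation: the class name, method names, and the rule that queued requests are admitted first when a new window opens are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Toy fixed-window limiter (hypothetical; for illustration only)."""

    permit_limit: int
    window_seconds: float
    queue_limit: int = 0
    window_start: float = 0.0
    used: int = 0
    queued: deque = field(default_factory=deque)

    def _roll(self, now: float) -> None:
        # Open a fresh window once the current one has fully elapsed.
        if now - self.window_start >= self.window_seconds:
            elapsed = int((now - self.window_start) // self.window_seconds)
            self.window_start += elapsed * self.window_seconds
            self.used = 0
            # Assumption: queued requests consume permits of the new window first.
            while self.queued and self.used < self.permit_limit:
                self.queued.popleft()
                self.used += 1

    def try_acquire(self, now: float, request_id: str) -> str:
        """Return 'allowed', 'queued', or 'rejected' for a request at time `now`."""
        self._roll(now)
        if self.used < self.permit_limit:
            self.used += 1
            return "allowed"
        if len(self.queued) < self.queue_limit:
            self.queued.append(request_id)
            return "queued"
        return "rejected"


limiter = FixedWindowLimiter(permit_limit=2, window_seconds=60, queue_limit=1)
results = [limiter.try_acquire(t, f"req-{t}") for t in (0, 1, 2, 3, 60)]
```

With `queueLimit: 0`, as in the `token` block above, the "queued" branch is never taken and every request beyond `permitLimit` in a window is rejected immediately.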
|  | ||||
| When a limit triggers, the middleware adds a `Retry-After` header to the throttled response and tags the log entry with `authority.endpoint`, `authority.client_id`, and `authority.remote_ip` so operators can correlate events with clients and source IPs. | ||||
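
On the client side, a throttled caller should wait for the advertised `Retry-After` interval before retrying. The sketch below shows one way to interpret the header in Python, assuming it may arrive either as a number of seconds or as an HTTP date (both forms are permitted by the HTTP specification). The function name `retry_delay_seconds` is hypothetical, and which form Authority actually emits is not stated in this guide.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(header_value: str, now: datetime) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    value = header_value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "Retry-After: 42".
        return float(value)
    # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    retry_at = parsedate_to_datetime(value)
    if retry_at.tzinfo is None:
        # Treat a date without an offset as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
delay_from_seconds = retry_delay_seconds("42", now)
delay_from_date = retry_delay_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now)
```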
|  | ||||
| Environment overrides follow the same hierarchy. For example: | ||||
|  | ||||
| ``` | ||||
| STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__TOKEN__PERMITLIMIT=60 | ||||
| STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__TOKEN__WINDOW=00:01:00 | ||||
| ``` | ||||
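
The naming pattern in the two examples above can be expressed as a small helper. This Python sketch infers the rule from those examples only (prefix `STELLAOPS_AUTHORITY`, segments upper-cased and joined with double underscores); the function name is hypothetical and the rule is an assumption for any key not shown in this guide.

```python
def env_var_name(config_path: str, prefix: str = "STELLAOPS_AUTHORITY") -> str:
    """Map a dotted YAML path to the environment variable pattern shown above."""
    segments = [segment.upper() for segment in config_path.split(".")]
    return "__".join([prefix, *segments])


token_permit_limit = env_var_name("security.rateLimiting.token.permitLimit")
token_window = env_var_name("security.rateLimiting.token.window")
```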
|  | ||||
| ## Recommended Profiles | ||||
|  | ||||
| | Scenario | permitLimit | window | queueLimit | Notes | | ||||
| |----------|-------------|--------|------------|-------| | ||||
| | Default production | 30 | 60s | 0 | Balances anonymous quota (33 scans/day) with headroom for tenant bursts. | | ||||
| | High-trust clustered IPs | 60 | 60s | 5 | Requires a WAF allowlist plus an alert confirming that the rejection rate (`aspnetcore_rate_limiting_rejections_total{limiter="authority-token"}`) stays at or below 1% sustained. | | ||||
| | Air-gapped lab | 10 | 120s | 0 | Lower concurrency reduces noise when running from shared bastion hosts. | | ||||
| | Incident lockdown | 5 | 300s | 0 | Pair with credential lockout limit of 3 attempts and SOC paging for each denial. | | ||||
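
Because the table states windows in seconds while `authority.yaml` expects `hh:mm:ss` values, a short worked conversion may help when comparing profiles. The Python sketch below copies the figures from the table; the helper names are hypothetical.

```python
# (permitLimit, window in seconds, queueLimit), copied from the table above.
PROFILES = {
    "Default production": (30, 60, 0),
    "High-trust clustered IPs": (60, 60, 5),
    "Air-gapped lab": (10, 120, 0),
    "Incident lockdown": (5, 300, 0),
}


def sustained_per_minute(permit_limit: int, window_seconds: int) -> float:
    """Average requests per minute a caller can sustain without rejection."""
    return permit_limit * 60 / window_seconds


def to_timespan(window_seconds: int) -> str:
    """Format a window in seconds as the hh:mm:ss string used in the config."""
    hours, remainder = divmod(window_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


summary = {
    name: (sustained_per_minute(limit, window), to_timespan(window))
    for name, (limit, window, _queue) in PROFILES.items()
}
```

Under these figures the lockdown profile sustains one request per minute per limiter key, a thirtieth of the default production rate.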
|  | ||||
| ### Lockout Interplay | ||||
|  | ||||
| - Rate limiting throttles by IP/client; lockout policies apply per subject. Keep both enabled. | ||||
| - During lockdown scenarios, reduce `security.lockout.maxFailures` alongside the rate limits above so that subjects face quicker escalation. | ||||
| - Map support playbooks to the observed `Retry-After` value: anything above 120 seconds should trigger manual investigation before re-enabling clients. | ||||
|  | ||||
| ## Monitoring and Alerts | ||||
|  | ||||
| 1. **Metrics** | ||||
|    - `aspnetcore_rate_limiting_rejections_total{limiter="authority-token"}` for `/token`. | ||||
|    - `aspnetcore_rate_limiting_rejections_total{limiter="authority-authorize"}` for `/authorize`. | ||||
|    - Custom counters derived from the structured log tags (`authority.remote_ip`, `authority.client_id`). | ||||
| 2. **Dashboards** | ||||
|    - Requests vs. rejections per endpoint. | ||||
|    - Top offending clients/IP ranges in the current window. | ||||
|    - Heatmap of retry-after durations to spot persistent throttling. | ||||
| 3. **Alerts** | ||||
|    - Notify SOC when the 429 rate exceeds 25% for five consecutive minutes on any limiter. | ||||
|    - Trigger client-specific alerts when a single `client_id` produces more than 100 throttle events per hour. | ||||
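
The two alert conditions above can be prototyped offline before they are encoded in a monitoring system. The following Python sketch assumes per-minute samples of `(total_requests, rejected_429)` and a list of `client_id` values, one per throttle event in the past hour; both input shapes and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def soc_alert(
    per_minute: List[Tuple[int, int]],
    threshold: float = 0.25,
    consecutive: int = 5,
) -> bool:
    """True if the 429 ratio exceeds `threshold` for `consecutive` minutes in a row."""
    streak = 0
    for total, rejected in per_minute:
        ratio = rejected / total if total else 0.0
        # Reset the streak as soon as one minute falls back to the threshold or below.
        streak = streak + 1 if ratio > threshold else 0
        if streak >= consecutive:
            return True
    return False


def noisy_clients(throttle_events: Iterable[str], limit: int = 100) -> List[str]:
    """Return client IDs with more than `limit` throttle events in the sample."""
    counts = Counter(throttle_events)
    return sorted(client for client, count in counts.items() if count > limit)


sustained = soc_alert([(100, 30)] * 5)
interrupted = soc_alert([(100, 30)] * 4 + [(100, 20)])
offenders = noisy_clients(["client-a"] * 101 + ["client-b"] * 100)
```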
|  | ||||
| ## Operational Checklist | ||||
|  | ||||
| - Validate updated limits in staging before production rollout; smoke-test with representative workload. | ||||
| - When raising limits, confirm audit events continue to capture `authority.client_id`, `authority.remote_ip`, and correlation IDs for throttle responses. | ||||
| - Document any overrides in the change log, including justification and expiry review date. | ||||