git.stella-ops.org/docs/security/rate-limits.md
Vladimir Moushkov ea1106ce7c up
2025-10-15 10:03:56 +03:00


StellaOps Authority Rate Limit Guidance

StellaOps Authority applies fixed-window rate limiting to critical endpoints so that brute-force and burst traffic are throttled before they can exhaust downstream resources. This guide complements the lockout policy documentation and captures the recommended defaults, override scenarios, and monitoring practices for /token, /authorize, and /internal/* routes.

Configuration Overview

Rate limits live under security.rateLimiting in authority.yaml (and map to the same hierarchy for environment variables). Each endpoint exposes:

  • enabled — toggles the limiter.
  • permitLimit — maximum requests per fixed window.
  • window — window duration expressed as an hh:mm:ss timespan (e.g., 00:01:00 for one minute).
  • queueLimit — number of requests allowed to queue when the window is exhausted.

```yaml
security:
  rateLimiting:
    token:
      enabled: true
      permitLimit: 30
      window: 00:01:00
      queueLimit: 0
    authorize:
      enabled: true
      permitLimit: 60
      window: 00:01:00
      queueLimit: 10
    internal:
      enabled: false
      permitLimit: 5
      window: 00:01:00
      queueLimit: 0
```
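
To make the interaction between permitLimit, window, and queueLimit concrete, the following is a minimal Python model of a fixed-window limiter. It is a sketch under stated assumptions, not the Authority middleware: the class name, the return labels, and the rule that queued requests consume permits from the next window are all illustrative choices, and the real limiter may align windows and drain its queue differently.

```python
from dataclasses import dataclass


@dataclass
class FixedWindowLimiter:
    """Toy fixed-window limiter (hypothetical; not the Authority implementation)."""

    permit_limit: int       # maximum requests admitted per window
    window_seconds: float   # window length in seconds
    queue_limit: int = 0    # requests allowed to wait for the next window
    window_start: float = 0.0
    used: int = 0
    queued: int = 0

    def try_acquire(self, now: float) -> str:
        # Open a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            # Assumption: waiting requests are served first and spend
            # permits from the new window.
            self.used = min(self.queued, self.permit_limit)
            self.queued -= self.used
        if self.used < self.permit_limit:
            self.used += 1
            return "allowed"
        if self.queued < self.queue_limit:
            self.queued += 1
            return "queued"
        return "rejected"


# Values taken from the sample configuration above.
token = FixedWindowLimiter(permit_limit=30, window_seconds=60, queue_limit=0)
authorize = FixedWindowLimiter(permit_limit=60, window_seconds=60, queue_limit=10)
```

With the token settings, the 31st request inside one window is rejected immediately because queueLimit is 0. With the authorize settings, requests 61 through 70 wait for the next window and only the 71st is rejected.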

When a limit triggers, the middleware adds a Retry-After header to the rejected response and emits structured log tags (authority.endpoint, authority.client_id, authority.remote_ip) so operators can correlate throttle events with clients and source IPs.
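
Clients and support tooling need to turn the Retry-After header into a wait time. The HTTP specification allows either a number of seconds or an HTTP-date; this guide does not say which form Authority emits, so the sketch below handles both. The function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value: str, now: datetime) -> float:
    """Return the wait in seconds implied by a Retry-After header value."""
    value = header_value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "Retry-After: 60"
        return float(value)
    # HTTP-date form, e.g. "Retry-After: Wed, 15 Oct 2025 07:05:00 GMT"
    retry_at = parsedate_to_datetime(value)
    # A date in the past means the client may retry immediately.
    return max(0.0, (retry_at - now).total_seconds())


now = datetime(2025, 10, 15, 7, 4, 0, tzinfo=timezone.utc)
print(retry_after_seconds("60", now))
print(retry_after_seconds("Wed, 15 Oct 2025 07:05:00 GMT", now))
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test than one that reads the system clock.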

Environment overrides follow the same hierarchy. For example:

```
STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__TOKEN__PERMITLIMIT=60
STELLAOPS_AUTHORITY__SECURITY__RATELIMITING__TOKEN__WINDOW=00:01:00
```
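
The naming pattern in the two variables above can be derived mechanically from the YAML path. The helper below is an illustrative sketch inferred only from those two examples: the function name is hypothetical, and whether Authority treats variable names case-sensitively is not stated in this guide.

```python
def env_var_name(config_path: str, prefix: str = "STELLAOPS_AUTHORITY") -> str:
    """Map a dotted YAML path to the environment variable form shown above."""
    # Each YAML level becomes an upper-cased segment joined by a double underscore.
    segments = [segment.upper() for segment in config_path.split(".")]
    return "__".join([prefix] + segments)


print(env_var_name("security.rateLimiting.token.permitLimit"))
print(env_var_name("security.rateLimiting.authorize.queueLimit"))
```
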

| Scenario | permitLimit | window | queueLimit | Notes |
|---|---|---|---|---|
| Default production | 30 | 60s | 0 | Balances anonymous quota (33 scans/day) with headroom for tenant bursts. |
| High-trust clustered IPs | 60 | 60s | 5 | Requires WAF allowlist + alert aspnetcore_rate_limiting_rejections_total{limiter="authority-token"} <= 1% sustained. |
| Air-gapped lab | 10 | 120s | 0 | Lower concurrency reduces noise when running from shared bastion hosts. |
| Incident lockdown | 5 | 300s | 0 | Pair with credential lockout limit of 3 attempts and SOC paging for each denial. |
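
Because the scenarios use different window lengths, it can help to normalise them to a common rate before comparing. The sketch below copies the values from the table above; the function names are hypothetical. The boundary figure reflects a general property of fixed-window limiting (a client can spend one full allowance at the end of a window and another at the start of the next) and ignores queued requests.

```python
# (permitLimit, window in seconds, queueLimit), copied from the table above.
PROFILES = {
    "Default production": (30, 60, 0),
    "High-trust clustered IPs": (60, 60, 5),
    "Air-gapped lab": (10, 120, 0),
    "Incident lockdown": (5, 300, 0),
}


def requests_per_minute(permit_limit: int, window_seconds: int) -> float:
    """Sustained admission rate, normalised to one minute."""
    return permit_limit * 60 / window_seconds


def boundary_burst(permit_limit: int) -> int:
    """Worst-case requests admitted back to back across a window boundary."""
    return 2 * permit_limit


for name, (permit_limit, window_seconds, _queue_limit) in PROFILES.items():
    rate = requests_per_minute(permit_limit, window_seconds)
    burst = boundary_burst(permit_limit)
    print(f"{name}: {rate:g} req/min sustained, up to {burst} across a boundary")
```

Normalised this way, the incident lockdown profile admits one request per minute on average, compared with 30 per minute for the production default.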

Lockout Interplay

  • Rate limiting throttles by IP/client; lockout policies apply per subject. Keep both enabled.
  • During lockdown scenarios, reduce security.lockout.maxFailures alongside the rate limits above so that subjects face quicker escalation.
  • Map support playbooks to the observed Retry-After value: anything above 120 seconds should trigger manual investigation before re-enabling clients.
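
The 120-second rule in the last bullet can be encoded directly in support tooling. This is a minimal sketch; the function name and the returned labels are hypothetical and would need to match whatever the playbooks actually call these steps.

```python
MANUAL_REVIEW_THRESHOLD_SECONDS = 120  # threshold taken from the guidance above


def triage_throttle(retry_after_seconds: float) -> str:
    """Choose a playbook branch from an observed Retry-After value in seconds."""
    if retry_after_seconds > MANUAL_REVIEW_THRESHOLD_SECONDS:
        # Long waits suggest a tightened profile or persistent abuse;
        # investigate before re-enabling the client.
        return "manual-investigation"
    return "standard-retry"


print(triage_throttle(60))
print(triage_throttle(300))
```

Under the incident lockdown profile (300-second window), a client rejected early in the window could see a Retry-After value well above the threshold, so most throttled clients would be routed to manual investigation.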

Monitoring and Alerts

  1. Metrics
    • aspnetcore_rate_limiting_rejections_total{limiter="authority-token"} for /token.
    • aspnetcore_rate_limiting_rejections_total{limiter="authority-authorize"} for /authorize.
    • Custom counters derived from the structured log tags (authority.remote_ip, authority.client_id).
  2. Dashboards
    • Requests vs. rejections per endpoint.
    • Top offending clients/IP ranges in the current window.
    • Heatmap of retry-after durations to spot persistent throttling.
  3. Alerts
    • Notify the SOC when more than 25% of requests on any limiter are rejected with HTTP 429 for five consecutive minutes.
    • Trigger client-specific alerts when a single client_id produces >100 throttle events/hour.
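
The two alert rules above can be prototyped offline before they are encoded in a monitoring system. The sketch below assumes per-minute (total, rejected) counts per limiter and a list of client_id values taken from one hour of throttle log events; both input shapes and the function names are hypothetical.

```python
from collections import Counter


def soc_alert(minutes: list[tuple[int, int]],
              threshold: float = 0.25,
              consecutive: int = 5) -> bool:
    """True if the rejection ratio exceeds the threshold for N minutes in a row.

    `minutes` holds (total_requests, rejected_requests) per minute, oldest first.
    """
    streak = 0
    for total, rejected in minutes:
        if total > 0 and rejected / total > threshold:
            streak += 1
            if streak >= consecutive:
                return True
        else:
            streak = 0  # a quiet or healthy minute breaks the run
    return False


def noisy_clients(client_ids_last_hour: list[str], limit: int = 100) -> list[str]:
    """Return client_ids with more than `limit` throttle events in the hour."""
    counts = Counter(client_ids_last_hour)
    return sorted(client for client, n in counts.items() if n > limit)


print(soc_alert([(100, 30)] * 5))
print(noisy_clients(["ci-runner"] * 101 + ["portal"] * 100))
```

Minutes with no traffic reset the streak in this sketch; whether idle minutes should instead be skipped is a policy decision the guide leaves open.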

Operational Checklist

  • Validate updated limits in staging before production rollout; smoke-test with representative workload.
  • When raising limits, confirm audit events continue to capture authority.client_id, authority.remote_ip, and correlation IDs for throttle responses.
  • Document any overrides in the change log, including justification and expiry review date.