git.stella-ops.org/docs/router/rate-limiting.md
2025-12-18 00:47:24 +02:00


# Router Rate Limiting
Router rate limiting is a **gateway-owned** control-plane feature implemented in `StellaOps.Router.Gateway`. It enforces limits centrally so that microservices do not need to implement ad-hoc HTTP throttling.
## Behavior
When a request is denied, the Router returns:
- `429 Too Many Requests`
- `Retry-After: <seconds>`
- `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset` (Unix seconds)
- JSON body:
```json
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Try again in 12 seconds.",
  "retryAfter": 12,
  "limit": 100,
  "current": 101,
  "window": 60,
  "scope": "environment"
}
```
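A client can use the denial response above to decide how long to back off. The following is a minimal sketch, not part of the Router: the function name `retry_delay_seconds` and the one-second fallback are assumptions made for illustration. It prefers the `Retry-After` header and falls back to the `retryAfter` field of the JSON body.

```python
import json
from typing import Mapping, Optional

# Hypothetical fallback when neither the header nor the body is usable.
DEFAULT_DELAY_SECONDS = 1.0


def retry_delay_seconds(status: int, headers: Mapping[str, str], body: str) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None

    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    header_value = lowered.get("retry-after", "").strip()
    if header_value.isdigit():
        return float(header_value)

    # Fall back to the `retryAfter` field of the JSON body shown above.
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return DEFAULT_DELAY_SECONDS
    if isinstance(payload, dict) and isinstance(payload.get("retryAfter"), (int, float)):
        return float(payload["retryAfter"])
    return DEFAULT_DELAY_SECONDS
```

For example, `retry_delay_seconds(429, {"Retry-After": "12"}, "")` yields `12.0`, while any non-429 status yields `None`.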
## Model
Two scopes exist:
- **Instance (`for_instance`)**: in-memory sliding window; protects a single Router process.
- **Environment (`for_environment`)**: Valkey-backed fixed window; protects the whole environment across Router instances.
Environment checks are gated by an **activation threshold** (`process_back_pressure_when_more_than_per_5min`) to avoid unnecessary Valkey calls at low traffic.
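The difference between the two scopes and the activation gate can be sketched as follows. This is an illustration under stated assumptions, not the gateway's implementation: the class and function names, the Valkey key format, and the strict "more than" comparison for the threshold are all hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory sliding window: at most max_requests in any trailing per_seconds."""

    def __init__(self, per_seconds: float, max_requests: int) -> None:
        self.per_seconds = per_seconds
        self.max_requests = max_requests
        self._timestamps = deque()

    def try_acquire(self, now: float) -> bool:
        # Drop timestamps that have left the trailing window.
        cutoff = now - self.per_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True


def fixed_window_key(bucket: str, per_seconds: int, now: float) -> str:
    """Counter key for the fixed window containing `now` (key format is assumed).

    Every Router instance computes the same key for the same window, so a
    shared counter stored under it covers the whole environment.
    """
    window_start = int(now // per_seconds) * per_seconds
    return f"{bucket}:{per_seconds}:{window_start}"


def should_check_environment(requests_last_5min: int, activation_threshold: int) -> bool:
    """Activation gate: consult the shared store only above the threshold."""
    return requests_last_5min > activation_threshold
```

With a threshold of `0`, the gate is open as soon as a single request has been seen, which matches the "always check environment" setting used in the configuration examples.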
## Configuration
Configuration is under the `rate_limiting` root.
### Minimal (instance only)
```yaml
rate_limiting:
  process_back_pressure_when_more_than_per_5min: 5000
  for_instance:
    rules:
      - per_seconds: 60
        max_requests: 600
```
### Environment (Valkey)
```yaml
rate_limiting:
  process_back_pressure_when_more_than_per_5min: 0 # always check environment
  for_environment:
    valkey_connection: "valkey.stellaops.local:6379"
    valkey_bucket: "stella-router-rate-limit"
    circuit_breaker:
      failure_threshold: 5
      timeout_seconds: 30
      half_open_timeout: 10
    rules:
      - per_seconds: 60
        max_requests: 600
```
### Rule stacking (AND logic)
Multiple rules on the same target are evaluated with **AND** semantics:
```yaml
rate_limiting:
  for_environment:
    rules:
      - per_seconds: 1
        max_requests: 10
      - per_seconds: 3600
        max_requests: 3000
```
If any rule is exceeded the request is denied. The Router returns the **most restrictive** `Retry-After` among violated rules.
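A minimal sketch of the AND evaluation, assuming that "most restrictive" means the longest wait among the violated rules and that a rule is violated once its counter exceeds `max_requests` (as in the `limit: 100`, `current: 101` response example). The `RuleState` type and `evaluate` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class RuleState:
    per_seconds: int
    max_requests: int
    current: int              # requests counted in the window, including this one
    seconds_until_reset: int  # time until this rule's window frees capacity


def evaluate(rules: Iterable[RuleState]) -> Tuple[bool, Optional[int]]:
    """Return (allowed, retry_after_seconds) under AND semantics."""
    violated = [rule for rule in rules if rule.current > rule.max_requests]
    if not violated:
        return True, None
    # Assumption: the most restrictive Retry-After is the longest wait, so a
    # client that honours it will not be rejected again by another violated rule.
    return False, max(rule.seconds_until_reset for rule in violated)
```

Under these assumptions, a request that violates both the per-second rule (resetting in 1 s) and the hourly rule (resetting in 1800 s) is denied with a `Retry-After` of 1800.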
### Microservice overrides
A microservice override **replaces** the environment-level rules; the two rule sets are not merged:
```yaml
rate_limiting:
  for_environment:
    rules:
      - per_seconds: 60
        max_requests: 600
    microservices:
      scanner:
        rules:
          - per_seconds: 10
            max_requests: 50
```
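The replacement behavior can be sketched as a lookup over the parsed `for_environment` section. This is an illustration, not the Router's resolver: the function name `effective_rules` is hypothetical, and the sketch assumes a microservice without its own `rules` key falls back to the environment-level rules.

```python
from typing import Any, Dict, List


def effective_rules(for_environment: Dict[str, Any], microservice: str) -> List[Dict[str, int]]:
    """Return the rules that apply to a microservice (replacement, not merge)."""
    override = for_environment.get("microservices", {}).get(microservice, {})
    if "rules" in override:
        # The override wins outright; environment-level rules are ignored.
        return override["rules"]
    return for_environment.get("rules", [])


# Parsed form of the YAML example above.
config = {
    "rules": [{"per_seconds": 60, "max_requests": 600}],
    "microservices": {
        "scanner": {"rules": [{"per_seconds": 10, "max_requests": 50}]},
    },
}
scanner_rules = effective_rules(config, "scanner")
other_rules = effective_rules(config, "some-other-service")
```

Here `scanner_rules` contains only the 10-second rule, while `other_rules` contains the environment-level 60-second rule.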
### Route overrides
Route-level configuration is under:
`rate_limiting.for_environment.microservices.<microservice>.routes.<route_name>`
See `docs/router/rate-limiting-routes.md` for match types and specificity rules.
## Notes
- If `rules` is present, it takes precedence over legacy single-window keys (`per_seconds`, `max_requests`, `allow_*`).
- For allowed requests, the rate-limit headers reflect the rule with the **smallest window**. This keeps the output deterministic and low-cardinality; it is not a full multi-rule snapshot.
- If Valkey is unavailable, environment limiting is **fail-open** (instance limits still apply).
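
The fail-open note above can be sketched as follows. This is a simplified illustration under assumptions: the function name `check_request` is hypothetical, a store outage is modelled as a `ConnectionError`, the instance check is assumed to run first, and the circuit breaker is left out.

```python
from typing import Callable


def check_request(instance_allows: bool, environment_check: Callable[[], bool]) -> bool:
    """Combine the instance and environment decisions, failing open on store errors."""
    if not instance_allows:
        # Instance limits apply regardless of the shared store's health.
        return False
    try:
        return environment_check()
    except ConnectionError:
        # Fail-open: an unreachable store must not block traffic.
        return True


def unavailable_store() -> bool:
    """Stand-in for an environment check whose backing store is down."""
    raise ConnectionError("valkey unavailable")
```

With the store down, `check_request(True, unavailable_store)` allows the request, while `check_request(False, unavailable_store)` still denies it because the instance limit was exceeded.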
## Testing
- Unit tests: `dotnet test StellaOps.Router.slnx -c Release`
- Valkey integration tests (Docker required): `STELLAOPS_INTEGRATION_TESTS=true dotnet test StellaOps.Router.slnx -c Release --filter FullyQualifiedName~ValkeyRateLimitStoreIntegrationTests`
- k6 load tests: `tests/load/router-rate-limiting-load-test.js` (see `tests/load/README.md`)