# Rate Limit Design Contract

**Contract ID:** CONTRACT-RATE-LIMIT-001
**Status:** APPROVED
**Effective Date:** 2025-12-07
**Owners:** Platform Reliability Guild, Gateway Guild

## Overview

This contract defines the rate limiting design for StellaOps API endpoints. Its goals are fair resource allocation, protection against abuse, and a consistent client experience across all services.

## Rate Limiting Strategy

### Tiered Rate Limits

| Tier | Requests/Minute | Requests/Hour | Burst Limit | Typical Use Case |
|------|-----------------|---------------|-------------|------------------|
| **Free** | 60 | 1,000 | 10 | Evaluation, small projects |
| **Standard** | 300 | 10,000 | 50 | Production workloads |
| **Enterprise** | 1,000 | 50,000 | 200 | Large-scale deployments |
| **Unlimited** | No limit | No limit | No limit | Internal services, VIP |

### Per-Endpoint Rate Limits

Some endpoints have additional rate limits based on resource intensity:

| Endpoint Category | Rate Limit | Rationale |
|-------------------|------------|-----------|
| `/api/risk/simulation/*` | 30/min | CPU-intensive simulation |
| `/api/risk/simulation/studio/*` | 10/min | Full breakdown analysis |
| `/system/airgap/seal` | 5/hour | Critical state change |
| `/policy/decisions` | 100/min | Lightweight evaluation |
| `/api/policy/packs/*/bundle` | 10/min | Bundle compilation |
| Export endpoints | 20/min | I/O-intensive operations |
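
The two simulation patterns in the table overlap: a request under `/api/risk/simulation/studio/` matches both rows. The contract does not state which row wins, so the sketch below assumes the most specific (longest) pattern applies. The names `EndpointLimit`, `matchesPattern`, and `resolveEndpointLimit` are hypothetical, and the mid-path wildcard in `/api/policy/packs/*/bundle` is left out because this simple matcher only handles a trailing `/*`.

```typescript
// Hypothetical shape for one row of the per-endpoint table.
interface EndpointLimit {
  pattern: string;       // e.g. "/api/risk/simulation/*"
  limit: number;         // allowed requests per window
  windowSeconds: number; // 60 for "/min", 3600 for "/hour"
}

const endpointLimits: EndpointLimit[] = [
  { pattern: "/api/risk/simulation/*", limit: 30, windowSeconds: 60 },
  { pattern: "/api/risk/simulation/studio/*", limit: 10, windowSeconds: 60 },
  { pattern: "/system/airgap/seal", limit: 5, windowSeconds: 3600 },
  { pattern: "/policy/decisions", limit: 100, windowSeconds: 60 },
];

// Assumption: a trailing "/*" matches any path under the prefix;
// any other pattern must match the path exactly.
function matchesPattern(pattern: string, path: string): boolean {
  if (pattern.endsWith("/*")) {
    return path.startsWith(pattern.slice(0, -1)); // keep the trailing "/"
  }
  return path === pattern;
}

// Assumption: when several patterns match, the longest one wins.
function resolveEndpointLimit(
  path: string,
  limits: EndpointLimit[],
): EndpointLimit | undefined {
  let best: EndpointLimit | undefined;
  for (const candidate of limits) {
    const isMoreSpecific =
      best === undefined || candidate.pattern.length > best.pattern.length;
    if (matchesPattern(candidate.pattern, path) && isMoreSpecific) {
      best = candidate;
    }
  }
  return best;
}
```

A path that matches no row returns `undefined`, in which case only the tier limit would apply.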

## Implementation

### Algorithm

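Use the **token bucket** algorithm. Each bucket holds at most the tier's burst limit and refills at the tier's per-minute rate, expressed in tokens per second:

```yaml
rate_limit:
  algorithm: token_bucket
  bucket_size: ${BURST_LIMIT}               # maximum tokens, equal to the tier's burst limit
  refill_rate: ${REQUESTS_PER_MINUTE} / 60  # tokens per second
  refill_interval: 1s
```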
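
To make the configuration concrete, here is a minimal in-memory token bucket that refills once per whole second, matching `refill_interval: 1s`. It is an illustrative sketch, not the gateway's implementation: the class name `TokenBucket` and its method are hypothetical, and the clock is passed in as a parameter so the behaviour is deterministic.

```typescript
// Minimal token bucket sketch. Timestamps are supplied by the caller
// (milliseconds) so the example does not depend on a real clock.
class TokenBucket {
  private readonly bucketSize: number;      // burst limit
  private readonly refillPerSecond: number; // requestsPerMinute / 60
  private tokens: number;
  private lastRefillMs: number;

  constructor(bucketSize: number, refillPerSecond: number, nowMs: number) {
    this.bucketSize = bucketSize;
    this.refillPerSecond = refillPerSecond;
    this.tokens = bucketSize; // assumption: buckets start full
    this.lastRefillMs = nowMs;
  }

  // Consumes one token and returns true if the request is allowed.
  tryConsume(nowMs: number): boolean {
    // Refill in whole 1-second intervals only.
    const intervals = Math.floor(Math.max(0, nowMs - this.lastRefillMs) / 1000);
    if (intervals > 0) {
      this.tokens = Math.min(
        this.bucketSize,
        this.tokens + intervals * this.refillPerSecond,
      );
      this.lastRefillMs += intervals * 1000;
    }
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

For the Standard tier this would be constructed as `new TokenBucket(50, 300 / 60, now)`: a burst of 50 requests is accepted immediately, after which 5 further requests are admitted per second.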

### Rate Limit Headers

All responses include the following rate limit headers (`Retry-After` is meaningful only when a request has been rejected):

```http
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 295
X-RateLimit-Reset: 1701936000
X-RateLimit-Policy: standard
Retry-After: 30
```
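
A client might read these headers along the following lines. This is a sketch under assumptions: the names `RateLimitStatus` and `parseRateLimitHeaders` are hypothetical, header names are expected in lowercase, and `X-RateLimit-Reset` is treated as a Unix timestamp in seconds, which the example value suggests but the contract does not state.

```typescript
// Hypothetical parsed form of the X-RateLimit-* headers.
interface RateLimitStatus {
  limit: number;
  remaining: number;
  resetEpochSeconds: number; // assumption: Unix time in seconds
  policy: string;
}

// Expects a plain map with lowercase header names.
// Returns undefined if any header is missing or not numeric.
function parseRateLimitHeaders(
  headers: Record<string, string | undefined>,
): RateLimitStatus | undefined {
  const limit = Number(headers["x-ratelimit-limit"]);
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetEpochSeconds = Number(headers["x-ratelimit-reset"]);
  const policy = headers["x-ratelimit-policy"];

  if (
    !Number.isFinite(limit) ||
    !Number.isFinite(remaining) ||
    !Number.isFinite(resetEpochSeconds) ||
    policy === undefined
  ) {
    return undefined;
  }
  return { limit, remaining, resetEpochSeconds, policy };
}
```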

### Rate Limit Response

When the rate limit is exceeded, return:

```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 30

```
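
On the client side, the `Retry-After` value tells the caller how long to wait. HTTP allows this header to carry either a number of seconds, as in the example above, or an HTTP date. The helper below is a sketch that handles both forms; `retryAfterToMs` is a hypothetical name, and the current time is passed in so the result is deterministic.

```typescript
// Converts a Retry-After header value into a wait time in milliseconds.
// Returns undefined when the value is neither delay-seconds nor a parseable date.
function retryAfterToMs(value: string, nowMs: number): number | undefined {
  const trimmed = value.trim();

  // Form 1: delay in whole seconds, e.g. "30".
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // Form 2: an HTTP date, e.g. "Thu, 07 Dec 2023 08:00:30 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) {
    return undefined;
  }
  return Math.max(0, dateMs - nowMs); // dates in the past mean "retry now"
}
```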

## Rate Limit Keys

### Primary Key: Tenant ID + Client ID

```
rate_limit_key = "${tenant_id}:${client_id}"
```

### Fallback Keys

1. Authenticated: `tenant:${tenant_id}:user:${user_id}`
2. API Key: `apikey:${api_key_hash}`
3. Anonymous: `ip:${client_ip}`
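
The key selection can be sketched as a single function. The contract lists the fallbacks but does not say how one is chosen, so this example assumes they are tried in the listed order whenever the primary key cannot be built. `RequestIdentity` and `rateLimitKey` are hypothetical names.

```typescript
// Hypothetical view of what the gateway knows about the caller.
interface RequestIdentity {
  tenantId?: string;
  clientId?: string;
  userId?: string;
  apiKeyHash?: string;
  clientIp: string;
}

// Assumption: primary key first, then the fallbacks in documented order.
function rateLimitKey(id: RequestIdentity): string {
  if (id.tenantId && id.clientId) {
    return `${id.tenantId}:${id.clientId}`;
  }
  if (id.tenantId && id.userId) {
    return `tenant:${id.tenantId}:user:${id.userId}`;
  }
  if (id.apiKeyHash) {
    return `apikey:${id.apiKeyHash}`;
  }
  return `ip:${id.clientIp}`;
}
```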

## Exemptions

### Exempt Endpoints

The following endpoints are exempt from rate limiting:

- `GET /health`
- `GET /ready`
- `GET /metrics`
- `GET /.well-known/*`

### Exempt Clients

- Internal service mesh traffic (mTLS authenticated)
- Localhost connections in development mode
- Clients with `unlimited` tier
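
The exemption rules above amount to a check that runs before any bucket is consulted. The following sketch shows one way to express it; `ExemptionContext`, `isExemptEndpoint`, and `isExempt` are hypothetical names, and how the gateway actually detects mesh traffic or development mode is not specified in this contract.

```typescript
// Hypothetical facts the gateway has established about a request.
interface ExemptionContext {
  method: string;
  path: string;
  tier: string;             // e.g. "free", "standard", "enterprise", "unlimited"
  isMeshMtls: boolean;      // internal service mesh traffic, mTLS authenticated
  isLocalhost: boolean;
  developmentMode: boolean;
}

// Exempt endpoints: GET /health, /ready, /metrics and /.well-known/*.
function isExemptEndpoint(method: string, path: string): boolean {
  if (method.toUpperCase() !== "GET") {
    return false;
  }
  return (
    path === "/health" ||
    path === "/ready" ||
    path === "/metrics" ||
    path.startsWith("/.well-known/")
  );
}

// True when the request should bypass rate limiting entirely.
function isExempt(ctx: ExemptionContext): boolean {
  return (
    isExemptEndpoint(ctx.method, ctx.path) ||
    ctx.isMeshMtls ||
    (ctx.isLocalhost && ctx.developmentMode) ||
    ctx.tier === "unlimited"
  );
}
```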

## Quota Management

### Tenant Quota Tracking

```yaml
quota:
  tracking:
    storage: redis
    key_prefix: "stellaops:quota:"
    ttl: 3600  # 1 hour rolling window

  dimensions:
    - tenant_id
    - endpoint_category
    - time_bucket
```
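
One plausible way to combine the prefix and the three dimensions into a Redis key is shown below. The key layout is an assumption, since the contract only names the prefix and the dimensions, and `time_bucket` is assumed here to be the index of the current one-hour window, matching the 3600-second TTL. `quotaKey` is a hypothetical helper.

```typescript
const QUOTA_KEY_PREFIX = "stellaops:quota:";
const QUOTA_WINDOW_SECONDS = 3600; // matches the configured ttl

// Assumed layout: <prefix><tenant_id>:<endpoint_category>:<time_bucket>
function quotaKey(
  tenantId: string,
  endpointCategory: string,
  epochSeconds: number,
): string {
  const timeBucket = Math.floor(epochSeconds / QUOTA_WINDOW_SECONDS);
  return `${QUOTA_KEY_PREFIX}${tenantId}:${endpointCategory}:${timeBucket}`;
}
```

Note that bucketing by hour index yields fixed hourly windows rather than a true rolling window; a rolling window would need finer-grained buckets summed over the last hour.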

### Quota Alerts

| Threshold | Action |
|-----------|--------|
| 80% consumed | Emit `quota.warning` event |
| 95% consumed | Emit `quota.critical` event |
| 100% consumed | Block requests, emit `quota.exceeded` event |
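
The threshold table maps directly onto a small decision function. In this sketch the thresholds are treated as inclusive and the function reports the event for the current consumption level on every call; deduplicating events so that each fires once per window is left out. `QuotaDecision` and `evaluateQuota` are hypothetical names.

```typescript
type QuotaEvent = "quota.warning" | "quota.critical" | "quota.exceeded";

interface QuotaDecision {
  blocked: boolean;
  quotaEvent?: QuotaEvent;
}

// Integer arithmetic avoids floating-point edge cases at the thresholds.
function evaluateQuota(used: number, quota: number): QuotaDecision {
  if (used >= quota) {
    return { blocked: true, quotaEvent: "quota.exceeded" };
  }
  if (used * 100 >= quota * 95) {
    return { blocked: false, quotaEvent: "quota.critical" };
  }
  if (used * 100 >= quota * 80) {
    return { blocked: false, quotaEvent: "quota.warning" };
  }
  return { blocked: false };
}
```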

## Configuration

### Gateway Configuration

```yaml
# gateway/rate-limits.yaml
rateLimiting:
  enabled: true
  defaultTier: standard

  tiers:
    free:
      requestsPerMinute: 60
      requestsPerHour: 1000
      burstLimit: 10
    standard:
      requestsPerMinute: 300
      requestsPerHour: 10000
      burstLimit: 50
    enterprise:
      requestsPerMinute: 1000
      requestsPerHour: 50000
      burstLimit: 200

  endpoints:
    - pattern: "/api/risk/simulation/*"
      limit: 30
      window: 60s
    - pattern: "/api/risk/simulation/studio/*"
      limit: 10
      window: 60s
    - pattern: "/system/airgap/seal"
      limit: 5
      window: 3600s
```

### Policy Engine Configuration

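```csharp
// PolicyEngineRateLimitOptions.cs
using System;
using Microsoft.AspNetCore.RateLimiting;

public static class PolicyEngineRateLimitOptions
{
    public const string PolicyName = "PolicyEngineRateLimit";

    // Registers a named token bucket policy with the ASP.NET Core rate limiter.
    public static void Configure(RateLimiterOptions options)
    {
        options.AddTokenBucketLimiter(PolicyName, opt =>
        {
            opt.TokenLimit = 50;                                // bucket capacity (burst)
            opt.QueueLimit = 10;                                // permits that may wait in the queue
            opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10); // refill interval
            opt.TokensPerPeriod = 5;                            // 5 tokens per 10 s = 30/min sustained
            opt.AutoReplenishment = true;                       // refill on an internal timer
        });
    }
}
```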

## Monitoring

### Metrics

| Metric | Type | Labels |
|--------|------|--------|
| `stellaops_rate_limit_requests_total` | Counter | tier, endpoint, status |
| `stellaops_rate_limit_exceeded_total` | Counter | tier, endpoint |
| `stellaops_rate_limit_remaining` | Gauge | tenant_id, tier |
| `stellaops_rate_limit_queue_size` | Gauge | endpoint |

### Alerts

```yaml
# prometheus/rules/rate-limiting.yaml
groups:
  - name: rate_limiting
    rules:
      - alert: HighRateLimitExceeded
        expr: rate(stellaops_rate_limit_exceeded_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rate of rate limit exceeded events"
```

## Integration with Web UI

### Client SDK Configuration

```typescript
// stellaops-sdk/rate-limit-handler.ts
interface RateLimitConfig {
  retryOnRateLimit: boolean;
  maxRetries: number;
  backoffMultiplier: number;
  maxBackoffSeconds: number;
}

const defaultConfig: RateLimitConfig = {
  retryOnRateLimit: true,
  maxRetries: 3,
  backoffMultiplier: 2,
  maxBackoffSeconds: 60
};
```
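
The retry delay implied by these settings can be sketched as follows. The contract does not define the base delay or how `Retry-After` interacts with the backoff, so this example assumes a 1-second base, waits for whichever is longer (the exponential delay or the server's `Retry-After`), and caps the result at `maxBackoffSeconds`. `BackoffSettings` and `backoffDelaySeconds` are hypothetical names; any object with the three fields used here, including the default configuration above, fits the parameter type.

```typescript
// The subset of the SDK configuration that the delay calculation needs.
interface BackoffSettings {
  maxRetries: number;
  backoffMultiplier: number;
  maxBackoffSeconds: number;
}

// attempt is the number of retries already made (0 for the first retry).
// Returns the delay in seconds, or undefined when retries are exhausted.
function backoffDelaySeconds(
  attempt: number,
  settings: BackoffSettings,
  retryAfterSeconds?: number,
): number | undefined {
  if (attempt >= settings.maxRetries) {
    return undefined;
  }
  // Assumption: base delay of 1 second, so delays grow 1, 2, 4, ...
  const exponential = Math.pow(settings.backoffMultiplier, attempt);
  const delay = Math.max(exponential, retryAfterSeconds ?? 0);
  return Math.min(delay, settings.maxBackoffSeconds);
}
```

Capping below the server's `Retry-After` risks another 429; a client that prefers to always honor the server's value could apply the cap only to the exponential term.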

### UI Rate Limit Display

The Web UI displays rate limit status in the console header, showing:

- Current remaining requests
- Time until reset
- A visual indicator when fewer than 20% of requests remain
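
The values shown in the header could be derived from the rate limit headers roughly as follows. This is a sketch: `RateLimitDisplay` and `toRateLimitDisplay` are hypothetical names, and the reset value is assumed to be a Unix timestamp in seconds.

```typescript
// Hypothetical view model for the console header widget.
interface RateLimitDisplay {
  remaining: number;
  secondsUntilReset: number;
  approachingLimit: boolean; // true when fewer than 20% of requests remain
}

function toRateLimitDisplay(
  limit: number,
  remaining: number,
  resetEpochSeconds: number,
  nowEpochSeconds: number,
): RateLimitDisplay {
  return {
    remaining,
    secondsUntilReset: Math.max(0, resetEpochSeconds - nowEpochSeconds),
    // remaining / limit < 0.2, written without division.
    approachingLimit: limit > 0 && remaining * 5 < limit,
  };
}
```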

## Changelog

| Date | Version | Change |
|------|---------|--------|
| 2025-12-07 | 1.0.0 | Initial contract definition |

## References

- [API Governance Baseline](./api-governance-baseline.md)
- [Web Gateway Architecture](../modules/gateway/architecture.md)
- [Policy Engine Rate Limiting](../modules/policy/design/rate-limiting.md)