partly or unimplemented features - now implemented
@@ -44,8 +44,10 @@ src/Gateway/
│ ├── Middleware/
│ │ ├── TenantMiddleware.cs # Tenant context extraction
│ │ ├── RequestRoutingMiddleware.cs # HTTP → binary routing
│ │ ├── AuthenticationMiddleware.cs # DPoP/mTLS validation
│ │ └── RateLimitingMiddleware.cs # Per-tenant throttling
│ │ ├── SenderConstraintMiddleware.cs # DPoP/mTLS validation
│ │ ├── IdentityHeaderPolicyMiddleware.cs # Identity header sanitization
│ │ ├── CorrelationIdMiddleware.cs # Request correlation
│ │ └── HealthCheckMiddleware.cs # Health probe handling
│ ├── Services/
│ │ ├── GatewayHostedService.cs # Transport lifecycle
│ │ ├── OpenApiAggregationService.cs # Spec aggregation
@@ -329,9 +331,37 @@ gateway:

### Rate Limiting

- Per-tenant: Configurable requests/minute
- Per-identity: Burst protection
- Global: Circuit breaker for overload

Gateway uses the Router's dual-window rate limiting middleware with a circuit breaker:

- **Instance-level** (in-memory): Per-router-instance limits using sliding window counters
  - High-precision sub-second buckets for fair rate distribution
  - No external dependencies; always available
- **Environment-level** (Valkey-backed): Cross-instance limits for distributed deployments
  - Atomic Lua scripts for consistent counting across instances
  - Circuit breaker pattern for fail-open behavior when Valkey is unavailable
- **Activation gate**: Environment-level checks activate only above a configurable traffic threshold
- **Response headers**: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `Retry-After`
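
As a rough illustration of the instance-level limiter described above, the following Python sketch counts requests in fixed sub-second buckets and evaluates several rules against the same counters. It is not the Router's implementation: the `Rule` and `SlidingWindowLimiter` names, the 100 ms bucket size, and the reading of `X-RateLimit-Reset` as whole seconds until the oldest counted bucket expires are all assumptions made for this example.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    max_requests: int
    per_seconds: int


class SlidingWindowLimiter:
    """In-memory limiter that counts requests in fixed sub-second buckets."""

    def __init__(self, rules, bucket_ms=100):
        self.rules = list(rules)
        self.bucket_ms = bucket_ms  # assumed bucket size, not taken from the source
        self.buckets = deque()      # [bucket_index, count] pairs, oldest first

    def _span(self, rule):
        # Number of buckets that make up one rule window.
        return rule.per_seconds * 1000 // self.bucket_ms

    def check(self, now_ms):
        """Return (allowed, headers) for a request arriving at now_ms."""
        index = now_ms // self.bucket_ms
        longest = max(self._span(r) for r in self.rules)
        while self.buckets and self.buckets[0][0] <= index - longest:
            self.buckets.popleft()  # drop buckets that fell out of every window

        tightest = None  # (remaining, rule, reset_seconds) of the most constrained rule
        for rule in self.rules:
            span = self._span(rule)
            live = [(i, c) for i, c in self.buckets if i > index - span]
            used = sum(c for _, c in live)
            # Capacity is released when the oldest live bucket leaves the window.
            oldest = live[0][0] if live else index
            reset_ms = (oldest + span) * self.bucket_ms - now_ms
            reset_s = math.ceil(reset_ms / 1000)
            remaining = rule.max_requests - used
            if tightest is None or remaining < tightest[0]:
                tightest = (remaining, rule, reset_s)

        remaining, rule, reset_s = tightest
        allowed = remaining > 0
        if allowed:
            if self.buckets and self.buckets[-1][0] == index:
                self.buckets[-1][1] += 1
            else:
                self.buckets.append([index, 1])
            remaining -= 1
        headers = {
            "X-RateLimit-Limit": str(rule.max_requests),
            "X-RateLimit-Remaining": str(max(remaining, 0)),
            "X-RateLimit-Reset": str(reset_s),
        }
        if not allowed:
            headers["Retry-After"] = str(reset_s)
        return allowed, headers
```

Passing two rules, for example `Rule(100, 1)` and `Rule(1000, 60)`, makes one object enforce both windows; the headers then describe whichever rule has the least capacity left.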

Configuration via `appsettings.yaml`:

```yaml
rate_limiting:
  process_back_pressure_when_more_than_per_5min: 5000
  for_instance:
    rules:
      - max_requests: 100
        per_seconds: 1
      - max_requests: 1000
        per_seconds: 60
  for_environment:
    valkey_connection: "localhost:6379"
    rules:
      - max_requests: 10000
        per_seconds: 60
    circuit_breaker:
      failure_threshold: 3
      timeout_seconds: 30
      half_open_timeout: 10
```
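
To make the fail-open behavior and the activation gate more concrete, here is a small Python sketch that reuses the `failure_threshold`, `timeout_seconds`, and back-pressure values from the configuration above. The `FailOpenBreaker` class and `environment_check_active` function are hypothetical names, `ConnectionError` stands in for whatever the real Valkey client raises, and `half_open_timeout` is left out because the source does not say what it bounds.

```python
import time


class FailOpenBreaker:
    """Wraps an environment-level check and fails open while the store is unhealthy."""

    def __init__(self, failure_threshold=3, timeout_seconds=30, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self, environment_check):
        """Return the check's verdict, or True (fail open) if it cannot run."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.timeout_seconds:
                return True  # open: skip the shared store entirely
            # Timeout elapsed: fall through and let this request probe the store.
        try:
            verdict = environment_check()
        except ConnectionError:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed probe
            return True
        self.failures = 0
        self.opened_at = None
        return verdict


def environment_check_active(requests_last_5min, threshold=5000):
    """Activation gate: consult the shared store only above the traffic threshold."""
    return requests_last_5min > threshold
```

In this sketch a request is never rejected because Valkey is down; only a successful check that returns `False` denies it.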

---

@@ -443,12 +473,80 @@ spec:
| Feature | Sprint | Status |
|---------|--------|--------|
| Core implementation | 3600.0001.0001 | TODO |
| Performance Testing Pipeline | 038 | DONE |
| WebSocket support | Future | Planned |
| gRPC passthrough | Future | Planned |
| GraphQL aggregation | Future | Exploration |

---

## 14) Performance Testing Pipeline (k6 + Prometheus + Correlation IDs)

### Overview

The Gateway includes a performance testing pipeline made up of k6 load tests,
Prometheus metric instrumentation, and Grafana dashboards for performance curve modelling.

### k6 Scenarios (A–G)

| Scenario | Purpose | VUs | Duration | Key Metric |
|----------|---------|-----|----------|------------|
| A — Health Baseline | Sub-ms health probe overhead | 10 | 1 min | P95 < 10 ms |
| B — OpenAPI Aggregation | Spec cache under concurrent readers | 50 | 75 s | P95 < 200 ms |
| C — Routing Throughput | Mixed-method routing at target RPS | 200 | 2 min | P50 < 2 ms, P99 < 5 ms |
| D — Correlation ID | Propagation overhead measurement | 20 | 1 min | P95 < 5 ms overhead |
| E — Rate Limit Boundary | Enforcement correctness at boundary | 100 | 1 min | Retry-After header |
| F — Connection Ramp | Transport saturation (ramp to 1000 VUs) | 1000 | 2 min | No 503 responses |
| G — Steady-State Soak | Memory leak / resource exhaustion | 50 | 10 min | Stable memory |
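
The latency thresholds in the table can be checked offline against recorded samples. The sketch below is one possible way to do that in Python, using a nearest-rank percentile; it covers only scenarios A, B, and C, whose thresholds are plain latency bounds, and the `LATENCY_SLOS`, `percentile`, and `violations` names are made up for this example rather than taken from the k6 script.

```python
# Latency SLOs in milliseconds, copied from scenarios A, B and C in the table.
LATENCY_SLOS = {
    "A": {95: 10},
    "B": {95: 200},
    "C": {50: 2, 99: 5},
}


def percentile(samples, p):
    """Nearest-rank percentile (p is an integer 1..100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # ceil(p / 100 * n) in integer math
    return ordered[rank - 1]


def violations(scenario, samples_ms):
    """Return one message per threshold that the samples fail to meet."""
    found = []
    for p, limit_ms in LATENCY_SLOS[scenario].items():
        observed = percentile(samples_ms, p)
        if observed >= limit_ms:  # the table states strict "<" bounds
            found.append(f"P{p} = {observed} ms, expected < {limit_ms} ms")
    return found
```

An empty list means every threshold for that scenario held for the given samples.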

Run all scenarios:
```bash
k6 run --env BASE_URL=https://gateway.stella-ops.local src/Gateway/__Tests/load/gateway_performance.k6.js
```

Run a single scenario:
```bash
k6 run --env BASE_URL=https://gateway.stella-ops.local --env SCENARIO=scenario_c_routing_throughput src/Gateway/__Tests/load/gateway_performance.k6.js
```
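
When these runs are scripted, for example from a CI job, the two commands above differ only in the optional `SCENARIO` variable. A minimal Python helper that assembles the argument list might look like the following; the `k6_command` name is hypothetical, and only the flags and paths already shown above are used.

```python
DEFAULT_SCRIPT = "src/Gateway/__Tests/load/gateway_performance.k6.js"


def k6_command(base_url, scenario=None, script=DEFAULT_SCRIPT):
    """Build the k6 argument list; omit scenario to run every scenario."""
    cmd = ["k6", "run", "--env", f"BASE_URL={base_url}"]
    if scenario:
        cmd += ["--env", f"SCENARIO={scenario}"]
    cmd.append(script)
    return cmd


# Example (requires k6 on PATH):
#   import subprocess
#   subprocess.run(k6_command("https://gateway.stella-ops.local",
#                             "scenario_c_routing_throughput"), check=True)
```

Passing the arguments as a list avoids shell quoting issues if the base URL ever contains special characters.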

### Performance Metrics (GatewayPerformanceMetrics)

Meter: `StellaOps.Gateway.Performance`

| Instrument | Type | Unit | Description |
|------------|------|------|-------------|
| `gateway.requests.total` | Counter | — | Total requests processed |
| `gateway.errors.total` | Counter | — | Errors (4xx/5xx) |
| `gateway.ratelimit.total` | Counter | — | Rate-limited requests (429) |
| `gateway.request.duration` | Histogram | ms | Full request duration |
| `gateway.auth.duration` | Histogram | ms | Auth middleware duration |
| `gateway.transport.duration` | Histogram | ms | TCP/TLS transport duration |
| `gateway.routing.duration` | Histogram | ms | Instance selection duration |
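
When writing Prometheus queries against these instruments, the dotted names usually appear in a translated form. The Python sketch below applies the general OpenTelemetry-to-Prometheus naming conventions as I understand them (dots become underscores, a known unit is appended as a suffix, counters end in `_total`). The exact series names depend on the exporter and its settings, which the source does not describe, so treat the output as a starting guess to verify against the live `/metrics` endpoint.

```python
import re

# Unit abbreviations expanded in Prometheus-style names (subset, for illustration).
UNIT_SUFFIXES = {"ms": "milliseconds", "s": "seconds", "By": "bytes"}


def prometheus_name(instrument, kind, unit=None):
    """Translate an instrument name such as 'gateway.request.duration'."""
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", instrument)  # dots are not valid characters
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not name.endswith("_" + suffix):
        name += "_" + suffix
    if kind == "counter" and not name.endswith("_total"):
        name += "_total"
    return name
```

Under these assumptions `gateway.request.duration` with unit `ms` maps to `gateway_request_duration_milliseconds`, and a histogram additionally exposes `_bucket`, `_sum`, and `_count` series.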

### Grafana Dashboard

Dashboard: `devops/telemetry/dashboards/stella-ops-gateway-performance.json`
UID: `stella-ops-gateway-performance`

Panels:
1. **Overview row** — P50/P99 gauges, error rate, RPS
2. **Latency Distribution** — Percentile time series (overall + per-service)
3. **Throughput & Rate Limiting** — RPS by service, rate-limited requests by route
4. **Pipeline Breakdown** — Auth/Routing/Transport P95 breakdown, errors by status
5. **Connections & Resources** — Active connections, endpoints, memory usage
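
The percentile panels are presumably derived from the histogram instruments, and Prometheus estimates such percentiles by interpolating linearly inside the bucket that contains the requested rank. The Python sketch below reproduces that estimation for cumulative buckets so the dashboard values are easier to reason about. The function is a simplified re-implementation for illustration, not the dashboard's query, and any bucket boundaries you pass in are your own.

```python
def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with a float("inf") bucket, as Prometheus exposes them.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf") or count == prev_count:
                # No finite upper bound (or an empty bucket): report the lower edge.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

Because of the interpolation, the reported P95 or P99 can never be more precise than the bucket boundaries configured for the histogram.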

### C# Models

| Type | Purpose |
|------|---------|
| `GatewayPerformanceObservation` | Single request observation (all pipeline phases) |
| `PerformanceScenarioConfig` | Scenario definition with SLO thresholds |
| `PerformanceCurvePoint` | Aggregated window data with computed RPS/error rate |
| `PerformanceTestSummary` | Complete test run result with threshold violations |
| `GatewayPerformanceMetrics` | OTel service emitting Prometheus-compatible metrics |
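
To show how per-request observations could roll up into curve points with a computed RPS and error rate, here is a small Python sketch. It mirrors the roles of `GatewayPerformanceObservation` and `PerformanceCurvePoint` from the table, but the field names, the 10-second window, and the `aggregate` function are assumptions for this example; the actual C# types may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    timestamp_s: float   # seconds since the start of the test run
    status_code: int
    duration_ms: float


@dataclass(frozen=True)
class CurvePoint:
    window_start_s: int
    requests: int
    rps: float
    error_rate: float


def aggregate(observations, window_s=10):
    """Group observations into fixed windows and compute RPS and error rate."""
    windows = {}
    for obs in observations:
        start = int(obs.timestamp_s // window_s) * window_s
        windows.setdefault(start, []).append(obs)
    points = []
    for start in sorted(windows):
        group = windows[start]
        # Errors are counted as 4xx/5xx, matching the gateway.errors.total description.
        errors = sum(1 for o in group if o.status_code >= 400)
        points.append(
            CurvePoint(start, len(group), len(group) / window_s, errors / len(group))
        )
    return points
```

Windows without any observations produce no point in this sketch, so a plotting layer would need to decide whether to show them as gaps or as zero RPS.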

---

## 15) References

- Router Architecture: `docs/modules/router/architecture.md`