partly or unimplemented features - now implemented

This commit is contained in:
master
2026-02-09 08:53:51 +02:00
parent 1bf6bbf395
commit 4bdc298ec1
674 changed files with 90194 additions and 2271 deletions


@@ -44,8 +44,10 @@ src/Gateway/
│ ├── Middleware/
│ │ ├── TenantMiddleware.cs # Tenant context extraction
│ │ ├── RequestRoutingMiddleware.cs # HTTP → binary routing
│ │ ├── AuthenticationMiddleware.cs # DPoP/mTLS validation
│ │ └── RateLimitingMiddleware.cs # Per-tenant throttling
│ │ ├── SenderConstraintMiddleware.cs # DPoP/mTLS validation
│ │ ├── IdentityHeaderPolicyMiddleware.cs # Identity header sanitization
│ │ ├── CorrelationIdMiddleware.cs # Request correlation
│ │ └── HealthCheckMiddleware.cs # Health probe handling
│ ├── Services/
│ │ ├── GatewayHostedService.cs # Transport lifecycle
│ │ ├── OpenApiAggregationService.cs # Spec aggregation
@@ -329,9 +331,37 @@ gateway:
### Rate Limiting
- Per-tenant: Configurable requests/minute
- Per-identity: Burst protection
- Global: Circuit breaker for overload
The Gateway uses the Router's dual-window rate-limiting middleware with a circuit breaker:
- **Instance-level** (in-memory): Per-router-instance limits using sliding window counters
  - High-precision sub-second buckets for fair rate distribution
  - No external dependencies; always available
- **Environment-level** (Valkey-backed): Cross-instance limits for distributed deployments
  - Atomic Lua scripts for consistent counting across instances
  - Circuit breaker pattern for fail-open behavior when Valkey is unavailable
- **Activation gate**: Environment-level checks activate only above a configurable traffic threshold
- **Response headers**: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `Retry-After`
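
The instance-level behavior described above can be sketched as a bucketed sliding window counter. This is a minimal illustration, not the Router's actual implementation: the class name, the 100 ms bucket width, and the conservative `retryAfterSeconds` value are all assumptions made for the example.

```javascript
// Hypothetical sketch of an in-memory sliding window limiter with sub-second buckets.
// Names and the default bucket width are assumptions, not taken from the Router code.
class SlidingWindowLimiter {
  // rules: [{ maxRequests, perSeconds }]; bucketMs: width of each counting bucket.
  constructor(rules, bucketMs = 100) {
    this.rules = rules;
    this.bucketMs = bucketMs;
    this.buckets = new Map(); // bucket start time (ms) -> request count
    this.longestWindowMs = Math.max(...rules.map((r) => r.perSeconds)) * 1000;
  }

  // Sum the requests in buckets whose start lies inside (nowMs - windowMs, nowMs].
  countWithin(windowMs, nowMs) {
    let total = 0;
    for (const [start, count] of this.buckets) {
      if (start > nowMs - windowMs) total += count;
    }
    return total;
  }

  // Returns { allowed, retryAfterSeconds }; the request is recorded only when allowed.
  tryAcquire(nowMs) {
    // Drop buckets that can no longer affect any rule.
    for (const start of this.buckets.keys()) {
      if (start <= nowMs - this.longestWindowMs) this.buckets.delete(start);
    }
    // Every rule must have headroom (for example 100/s and 1000/min together).
    for (const rule of this.rules) {
      if (this.countWithin(rule.perSeconds * 1000, nowMs) >= rule.maxRequests) {
        // Conservative upper bound: the full window length of the violated rule.
        return { allowed: false, retryAfterSeconds: rule.perSeconds };
      }
    }
    const bucket = Math.floor(nowMs / this.bucketMs) * this.bucketMs;
    this.buckets.set(bucket, (this.buckets.get(bucket) || 0) + 1);
    return { allowed: true, retryAfterSeconds: 0 };
  }
}
```

Passing the clock in as `nowMs` keeps the sketch deterministic; a caller would typically pass `Date.now()` and map a denied result to a 429 response with a `Retry-After` header.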
Configuration via `appsettings.yaml`:
```yaml
rate_limiting:
  process_back_pressure_when_more_than_per_5min: 5000
  for_instance:
    rules:
      - max_requests: 100
        per_seconds: 1
      - max_requests: 1000
        per_seconds: 60
  for_environment:
    valkey_connection: "localhost:6379"
    rules:
      - max_requests: 10000
        per_seconds: 60
    circuit_breaker:
      failure_threshold: 3
      timeout_seconds: 30
      half_open_timeout: 10
```
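
To illustrate how the `circuit_breaker` values could drive fail-open behavior, here is a minimal sketch. It assumes `failure_threshold` counts consecutive backend failures and `timeout_seconds` is how long the breaker stays open; `half_open_timeout` is not modeled because its semantics are not stated here. The class and parameter names are hypothetical, and the check is synchronous for brevity, whereas a real Valkey call would be asynchronous.

```javascript
// Hypothetical fail-open circuit breaker around an environment-level limit check.
// The interpretation of failureThreshold and timeoutSeconds is an assumption.
class FailOpenBreaker {
  constructor({ failureThreshold = 3, timeoutSeconds = 30 } = {}) {
    this.failureThreshold = failureThreshold;
    this.timeoutMs = timeoutSeconds * 1000;
    this.failures = 0; // consecutive backend failures
    this.openedAtMs = null; // time the breaker last opened, or null when closed
  }

  isOpen(nowMs) {
    return this.openedAtMs !== null && nowMs - this.openedAtMs < this.timeoutMs;
  }

  // check(): returns true (allow) or false (deny); throws when the backend is unreachable.
  allow(check, nowMs) {
    // While open, skip the backend entirely and let the request through.
    if (this.isOpen(nowMs)) return true;
    try {
      const allowed = check();
      // A successful round trip closes the breaker and resets the failure count.
      this.failures = 0;
      this.openedAtMs = null;
      return allowed;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAtMs = nowMs;
      return true; // fail open: a backend outage must not block traffic
    }
  }
}
```

Once the open period elapses, the next call acts as a probe: a success closes the breaker, while another failure reopens it immediately because the failure count is still at or above the threshold.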
---
@@ -443,12 +473,80 @@ spec:
| Feature | Sprint | Status |
|---------|--------|--------|
| Core implementation | 3600.0001.0001 | TODO |
| Performance Testing Pipeline | 038 | DONE |
| WebSocket support | Future | Planned |
| gRPC passthrough | Future | Planned |
| GraphQL aggregation | Future | Exploration |
---
## 14) Performance Testing Pipeline (k6 + Prometheus + Correlation IDs)
### Overview
The Gateway includes a comprehensive performance testing pipeline with k6 load tests,
Prometheus metric instrumentation, and Grafana dashboards for performance curve modelling.
### k6 Scenarios (A-G)
| Scenario | Purpose | VUs | Duration | Key Metric |
|----------|---------|-----|----------|------------|
| A Health Baseline | Sub-ms health probe overhead | 10 | 1 min | P95 < 10 ms |
| B OpenAPI Aggregation | Spec cache under concurrent readers | 50 | 75 s | P95 < 200 ms |
| C Routing Throughput | Mixed-method routing at target RPS | 200 | 2 min | P50 < 2 ms, P99 < 5 ms |
| D Correlation ID | Propagation overhead measurement | 20 | 1 min | P95 < 5 ms overhead |
| E Rate Limit Boundary | Enforcement correctness at boundary | 100 | 1 min | Retry-After header |
| F Connection Ramp | Transport saturation (ramp to 1000 VUs) | 1000 | 2 min | No 503 responses |
| G Steady-State Soak | Memory leak / resource exhaustion | 50 | 10 min | Stable memory |
Run all scenarios:
```bash
k6 run --env BASE_URL=https://gateway.stella-ops.local src/Gateway/__Tests/load/gateway_performance.k6.js
```
Run a single scenario:
```bash
k6 run --env BASE_URL=https://gateway.stella-ops.local --env SCENARIO=scenario_c_routing_throughput src/Gateway/__Tests/load/gateway_performance.k6.js
```
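
One way a script could honor the `SCENARIO` variable is to filter its scenario table before exporting k6 options. The sketch below is an assumption about how that might look, not the contents of `gateway_performance.k6.js`: only `scenario_c_routing_throughput` appears in the command above, while the other key, the `exec` function names, and the `constant-vus` executor choice are hypothetical.

```javascript
// Hypothetical scenario table: two illustrative entries using k6 scenario fields.
// Only "scenario_c_routing_throughput" is named in the commands above.
const ALL_SCENARIOS = {
  scenario_a_health_baseline: {
    executor: "constant-vus", vus: 10, duration: "1m", exec: "healthBaseline",
  },
  scenario_c_routing_throughput: {
    executor: "constant-vus", vus: 200, duration: "2m", exec: "routingThroughput",
  },
};

// Return every scenario when nothing is selected, otherwise only the named one.
function selectScenarios(all, selected) {
  if (!selected) return all;
  if (!Object.prototype.hasOwnProperty.call(all, selected)) {
    throw new Error(`Unknown scenario: ${selected}`);
  }
  return { [selected]: all[selected] };
}
```

Inside a k6 script the result would feed the exported options, for example `scenarios: selectScenarios(ALL_SCENARIOS, __ENV.SCENARIO)`, since k6 exposes `--env` values through `__ENV`.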
### Performance Metrics (GatewayPerformanceMetrics)
Meter: `StellaOps.Gateway.Performance`
| Instrument | Type | Unit | Description |
|------------|------|------|-------------|
| `gateway.requests.total` | Counter | | Total requests processed |
| `gateway.errors.total` | Counter | | Errors (4xx/5xx) |
| `gateway.ratelimit.total` | Counter | | Rate-limited requests (429) |
| `gateway.request.duration` | Histogram | ms | Full request duration |
| `gateway.auth.duration` | Histogram | ms | Auth middleware duration |
| `gateway.transport.duration` | Histogram | ms | TCP/TLS transport duration |
| `gateway.routing.duration` | Histogram | ms | Instance selection duration |
### Grafana Dashboard
Dashboard: `devops/telemetry/dashboards/stella-ops-gateway-performance.json`
UID: `stella-ops-gateway-performance`
Panels:
1. **Overview row**: P50/P99 gauges, error rate, RPS
2. **Latency Distribution**: Percentile time series (overall + per-service)
3. **Throughput & Rate Limiting**: RPS by service, rate-limited requests by route
4. **Pipeline Breakdown**: Auth/Routing/Transport P95 breakdown, errors by status
5. **Connections & Resources**: Active connections, endpoints, memory usage
### C# Models
| Type | Purpose |
|------|---------|
| `GatewayPerformanceObservation` | Single request observation (all pipeline phases) |
| `PerformanceScenarioConfig` | Scenario definition with SLO thresholds |
| `PerformanceCurvePoint` | Aggregated window data with computed RPS/error rate |
| `PerformanceTestSummary` | Complete test run result with threshold violations |
| `GatewayPerformanceMetrics` | OTel service emitting Prometheus-compatible metrics |
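
The aggregation behind a `PerformanceCurvePoint` can be illustrated with a small sketch that reduces a window of observations to RPS, error rate, and latency percentiles. It is written in JavaScript to match the k6 script rather than the C# models, and the field names (`durationMs`, `status`) and the nearest-rank percentile method are assumptions for the example.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
function percentile(sorted, p) {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical aggregation: observations are { durationMs, status } records
// collected during one window of windowSeconds seconds.
function toCurvePoint(observations, windowSeconds) {
  const durations = observations.map((o) => o.durationMs).sort((a, b) => a - b);
  // Errors follow the metrics table above: any 4xx or 5xx response.
  const errors = observations.filter((o) => o.status >= 400).length;
  return {
    requests: observations.length,
    rps: observations.length / windowSeconds,
    errorRate: observations.length === 0 ? 0 : errors / observations.length,
    p50Ms: percentile(durations, 50),
    p95Ms: percentile(durations, 95),
  };
}
```

Plotting `p95Ms` against `rps` across successive windows gives the kind of performance curve the dashboards are meant to model.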
---
## 14) References
- Router Architecture: `docs/modules/router/architecture.md`