consolidation of some of the modules, localization fixes, product advisories work, qa work

This commit is contained in:
master
2026-03-05 03:54:22 +02:00
parent 7bafcc3eef
commit 8e1cb9448d
3878 changed files with 72600 additions and 46861 deletions

View File

@@ -0,0 +1,49 @@
# Gateway
**Status:** Implemented
**Source:** `src/Gateway/`
**Owner:** Platform Team
## Purpose
Gateway provides API routing, authentication enforcement, and transport abstraction for StellaOps services. Acts as the single entry point for external clients with support for HTTP/HTTPS and transport-agnostic messaging via Router module.
## Components
**Services:**
- `StellaOps.Gateway.WebService` - API gateway with routing, middleware, and security
**Key Features:**
- Route configuration and service discovery
- Authorization middleware (Authority integration)
- Request/response transformation
- Rate limiting and throttling
- Transport abstraction (HTTP, TCP/TLS, UDP, RabbitMQ, Valkey)
## Configuration
See `etc/policy-gateway.yaml.sample` for gateway configuration examples.
Key settings:
- Service route mappings
- Authority issuer and audience configuration
- Transport protocols and endpoints
- Security policies and CORS settings
- Rate limiting rules
## Dependencies
- Authority (authentication and authorization)
- Router (transport-agnostic messaging)
- All backend services (routing targets)
## Related Documentation
- Architecture: `./architecture.md`
- Router Module: `../router/`
- Authority Module: `../authority/`
- API Reference: `../../API_CLI_REFERENCE.md`
## Current Status
Implemented with HTTP/HTTPS support. Integrated with Authority for token validation and authorization. Supports service routing and middleware composition.

View File

@@ -0,0 +1,568 @@
# component_architecture_gateway.md — **Stella Ops Gateway** (Sprint 3600)
> Derived from Reference Architecture Advisory and Router Architecture Specification
> **Dual-location clarification (updated 2026-02-22).** Both `src/Gateway/` and `src/Router/` contain a project named `StellaOps.Gateway.WebService`. They are **different implementations** serving complementary roles:
> - **`src/Gateway/`** (this module) — the simplified HTTP ingress gateway focused on authentication, routing to microservices via binary protocol, and OpenAPI aggregation.
> - **`src/Router/`** — the evolved "Front Door" gateway with advanced features: configurable route tables (`GatewayRouteCatalog`), reverse proxy, SPA hosting, WebSocket support, Valkey messaging transport, and extended Authority integration.
>
> The Router version (`src/Router/`) appears to be the current canonical deployment target. This Gateway version may represent a simplified or legacy configuration. Operators should verify which is deployed in their environment. See also [Router Architecture](../router/architecture.md).
> **Scope.** The Gateway WebService is the single HTTP ingress point for all external traffic. It authenticates requests via Authority (DPoP/mTLS), routes to microservices via the Router binary protocol, aggregates OpenAPI specifications, and enforces tenant isolation.
> **Ownership:** Platform Guild
---
## 0) Mission & Boundaries
### What Gateway Does
- **HTTP Ingress**: Single entry point for all external HTTP/HTTPS traffic
- **Authentication**: DPoP and mTLS token validation via Authority integration
- **Routing**: Routes HTTP requests to microservices via binary protocol (TCP/TLS)
- **OpenAPI Aggregation**: Combines endpoint specs from all registered microservices
- **Health Aggregation**: Provides unified health status from downstream services
- **Rate Limiting**: Per-tenant and per-identity request throttling
- **Tenant Propagation**: Extracts tenant context and propagates to microservices
### What Gateway Does NOT Do
- **Business Logic**: No domain logic; pure routing and auth
- **Data Storage**: Stateless; no persistent state beyond connection cache
- **Direct Database Access**: Never connects to PostgreSQL directly
- **SBOM/VEX Processing**: Delegates to Scanner, Excititor, etc.
---
## 1) Solution & Project Layout
```
src/Gateway/
├── StellaOps.Gateway.WebService/
│ ├── StellaOps.Gateway.WebService.csproj
│ ├── Program.cs # DI bootstrap, transport init
│ ├── Dockerfile
│ ├── appsettings.json
│ ├── appsettings.Development.json
│ ├── Configuration/
│ │ ├── GatewayOptions.cs # All configuration options
│ │ └── TransportOptions.cs # TCP/TLS transport config
│ ├── Middleware/
│ │ ├── TenantMiddleware.cs # Tenant context extraction
│ │ ├── RequestRoutingMiddleware.cs # HTTP → binary routing
│ │ ├── SenderConstraintMiddleware.cs # DPoP/mTLS validation
│ │ ├── IdentityHeaderPolicyMiddleware.cs # Identity header sanitization
│ │ ├── CorrelationIdMiddleware.cs # Request correlation
│ │ └── HealthCheckMiddleware.cs # Health probe handling
│ ├── Services/
│ │ ├── GatewayHostedService.cs # Transport lifecycle
│ │ ├── OpenApiAggregationService.cs # Spec aggregation
│ │ └── HealthAggregationService.cs # Downstream health
│ └── Endpoints/
│ ├── HealthEndpoints.cs # /health/*, /metrics
│ └── OpenApiEndpoints.cs # /openapi.json, /openapi.yaml
```
### Dependencies
```xml
<ItemGroup>
<ProjectReference Include="..\..\__Libraries\StellaOps.Router.Gateway\..." />
<ProjectReference Include="..\..\__Libraries\StellaOps.Router.Transport.Tcp\..." />
<ProjectReference Include="..\..\__Libraries\StellaOps.Router.Transport.Tls\..." />
<ProjectReference Include="..\..\Auth\StellaOps.Auth.ServerIntegration\..." />
</ItemGroup>
```
---
## 2) External Dependencies
| Dependency | Purpose | Required |
|------------|---------|----------|
| **Authority** | OpTok validation, DPoP/mTLS | Yes |
| **Router.Gateway** | Routing state, endpoint discovery | Yes |
| **Router.Transport.Tcp** | Binary transport (dev) | Yes |
| **Router.Transport.Tls** | Binary transport (prod) | Yes |
| **Valkey/Redis** | Rate limiting state | Optional |
---
## 3) Contracts & Data Model
### Request Flow
```
┌──────────────┐ HTTPS ┌─────────────────┐ Binary ┌─────────────────┐
│ Client │ ─────────────► │ Gateway │ ────────────► │ Microservice │
│ (CLI/UI) │ │ WebService │ Frame │ (Scanner, │
│ │ ◄───────────── │ │ ◄──────────── │ Policy, etc) │
└──────────────┘ HTTPS └─────────────────┘ Binary └─────────────────┘
```
### Binary Frame Protocol
Gateway uses the Router binary protocol for internal communication:
| Frame Type | Purpose |
|------------|---------|
| HELLO | Microservice registration with endpoints |
| HEARTBEAT | Health check and latency measurement |
| REQUEST | HTTP request serialized to binary |
| RESPONSE | HTTP response serialized from binary |
| STREAM_DATA | Streaming response chunks |
| CANCEL | Request cancellation propagation |
### Endpoint Descriptor
```csharp
public sealed class EndpointDescriptor
{
public required string Method { get; init; } // GET, POST, etc.
public required string Path { get; init; } // /api/v1/scans/{id}
public required string ServiceName { get; init; } // scanner
public required string Version { get; init; } // 1.0.0
public TimeSpan DefaultTimeout { get; init; } // 30s
public bool SupportsStreaming { get; init; } // true for large responses
public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; }
}
```
### Routing State
```csharp
public interface IRoutingStateManager
{
ValueTask RegisterEndpointsAsync(ConnectionState conn, HelloPayload hello);
ValueTask<InstanceSelection?> SelectInstanceAsync(string method, string path);
ValueTask UpdateHealthAsync(ConnectionState conn, HeartbeatPayload heartbeat);
ValueTask DrainConnectionAsync(string connectionId);
}
```
---
## 4) REST API
Gateway exposes minimal management endpoints; all business APIs are routed to microservices.
### Health Endpoints
| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health/live` | None | Liveness probe |
| `GET /health/ready` | None | Readiness probe |
| `GET /health/startup` | None | Startup probe |
| `GET /metrics` | None | Prometheus metrics |
### OpenAPI Endpoints
| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /openapi.json` | None | Aggregated OpenAPI 3.1.0 spec |
| `GET /openapi.yaml` | None | YAML format spec |
---
## 5) Execution Flow
### Request Routing
```mermaid
sequenceDiagram
participant C as Client
participant G as Gateway
participant A as Authority
participant M as Microservice
C->>G: HTTPS Request + DPoP Token
G->>A: Validate Token
A-->>G: Claims (sub, tid, scope)
G->>G: Select Instance (Method, Path)
G->>M: Binary REQUEST Frame
M-->>G: Binary RESPONSE Frame
G-->>C: HTTPS Response
```
### Microservice Registration
```mermaid
sequenceDiagram
participant M as Microservice
participant G as Gateway
M->>G: TCP/TLS Connect
M->>G: HELLO (ServiceName, Version, Endpoints)
G->>G: Register Endpoints
G-->>M: HELLO ACK
loop Every 10s
G->>M: HEARTBEAT
M-->>G: HEARTBEAT (latency, health)
G->>G: Update Health State
end
```
---
## 6) Instance Selection Algorithm
```csharp
public ValueTask<InstanceSelection?> SelectInstanceAsync(string method, string path)
{
// 1. Find all endpoints matching (method, path)
var candidates = _endpoints
.Where(e => e.Method == method && MatchPath(e.Path, path))
.ToList();
// 2. Filter by health
candidates = candidates
.Where(c => c.Health is InstanceHealthStatus.Healthy or InstanceHealthStatus.Degraded)
.ToList();
// 3. Region preference
var localRegion = candidates.Where(c => c.Region == _config.Region).ToList();
var neighborRegions = candidates.Where(c => _config.NeighborRegions.Contains(c.Region)).ToList();
var otherRegions = candidates.Except(localRegion).Except(neighborRegions).ToList();
var preferred = localRegion.Any() ? localRegion
: neighborRegions.Any() ? neighborRegions
: otherRegions;
// 4. Within tier: prefer lower latency, then most recent heartbeat
return preferred
.OrderBy(c => c.AveragePingMs)
.ThenByDescending(c => c.LastHeartbeatUtc)
.FirstOrDefault();
}
```
---
## 7) Configuration
```yaml
gateway:
node:
region: "eu1"
nodeId: "gw-eu1-01"
environment: "prod"
transports:
tcp:
enabled: true
port: 9100
maxConnections: 1000
receiveBufferSize: 65536
sendBufferSize: 65536
tls:
enabled: true
port: 9443
certificatePath: "/certs/gateway.pfx"
certificatePassword: "${GATEWAY_CERT_PASSWORD}"
clientCertificateMode: "RequireCertificate"
allowedClientCertificateThumbprints: []
routing:
defaultTimeout: "30s"
maxRequestBodySize: "100MB"
streamingEnabled: true
streamingBufferSize: 16384
neighborRegions: ["eu2", "us1"]
auth:
dpopEnabled: true
dpopMaxClockSkew: "60s"
mtlsEnabled: true
rateLimiting:
enabled: true
requestsPerMinute: 1000
burstSize: 100
redisConnectionString: "${REDIS_URL}" # Valkey (Redis-compatible)
openapi:
enabled: true
cacheTtlSeconds: 300
title: "Stella Ops API"
version: "1.0.0"
health:
heartbeatIntervalSeconds: 10
heartbeatTimeoutSeconds: 30
unhealthyThreshold: 3
```
---
## 8) Scale & Performance
| Metric | Target | Notes |
|--------|--------|-------|
| Routing latency (P50) | <2ms | Overhead only; excludes downstream |
| Routing latency (P99) | <5ms | Under normal load |
| Concurrent connections | 10,000 | Per gateway instance |
| Requests/second | 50,000 | Per gateway instance |
| Memory footprint | <512MB | Base; scales with connections |
### Scaling Strategy
- Horizontal scaling behind load balancer
- Sticky sessions NOT required (stateless)
- Regional deployment for latency optimization
- Rate limiting via distributed Valkey/Redis
---
## 9) Security Posture
### Authentication
| Method | Description |
|--------|-------------|
| DPoP | Proof-of-possession tokens from Authority |
| mTLS | Certificate-bound tokens for machine clients |
### Authorization
- Claims-based authorization per endpoint
- Required claims defined in endpoint descriptors
- Tenant isolation via `tid` claim
### Transport Security
| Component | Encryption |
|-----------|------------|
| Client Gateway | TLS 1.3 (HTTPS) |
| Gateway Microservices | TLS (prod), TCP (dev only) |
### Rate Limiting
Gateway uses the Router's dual-window rate limiting middleware with circuit breaker:
- **Instance-level** (in-memory): Per-router-instance limits using sliding window counters
- High-precision sub-second buckets for fair rate distribution
- No external dependencies; always available
- **Environment-level** (Valkey-backed): Cross-instance limits for distributed deployments
- Atomic Lua scripts for consistent counting across instances
- Circuit breaker pattern for fail-open behavior when Valkey is unavailable
- **Activation gate**: Environment-level checks only activate above traffic threshold (configurable)
- **Response headers**: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
Configuration via `appsettings.yaml`:
```yaml
rate_limiting:
process_back_pressure_when_more_than_per_5min: 5000
for_instance:
rules:
- max_requests: 100
per_seconds: 1
- max_requests: 1000
per_seconds: 60
for_environment:
valkey_connection: "localhost:6379"
rules:
- max_requests: 10000
per_seconds: 60
circuit_breaker:
failure_threshold: 3
timeout_seconds: 30
half_open_timeout: 10
```
---
## 10) Observability & Audit
### Metrics (Prometheus)
```
gateway_requests_total{service,method,path,status}
gateway_request_duration_seconds{service,method,path,quantile}
gateway_active_connections{service}
gateway_transport_frames_total{type}
gateway_auth_failures_total{reason}
gateway_rate_limit_exceeded_total{tenant}
```
### Traces (OpenTelemetry)
- Span per request: `gateway.route`
- Child span: `gateway.auth.validate`
- Child span: `gateway.transport.send`
### Logs (Structured)
```json
{
"timestamp": "2025-12-21T10:00:00Z",
"level": "info",
"message": "Request routed",
"correlationId": "abc123",
"tenantId": "tenant-1",
"method": "GET",
"path": "/api/v1/scans/xyz",
"service": "scanner",
"durationMs": 45,
"status": 200
}
```
---
## 11) Testing Matrix
| Test Type | Scope | Coverage Target |
|-----------|-------|-----------------|
| Unit | Routing algorithm, auth validation | 90% |
| Integration | Transport + routing flow | 80% |
| E2E | Full request path with mock services | Key flows |
| Performance | Latency, throughput, connection limits | SLO targets |
| Chaos | Connection failures, microservice crashes | Resilience |
### Test Fixtures
- `StellaOps.Router.Transport.InMemory` for transport mocking
- Mock Authority for auth testing
- `WebApplicationFactory` for integration tests
---
## 12) DevOps & Operations
### Deployment
```yaml
# Kubernetes deployment excerpt
apiVersion: apps/v1
kind: Deployment
metadata:
name: gateway
spec:
replicas: 3
template:
spec:
containers:
- name: gateway
image: stellaops/gateway:1.0.0
ports:
- containerPort: 8080 # HTTPS
- containerPort: 9443 # TLS (microservices)
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /health/live
port: 8080
readinessProbe:
httpGet:
path: /health/ready
port: 8080
```
### SLOs
| SLO | Target | Measurement |
|-----|--------|-------------|
| Availability | 99.9% | Uptime over 30 days |
| Latency P99 | <50ms | Includes downstream |
| Error rate | <0.1% | 5xx responses |
---
## 13) Roadmap
| Feature | Sprint | Status |
|---------|--------|--------|
| Core implementation | 3600.0001.0001 | TODO |
| Performance Testing Pipeline | 038 | DONE |
| WebSocket support | Future | Planned |
| gRPC passthrough | Future | Planned |
| GraphQL aggregation | Future | Exploration |
---
## 14) Performance Testing Pipeline (k6 + Prometheus + Correlation IDs)
### Overview
The Gateway includes a comprehensive performance testing pipeline with k6 load tests,
Prometheus metric instrumentation, and Grafana dashboards for performance curve modelling.
### k6 Scenarios (AG)
| Scenario | Purpose | VUs | Duration | Key Metric |
|----------|---------|-----|----------|------------|
| A Health Baseline | Sub-ms health probe overhead | 10 | 1 min | P95 < 10 ms |
| B OpenAPI Aggregation | Spec cache under concurrent readers | 50 | 75 s | P95 < 200 ms |
| C Routing Throughput | Mixed-method routing at target RPS | 200 | 2 min | P50 < 2 ms, P99 < 5 ms |
| D Correlation ID | Propagation overhead measurement | 20 | 1 min | P95 < 5 ms overhead |
| E Rate Limit Boundary | Enforcement correctness at boundary | 100 | 1 min | Retry-After header |
| F Connection Ramp | Transport saturation (ramp to 1000 VUs) | 1000 | 2 min | No 503 responses |
| G Steady-State Soak | Memory leak / resource exhaustion | 50 | 10 min | Stable memory |
Run all scenarios:
```bash
k6 run --env BASE_URL=https://gateway.stella-ops.local src/Gateway/__Tests/load/gateway_performance.k6.js
```
Run a single scenario:
```bash
k6 run --env BASE_URL=https://gateway.stella-ops.local --env SCENARIO=scenario_c_routing_throughput src/Gateway/__Tests/load/gateway_performance.k6.js
```
### Performance Metrics (GatewayPerformanceMetrics)
Meter: `StellaOps.Gateway.Performance`
| Instrument | Type | Unit | Description |
|------------|------|------|-------------|
| `gateway.requests.total` | Counter | | Total requests processed |
| `gateway.errors.total` | Counter | | Errors (4xx/5xx) |
| `gateway.ratelimit.total` | Counter | | Rate-limited requests (429) |
| `gateway.request.duration` | Histogram | ms | Full request duration |
| `gateway.auth.duration` | Histogram | ms | Auth middleware duration |
| `gateway.transport.duration` | Histogram | ms | TCP/TLS transport duration |
| `gateway.routing.duration` | Histogram | ms | Instance selection duration |
### Grafana Dashboard
Dashboard: `devops/telemetry/dashboards/stella-ops-gateway-performance.json`
UID: `stella-ops-gateway-performance`
Panels:
1. **Overview row** P50/P99 gauges, error rate, RPS
2. **Latency Distribution** Percentile time series (overall + per-service)
3. **Throughput & Rate Limiting** RPS by service, rate-limited requests by route
4. **Pipeline Breakdown** Auth/Routing/Transport P95 breakdown, errors by status
5. **Connections & Resources** Active connections, endpoints, memory usage
### C# Models
| Type | Purpose |
|------|---------|
| `GatewayPerformanceObservation` | Single request observation (all pipeline phases) |
| `PerformanceScenarioConfig` | Scenario definition with SLO thresholds |
| `PerformanceCurvePoint` | Aggregated window data with computed RPS/error rate |
| `PerformanceTestSummary` | Complete test run result with threshold violations |
| `GatewayPerformanceMetrics` | OTel service emitting Prometheus-compatible metrics |
---
## 14) References
- Router Architecture: `docs/modules/router/architecture.md`
- Gateway Identity Header Policy: `docs/modules/gateway/identity-header-policy.md`
- OpenAPI Aggregation: `docs/modules/gateway/openapi.md`
- Router ASP.NET Endpoint Bridge: `docs/modules/router/aspnet-endpoint-bridge.md`
- Router Messaging (Valkey) Transport: `docs/modules/router/messaging-valkey-transport.md`
- Authority Integration: `docs/modules/authority/architecture.md`
- Reference Architecture: `docs/product/advisories/archived/2025-12-21-reference-architecture/`
---
**Last Updated**: 2025-12-21 (Sprint 3600)

View File

@@ -0,0 +1,129 @@
# Gateway · Identity Header Policy for Router Dispatch
## Status
- **Implemented** in Sprint 8100.0011.0002.
- Middleware: `src/Gateway/StellaOps.Gateway.WebService/Middleware/IdentityHeaderPolicyMiddleware.cs`
- Last updated: 2025-12-24 (UTC).
## Why This Exists
The Gateway is the single HTTP ingress point and routes requests to internal microservices over Router transports. Many services (and legacy components) still rely on **header-based identity context** (tenant/scopes/actor) rather than (or in addition to) `HttpContext.User` claims.
This creates a hard security requirement:
- **Clients must never be able to inject/override “roles/scopes” headers** that the downstream service trusts.
- The Gateway must derive the effective identity from the validated JWT/JWK token (or explicit anonymous identity) and **overwrite** downstream identity headers accordingly.
## Implementation
The `IdentityHeaderPolicyMiddleware` (introduced in Sprint 8100.0011.0002) replaces the legacy middleware:
- ~~`src/Gateway/StellaOps.Gateway.WebService/Middleware/ClaimsPropagationMiddleware.cs`~~ (retired)
- ~~`src/Gateway/StellaOps.Gateway.WebService/Middleware/TenantMiddleware.cs`~~ (retired)
### Resolved issues
1) **Spoofing risk:** ✅ Fixed. Middleware uses "strip-and-overwrite" semantics—reserved headers are stripped before claims are extracted and downstream headers are written.
2) **Claim type mismatch:** ✅ Fixed. Middleware uses `StellaOpsClaimTypes.Tenant` (`stellaops:tenant`) with fallback to legacy `tid` claim.
3) **Scope claim mismatch:** ✅ Fixed. Middleware extracts scopes from both `scp` (individual claims) and `scope` (space-separated) claims.
4) **Docs alignment:** ✅ Reconciled in this sprint.
## Policy Goals
- **No client-controlled identity headers:** the Gateway rejects or strips identity headers coming from external clients.
- **Gateway-controlled propagation:** the Gateway sets downstream identity headers based on validated token claims or a defined anonymous identity.
- **Compatibility bridge:** support both `X-Stella-*` and `X-StellaOps-*` header naming during migration.
- **Determinism:** header values are canonicalized (whitespace, ordering) and do not vary across equivalent requests.
## Reserved Headers (Draft)
The following headers are considered **reserved identity context** and must not be trusted from external clients:
- Tenant / project:
- `X-StellaOps-Tenant`, `X-Stella-Tenant`
- `X-StellaOps-Project`, `X-Stella-Project`
- Scopes / roles:
- `X-StellaOps-Scopes`, `X-Stella-Scopes`
- Actor / subject (if used for auditing):
- `X-StellaOps-Actor`, `X-Stella-Actor`
- Token proof / confirmation (if propagated):
- `cnf`, `cnf.jkt`
**Internal/legacy pass-through keys to also treat as reserved:**
- `sub`, `scope`, `scp`, `tid` (legacy), `stellaops:tenant` (if ever used as a header key)
## Overwrite Rules (Draft)
For non-system paths (i.e., requests that will be routed to microservices):
1) **Strip** all reserved identity headers from the incoming request.
2) **Compute** effective identity from the authenticated principal:
- `sub` from JWT `sub` (`StellaOpsClaimTypes.Subject`)
- `tenant` from `stellaops:tenant` (`StellaOpsClaimTypes.Tenant`)
- `project` from `stellaops:project` (`StellaOpsClaimTypes.Project`) when present
- `scopes` from:
- `scp` claims (`StellaOpsClaimTypes.ScopeItem`) if present, else
- split `scope` (`StellaOpsClaimTypes.Scope`) by spaces
3) **Write** downstream headers (compat mode):
- Tenant:
- `X-StellaOps-Tenant: <tenant>`
- `X-Stella-Tenant: <tenant>` (optional during migration)
- Project (optional):
- `X-StellaOps-Project: <project>`
- `X-Stella-Project: <project>` (optional during migration)
- Scopes:
- `X-StellaOps-Scopes: <space-delimited scopes>`
- `X-Stella-Scopes: <space-delimited scopes>` (optional during migration)
- Actor:
- `X-StellaOps-Actor: <sub>` (unless another canonical actor claim is defined)
### Anonymous mode
If `Gateway:Auth:AllowAnonymous=true` and the request is unauthenticated:
- Set an explicit anonymous identity so downstream services never interpret “missing header” as privileged:
- `X-StellaOps-Actor: anonymous`
- `X-StellaOps-Scopes: ` (empty) or `anonymous` (choose one and document)
- Tenant behavior must be explicitly defined:
- either reject routed requests without tenant context, or
- require a tenant header even in anonymous mode and treat it as *untrusted input* that only selects a tenancy partition (not privileges).
## Scope Override Header (Offline/Pre-prod)
Some legacy flows allow setting scopes via headers (for offline kits or pre-prod bundles).
Draft enforcement:
- Default: **forbid** client-provided scope headers (`X-StellaOps-Scopes`, `X-Stella-Scopes`) and return 403 (deterministic error code).
- Optional controlled override: allow only when `Gateway:Auth:AllowScopeHeader=true`, and only for explicitly allowed environments (offline/pre-prod).
- Even when allowed, the override must not silently escalate a request beyond what the token allows unless the request is explicitly unauthenticated and the environment is configured for offline operation.
## Implementation Details
### Middleware Registration
The middleware is registered in `Program.cs` after authentication:
```csharp
app.UseAuthentication();
app.UseMiddleware<SenderConstraintMiddleware>();
app.UseMiddleware<IdentityHeaderPolicyMiddleware>();
```
### Configuration
Options are configured via `GatewayOptions.Auth`:
```yaml
Gateway:
Auth:
EnableLegacyHeaders: true # Write X-Stella-* in addition to X-StellaOps-*
AllowScopeHeader: false # Forbid client scope headers (default)
```
### HttpContext.Items Keys
The middleware stores normalized identity in `HttpContext.Items` using `GatewayContextKeys`:
- `Gateway.TenantId` — extracted tenant identifier
- `Gateway.ProjectId` — extracted project identifier (optional)
- `Gateway.Actor` — subject/actor from claims or "anonymous"
- `Gateway.Scopes``HashSet<string>` of scopes
- `Gateway.IsAnonymous``bool` indicating anonymous request
- `Gateway.DpopThumbprint` — JKT from cnf claim (if present)
- `Gateway.CnfJson` — raw cnf claim JSON (if present)
### Tests
Located in `src/Gateway/__Tests/StellaOps.Gateway.WebService.Tests/Middleware/IdentityHeaderPolicyMiddlewareTests.cs`:
- ✅ Spoofed identity headers are stripped and overwritten
- ✅ Claim type mapping uses `StellaOpsClaimTypes.*` correctly
- ✅ Anonymous requests receive explicit `anonymous` identity
- ✅ Legacy headers are written when `EnableLegacyHeaders=true`
- ✅ Scopes are sorted deterministically
## Related Documents
- Gateway architecture: `docs/modules/gateway/architecture.md`
- Tenant auth contract (Web V): `docs/api/gateway/tenant-auth.md`
- Router ASP.NET bridge: `docs/modules/router/aspnet-endpoint-bridge.md`

View File

@@ -0,0 +1,344 @@
# Gateway OpenAPI Implementation
This document describes the implementation architecture of OpenAPI document aggregation in the StellaOps Router Gateway.
## Architecture
The Gateway generates OpenAPI 3.1.0 documentation by aggregating schemas and endpoint metadata from connected microservices.
### Component Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│ Gateway │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌────────────────────┐ │
│ │ ConnectionManager │───►│ InMemoryRoutingState│ │
│ │ │ │ │ │
│ │ - OnHelloReceived │ │ - Connections[] │ │
│ │ - OnConnClosed │ │ - Endpoints │ │
│ └──────────────────┘ │ - Schemas │ │
│ │ └─────────┬──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌────────────────────┐ │
│ │ OpenApiDocument │◄───│ GatewayOpenApi │ │
│ │ Cache │ │ DocumentCache │ │
│ │ │ │ │ │
│ │ - Invalidate() │ │ - TTL expiration │ │
│ └──────────────────┘ │ - ETag generation │ │
│ └─────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────┐ │
│ │ OpenApiDocument │ │
│ │ Generator │ │
│ │ │ │
│ │ - GenerateInfo() │ │
│ │ - GeneratePaths() │ │
│ │ - GenerateTags() │ │
│ │ - GenerateSchemas()│ │
│ └─────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────┐ │
│ │ ClaimSecurity │ │
│ │ Mapper │ │
│ │ │ │
│ │ - SecuritySchemes │ │
│ │ - SecurityRequire │ │
│ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
### Components
| Component | File | Responsibility |
|-----------|------|----------------|
| `IOpenApiDocumentGenerator` | `OpenApi/IOpenApiDocumentGenerator.cs` | Interface for document generation |
| `OpenApiDocumentGenerator` | `OpenApi/OpenApiDocumentGenerator.cs` | Builds OpenAPI 3.1.0 JSON |
| `IGatewayOpenApiDocumentCache` | `OpenApi/IGatewayOpenApiDocumentCache.cs` | Interface for document caching |
| `GatewayOpenApiDocumentCache` | `OpenApi/GatewayOpenApiDocumentCache.cs` | TTL + invalidation caching |
| `ClaimSecurityMapper` | `OpenApi/ClaimSecurityMapper.cs` | Maps claims to OAuth2 scopes |
| `OpenApiEndpoints` | `OpenApi/OpenApiEndpoints.cs` | HTTP endpoint handlers |
| `OpenApiAggregationOptions` | `OpenApi/OpenApiAggregationOptions.cs` | Configuration options |
---
## OpenApiDocumentGenerator
Generates the complete OpenAPI 3.1.0 document from routing state.
### Process Flow
1. **Collect connections** from `IGlobalRoutingState`
2. **Generate info** section from `OpenApiAggregationOptions`
3. **Generate paths** by iterating all endpoints across connections
4. **Generate components** including schemas and security schemes
5. **Generate tags** from unique service names
### Schema Handling
Schemas are prefixed with service name to avoid naming conflicts:
```csharp
var prefixedId = $"{conn.Instance.ServiceName}_{schemaId}";
// billing_CreateInvoiceRequest
```
### Operation ID Generation
Operation IDs follow a consistent pattern:
```csharp
var operationId = $"{serviceName}_{path}_{method}";
// billing_invoices_POST
```
---
## GatewayOpenApiDocumentCache
Implements caching with TTL expiration and content-based ETags.
### Cache Behavior
| Trigger | Action |
|---------|--------|
| First request | Generate and cache document |
| Subsequent requests (within TTL) | Return cached document |
| TTL expired | Regenerate document |
| Connection added/removed | Invalidate cache |
### ETag Generation
ETags are computed from SHA256 hash of document content:
```csharp
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(documentJson));
var etag = $"\"{Convert.ToHexString(hash)[..16]}\"";
```
### Thread Safety
The cache uses locking to ensure thread-safe regeneration:
```csharp
lock (_lock)
{
if (_cachedDocument is null || IsExpired())
{
RegenerateDocument();
}
}
```
---
## ClaimSecurityMapper
Maps endpoint claim requirements to OpenAPI security schemes.
### Security Scheme Generation
Always generates `BearerAuth` scheme. Generates `OAuth2` scheme only when endpoints have claim requirements:
```csharp
public static JsonObject GenerateSecuritySchemes(
IEnumerable<EndpointDescriptor> endpoints,
string tokenUrl)
{
var schemes = new JsonObject();
// Always add BearerAuth
schemes["BearerAuth"] = new JsonObject { ... };
// Collect scopes from all endpoints
var scopes = CollectScopes(endpoints);
// Add OAuth2 only if scopes exist
if (scopes.Count > 0)
{
schemes["OAuth2"] = GenerateOAuth2Scheme(tokenUrl, scopes);
}
return schemes;
}
```
### Per-Operation Security
Each endpoint with claims gets a security requirement:
```csharp
public static JsonArray GenerateSecurityRequirement(EndpointDescriptor endpoint)
{
if (endpoint.AllowAnonymous)
return new JsonArray(); // Anonymous endpoint
if (!endpoint.RequiresAuthentication && endpoint.RequiringClaims.Count == 0)
return new JsonArray(); // No auth semantics published
return new JsonArray
{
new JsonObject
{
["BearerAuth"] = new JsonArray(),
["OAuth2"] = new JsonArray(scopes.Select(scope => scope))
}
};
}
```
### Router-specific OpenAPI extensions
Gateway now emits Router-specific extensions on each operation:
- `x-stellaops-gateway-auth`: effective authorization semantics projected from endpoint metadata.
- `allowAnonymous`
- `requiresAuthentication`
- `source` (`None`, `AspNetMetadata`, `YamlOverride`, `Hybrid`)
- optional `policies`, `roles`, `claimRequirements`
- `x-stellaops-timeout`: timeout semantics used by gateway dispatch.
- `effectiveSeconds`
- `source` (`endpoint`, `gatewayRouteDefault`, and capped variants)
- `endpointSeconds`, `gatewayRouteDefaultSeconds`, `gatewayGlobalCapSeconds` when available
- precedence list: endpoint override -> service default -> gateway route default -> gateway global cap
- `x-stellaops-timeout-seconds`: backward-compatible scalar alias for `effectiveSeconds`.
---
## Configuration Reference
### OpenApiAggregationOptions
| Property | Type | Default | Description |
|----------|------|---------|-------------|
| `Title` | `string` | `"StellaOps Gateway API"` | API title |
| `Description` | `string` | `"Unified API..."` | API description |
| `Version` | `string` | `"1.0.0"` | API version |
| `ServerUrl` | `string` | `"/"` | Base server URL |
| `CacheTtlSeconds` | `int` | `60` | Cache TTL |
| `Enabled` | `bool` | `true` | Enable/disable |
| `LicenseName` | `string` | `"BUSL-1.1"` | License name |
| `ContactName` | `string?` | `null` | Contact name |
| `ContactEmail` | `string?` | `null` | Contact email |
| `TokenUrl` | `string` | `"/auth/token"` | OAuth2 token URL |
### YAML Configuration
```yaml
OpenApi:
Title: "My Gateway API"
Description: "Unified API for all microservices"
Version: "2.0.0"
ServerUrl: "https://api.example.com"
CacheTtlSeconds: 60
Enabled: true
LicenseName: "BUSL-1.1"
ContactName: "API Team"
ContactEmail: "api@example.com"
TokenUrl: "/auth/token"
```
---
## Service Registration
Services are registered via dependency injection in `ServiceCollectionExtensions`:
```csharp
services.Configure<OpenApiAggregationOptions>(
configuration.GetSection("OpenApi"));
services.AddSingleton<IOpenApiDocumentGenerator, OpenApiDocumentGenerator>();
services.AddSingleton<IGatewayOpenApiDocumentCache, GatewayOpenApiDocumentCache>();
```
Endpoints are mapped in `ApplicationBuilderExtensions`:
```csharp
app.MapGatewayOpenApiEndpoints();
```
---
## Cache Invalidation
The `ConnectionManager` invalidates the cache on connection changes:
```csharp
private Task HandleHelloReceivedAsync(ConnectionState state, HelloPayload payload)
{
_routingState.AddConnection(state);
_openApiCache?.Invalidate(); // Invalidate on new connection
return Task.CompletedTask;
}
private Task HandleConnectionClosedAsync(string connectionId)
{
_routingState.RemoveConnection(connectionId);
_openApiCache?.Invalidate(); // Invalidate on disconnect
return Task.CompletedTask;
}
```
---
## Extension Points
### Custom Routing Plugins
The Gateway supports custom routing plugins via `IRoutingPlugin`. While not directly related to OpenAPI, routing decisions can affect which endpoints are exposed.
### Future Enhancements
Potential extension points for future development:
- **Schema Transformers**: Modify schemas before aggregation
- **Tag Customization**: Custom tag generation logic
- **Response Examples**: Include example responses from connected services
- **Webhooks**: Notify external systems on document changes
---
## Testing
Unit tests are located in `src/Gateway/__Tests/StellaOps.Gateway.WebService.Tests/OpenApi/`:
| Test File | Coverage |
|-----------|----------|
| `OpenApiDocumentGeneratorTests.cs` | Document structure, schema merging, tag generation |
| `GatewayOpenApiDocumentCacheTests.cs` | TTL expiry, invalidation, ETag consistency |
| `ClaimSecurityMapperTests.cs` | Security scheme generation from claims |
### Test Patterns
```csharp
[Fact]
public void GenerateDocument_WithConnections_GeneratesPaths()
{
// Arrange
var endpoint = new EndpointDescriptor { ... };
var connection = CreateConnection("inventory", "1.0.0", endpoint);
_routingState.Setup(x => x.GetAllConnections()).Returns([connection]);
// Act
var document = _sut.GenerateDocument();
// Assert
var doc = JsonDocument.Parse(document);
doc.RootElement.GetProperty("paths")
.TryGetProperty("/api/items", out _)
.Should().BeTrue();
}
```
---
## See Also
- [Schema Validation](../router/schema-validation.md) - JSON Schema validation in microservices
- [OpenAPI Aggregation](../router/openapi-aggregation.md) - Configuration and usage guide
- [API Overview](../../api/overview.md) - General API conventions