# component_architecture_gateway.md - **Stella Ops Gateway** (Sprint 3600)

> Derived from Reference Architecture Advisory and Router Architecture Specification

> **Scope.** The Gateway WebService is the single HTTP ingress point for all external traffic. It authenticates requests via Authority (DPoP/mTLS), routes to microservices via the Router binary protocol, aggregates OpenAPI specifications, and enforces tenant isolation.

> **Ownership:** Platform Guild

---

## 0) Mission & Boundaries

### What Gateway Does

- **HTTP Ingress**: Single entry point for all external HTTP/HTTPS traffic
- **Authentication**: DPoP and mTLS token validation via Authority integration
- **Routing**: Routes HTTP requests to microservices via binary protocol (TCP/TLS)
- **OpenAPI Aggregation**: Combines endpoint specs from all registered microservices
- **Health Aggregation**: Provides unified health status from downstream services
- **Rate Limiting**: Per-tenant and per-identity request throttling
- **Tenant Propagation**: Extracts tenant context and propagates to microservices

### What Gateway Does NOT Do

- **Business Logic**: No domain logic; pure routing and auth
- **Data Storage**: Stateless; no persistent state beyond connection cache
- **Direct Database Access**: Never connects to PostgreSQL directly
- **SBOM/VEX Processing**: Delegates to Scanner, Excititor, etc.

---

## 1) Solution & Project Layout

```
src/Gateway/
├── StellaOps.Gateway.WebService/
│   ├── StellaOps.Gateway.WebService.csproj
│   ├── Program.cs                        # DI bootstrap, transport init
│   ├── Dockerfile
│   ├── appsettings.json
│   ├── appsettings.Development.json
│   ├── Configuration/
│   │   ├── GatewayOptions.cs             # All configuration options
│   │   └── TransportOptions.cs           # TCP/TLS transport config
│   ├── Middleware/
│   │   ├── TenantMiddleware.cs           # Tenant context extraction
│   │   ├── RequestRoutingMiddleware.cs   # HTTP → binary routing
│   │   ├── AuthenticationMiddleware.cs   # DPoP/mTLS validation
│   │   └── RateLimitingMiddleware.cs     # Per-tenant throttling
│   ├── Services/
│   │   ├── GatewayHostedService.cs       # Transport lifecycle
│   │   ├── OpenApiAggregationService.cs  # Spec aggregation
│   │   └── HealthAggregationService.cs   # Downstream health
│   └── Endpoints/
│       ├── HealthEndpoints.cs            # /health/*, /metrics
│       └── OpenApiEndpoints.cs           # /openapi.json, /openapi.yaml
```

### Dependencies

```xml
```

---

## 2) External Dependencies

| Dependency | Purpose | Required |
|------------|---------|----------|
| **Authority** | OpTok validation, DPoP/mTLS | Yes |
| **Router.Gateway** | Routing state, endpoint discovery | Yes |
| **Router.Transport.Tcp** | Binary transport (dev) | Yes |
| **Router.Transport.Tls** | Binary transport (prod) | Yes |
| **Valkey/Redis** | Rate limiting state | Optional |

---

## 3) Contracts & Data Model

### Request Flow

```
┌──────────────┐     HTTPS      ┌─────────────────┐    Binary     ┌─────────────────┐
│   Client     │ ─────────────► │    Gateway      │ ────────────► │  Microservice   │
│  (CLI/UI)    │                │   WebService    │    Frame      │  (Scanner,      │
│              │ ◄───────────── │                 │ ◄──────────── │   Policy, etc)  │
└──────────────┘     HTTPS      └─────────────────┘    Binary     └─────────────────┘
```

### Binary Frame Protocol

Gateway uses the Router binary protocol for internal communication:

| Frame Type | Purpose |
|------------|---------|
| HELLO | Microservice registration with endpoints |
| HEARTBEAT | Health check and latency measurement |
| REQUEST | HTTP request serialized to binary |
| RESPONSE | HTTP response serialized from binary |
| STREAM_DATA | Streaming response chunks |
| CANCEL | Request cancellation propagation |

### Endpoint Descriptor

```csharp
public sealed class EndpointDescriptor
{
    public required string Method { get; init; }        // GET, POST, etc.
    public required string Path { get; init; }          // /api/v1/scans/{id}
    public required string ServiceName { get; init; }   // scanner
    public required string Version { get; init; }       // 1.0.0
    public TimeSpan DefaultTimeout { get; init; }       // 30s
    public bool SupportsStreaming { get; init; }        // true for large responses
    public IReadOnlyList RequiringClaims { get; init; }
}
```

### Routing State

```csharp
public interface IRoutingStateManager
{
    ValueTask RegisterEndpointsAsync(ConnectionState conn, HelloPayload hello);
    ValueTask SelectInstanceAsync(string method, string path);
    ValueTask UpdateHealthAsync(ConnectionState conn, HeartbeatPayload heartbeat);
    ValueTask DrainConnectionAsync(string connectionId);
}
```

---

## 4) REST API

Gateway exposes minimal management endpoints; all business APIs are routed to microservices.

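
Routing business APIs depends on matching an incoming request path against the templated `Path` of a registered endpoint descriptor (for example `/api/v1/scans/{id}`). This document does not define the matching rules, so the following is only a minimal sketch under stated assumptions: `PathTemplate.Matches` is a hypothetical helper, a `{name}` segment is assumed to match exactly one non-empty path segment, and literal segments are assumed to compare case-sensitively.

```csharp
using System;

public static class PathTemplate
{
    // Hypothetical helper (not part of the Gateway codebase described here).
    // Returns true when `path` matches `template` segment by segment.
    public static bool Matches(string template, string path)
    {
        // Split on '/', dropping empty entries so leading and trailing
        // slashes do not produce empty segments.
        var t = template.Split('/', StringSplitOptions.RemoveEmptyEntries);
        var p = path.Split('/', StringSplitOptions.RemoveEmptyEntries);

        // Assumption: no catch-all segments, so the counts must be equal.
        if (t.Length != p.Length)
        {
            return false;
        }

        for (var i = 0; i < t.Length; i++)
        {
            // A segment such as {id} is treated as a parameter placeholder.
            var isParameter = t[i].Length > 2 && t[i][0] == '{' && t[i][^1] == '}';
            if (isParameter)
            {
                continue; // p[i] is non-empty because empty entries were removed
            }

            // Assumption: literal segments compare case-sensitively.
            if (!string.Equals(t[i], p[i], StringComparison.Ordinal))
            {
                return false;
            }
        }

        return true;
    }
}
```

Under these assumptions, `PathTemplate.Matches("/api/v1/scans/{id}", "/api/v1/scans/xyz")` returns `true`, while `/api/v1/scans` does not match the same template because the segment counts differ.
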
### Health Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health/live` | None | Liveness probe |
| `GET /health/ready` | None | Readiness probe |
| `GET /health/startup` | None | Startup probe |
| `GET /metrics` | None | Prometheus metrics |

### OpenAPI Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /openapi.json` | None | Aggregated OpenAPI 3.1.0 spec |
| `GET /openapi.yaml` | None | YAML format spec |

---

## 5) Execution Flow

### Request Routing

```mermaid
sequenceDiagram
    participant C as Client
    participant G as Gateway
    participant A as Authority
    participant M as Microservice

    C->>G: HTTPS Request + DPoP Token
    G->>A: Validate Token
    A-->>G: Claims (sub, tid, scope)
    G->>G: Select Instance (Method, Path)
    G->>M: Binary REQUEST Frame
    M-->>G: Binary RESPONSE Frame
    G-->>C: HTTPS Response
```

### Microservice Registration

```mermaid
sequenceDiagram
    participant M as Microservice
    participant G as Gateway

    M->>G: TCP/TLS Connect
    M->>G: HELLO (ServiceName, Version, Endpoints)
    G->>G: Register Endpoints
    G-->>M: HELLO ACK

    loop Every 10s
        G->>M: HEARTBEAT
        M-->>G: HEARTBEAT (latency, health)
        G->>G: Update Health State
    end
```

---

## 6) Instance Selection Algorithm

```csharp
public ValueTask SelectInstanceAsync(string method, string path)
{
    // 1. Find all endpoints matching (method, path)
    var candidates = _endpoints
        .Where(e => e.Method == method && MatchPath(e.Path, path))
        .ToList();

    // 2. Filter by health
    candidates = candidates
        .Where(c => c.Health is InstanceHealthStatus.Healthy or InstanceHealthStatus.Degraded)
        .ToList();

    // 3. Region preference
    var localRegion = candidates.Where(c => c.Region == _config.Region).ToList();
    var neighborRegions = candidates.Where(c => _config.NeighborRegions.Contains(c.Region)).ToList();
    var otherRegions = candidates.Except(localRegion).Except(neighborRegions).ToList();

    var preferred = localRegion.Any() ? localRegion
        : neighborRegions.Any() ? neighborRegions
        : otherRegions;

    // 4. Within tier: prefer lower latency, then most recent heartbeat
    return preferred
        .OrderBy(c => c.AveragePingMs)
        .ThenByDescending(c => c.LastHeartbeatUtc)
        .FirstOrDefault();
}
```

---

## 7) Configuration

```yaml
gateway:
  node:
    region: "eu1"
    nodeId: "gw-eu1-01"
    environment: "prod"

  transports:
    tcp:
      enabled: true
      port: 9100
      maxConnections: 1000
      receiveBufferSize: 65536
      sendBufferSize: 65536
    tls:
      enabled: true
      port: 9443
      certificatePath: "/certs/gateway.pfx"
      certificatePassword: "${GATEWAY_CERT_PASSWORD}"
      clientCertificateMode: "RequireCertificate"
      allowedClientCertificateThumbprints: []

  routing:
    defaultTimeout: "30s"
    maxRequestBodySize: "100MB"
    streamingEnabled: true
    streamingBufferSize: 16384
    neighborRegions: ["eu2", "us1"]

  auth:
    dpopEnabled: true
    dpopMaxClockSkew: "60s"
    mtlsEnabled: true

  rateLimiting:
    enabled: true
    requestsPerMinute: 1000
    burstSize: 100
    redisConnectionString: "${REDIS_URL}"

  openapi:
    enabled: true
    cacheTtlSeconds: 300
    title: "Stella Ops API"
    version: "1.0.0"

  health:
    heartbeatIntervalSeconds: 10
    heartbeatTimeoutSeconds: 30
    unhealthyThreshold: 3
```

---

## 8) Scale & Performance

| Metric | Target | Notes |
|--------|--------|-------|
| Routing latency (P50) | <2ms | Overhead only; excludes downstream |
| Routing latency (P99) | <5ms | Under normal load |
| Concurrent connections | 10,000 | Per gateway instance |
| Requests/second | 50,000 | Per gateway instance |
| Memory footprint | <512MB | Base; scales with connections |

### Scaling Strategy

- Horizontal scaling behind load balancer
- Sticky sessions NOT required (stateless)
- Regional deployment for latency optimization
- Rate limiting via distributed Valkey/Redis

---

## 9) Security Posture

### Authentication

| Method | Description |
|--------|-------------|
| DPoP | Proof-of-possession tokens from Authority |
| mTLS | Certificate-bound tokens for machine clients |

### Authorization

- Claims-based authorization per endpoint
- Required claims defined in endpoint descriptors
- Tenant isolation via `tid` claim

### Transport Security

| Component | Encryption |
|-----------|------------|
| Client → Gateway | TLS 1.3 (HTTPS) |
| Gateway → Microservices | TLS (prod), TCP (dev only) |

### Rate Limiting

- Per-tenant: Configurable requests/minute
- Per-identity: Burst protection
- Global: Circuit breaker for overload

---

## 10) Observability & Audit

### Metrics (Prometheus)

```
gateway_requests_total{service,method,path,status}
gateway_request_duration_seconds{service,method,path,quantile}
gateway_active_connections{service}
gateway_transport_frames_total{type}
gateway_auth_failures_total{reason}
gateway_rate_limit_exceeded_total{tenant}
```

### Traces (OpenTelemetry)

- Span per request: `gateway.route`
- Child span: `gateway.auth.validate`
- Child span: `gateway.transport.send`

### Logs (Structured)

```json
{
  "timestamp": "2025-12-21T10:00:00Z",
  "level": "info",
  "message": "Request routed",
  "correlationId": "abc123",
  "tenantId": "tenant-1",
  "method": "GET",
  "path": "/api/v1/scans/xyz",
  "service": "scanner",
  "durationMs": 45,
  "status": 200
}
```

---

## 11) Testing Matrix

| Test Type | Scope | Coverage Target |
|-----------|-------|-----------------|
| Unit | Routing algorithm, auth validation | 90% |
| Integration | Transport + routing flow | 80% |
| E2E | Full request path with mock services | Key flows |
| Performance | Latency, throughput, connection limits | SLO targets |
| Chaos | Connection failures, microservice crashes | Resilience |

### Test Fixtures

- `StellaOps.Router.Transport.InMemory` for transport mocking
- Mock Authority for auth testing
- `WebApplicationFactory` for integration tests

---

## 12) DevOps & Operations

### Deployment

```yaml
# Kubernetes deployment excerpt
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: gateway
          image: stellaops/gateway:1.0.0
          ports:
            - containerPort: 8080   # HTTPS
            - containerPort: 9443   # TLS (microservices)
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
```

### SLOs

| SLO | Target | Measurement |
|-----|--------|-------------|
| Availability | 99.9% | Uptime over 30 days |
| Latency P99 | <50ms | Includes downstream |
| Error rate | <0.1% | 5xx responses |

---

## 13) Roadmap

| Feature | Sprint | Status |
|---------|--------|--------|
| Core implementation | 3600.0001.0001 | TODO |
| WebSocket support | Future | Planned |
| gRPC passthrough | Future | Planned |
| GraphQL aggregation | Future | Exploration |

---

## 14) References

- Router Architecture: `docs/modules/router/architecture.md`
- OpenAPI Aggregation: `docs/modules/gateway/openapi.md`
- Authority Integration: `docs/modules/authority/architecture.md`
- Reference Architecture: `docs/product-advisories/archived/2025-12-21-reference-architecture/`

---

**Last Updated**: 2025-12-21 (Sprint 3600)