# component_architecture_gateway.md — **Stella Ops Gateway** (Sprint 3600)

> Derived from Reference Architecture Advisory and Router Architecture Specification

> **Scope.** The Gateway WebService is the single HTTP ingress point for all external traffic. It authenticates requests via Authority (DPoP/mTLS), routes to microservices via the Router binary protocol, aggregates OpenAPI specifications, and enforces tenant isolation.
>
> **Ownership:** Platform Guild

---

## 0) Mission & Boundaries

### What Gateway Does

- **HTTP Ingress**: Single entry point for all external HTTP/HTTPS traffic
- **Authentication**: DPoP and mTLS token validation via Authority integration
- **Routing**: Routes HTTP requests to microservices via binary protocol (TCP/TLS)
- **OpenAPI Aggregation**: Combines endpoint specs from all registered microservices
- **Health Aggregation**: Provides unified health status from downstream services
- **Rate Limiting**: Per-tenant and per-identity request throttling
- **Tenant Propagation**: Extracts tenant context and propagates to microservices

### What Gateway Does NOT Do

- **Business Logic**: No domain logic; pure routing and auth
- **Data Storage**: Stateless; no persistent state beyond connection cache
- **Direct Database Access**: Never connects to PostgreSQL directly
- **SBOM/VEX Processing**: Delegates to Scanner, Excititor, etc.

---

## 1) Solution & Project Layout

```
src/Gateway/
├── StellaOps.Gateway.WebService/
│   ├── StellaOps.Gateway.WebService.csproj
│   ├── Program.cs                          # DI bootstrap, transport init
│   ├── Dockerfile
│   ├── appsettings.json
│   ├── appsettings.Development.json
│   ├── Configuration/
│   │   ├── GatewayOptions.cs               # All configuration options
│   │   └── TransportOptions.cs             # TCP/TLS transport config
│   ├── Middleware/
│   │   ├── TenantMiddleware.cs             # Tenant context extraction
│   │   ├── RequestRoutingMiddleware.cs     # HTTP → binary routing
│   │   ├── AuthenticationMiddleware.cs     # DPoP/mTLS validation
│   │   └── RateLimitingMiddleware.cs       # Per-tenant throttling
│   ├── Services/
│   │   ├── GatewayHostedService.cs         # Transport lifecycle
│   │   ├── OpenApiAggregationService.cs    # Spec aggregation
│   │   └── HealthAggregationService.cs     # Downstream health
│   └── Endpoints/
│       ├── HealthEndpoints.cs              # /health/*, /metrics
│       └── OpenApiEndpoints.cs             # /openapi.json, /openapi.yaml
```

### Dependencies

```xml
<ItemGroup>
  <ProjectReference Include="..\..\__Libraries\StellaOps.Router.Gateway\..." />
  <ProjectReference Include="..\..\__Libraries\StellaOps.Router.Transport.Tcp\..." />
  <ProjectReference Include="..\..\__Libraries\StellaOps.Router.Transport.Tls\..." />
  <ProjectReference Include="..\..\Auth\StellaOps.Auth.ServerIntegration\..." />
</ItemGroup>
```

---

## 2) External Dependencies

| Dependency | Purpose | Required |
|------------|---------|----------|
| **Authority** | OpTok validation, DPoP/mTLS | Yes |
| **Router.Gateway** | Routing state, endpoint discovery | Yes |
| **Router.Transport.Tcp** | Binary transport (dev) | Yes |
| **Router.Transport.Tls** | Binary transport (prod) | Yes |
| **Valkey/Redis** | Rate limiting state | Optional |

---

## 3) Contracts & Data Model

### Request Flow

```
┌──────────────┐     HTTPS      ┌─────────────────┐    Binary     ┌─────────────────┐
│    Client    │ ─────────────► │     Gateway     │ ────────────► │  Microservice   │
│   (CLI/UI)   │                │   WebService    │     Frame     │   (Scanner,     │
│              │ ◄───────────── │                 │ ◄──────────── │   Policy, etc)  │
└──────────────┘     HTTPS      └─────────────────┘    Binary     └─────────────────┘
```

### Binary Frame Protocol

Gateway uses the Router binary protocol for internal communication:

| Frame Type | Purpose |
|------------|---------|
| HELLO | Microservice registration with endpoints |
| HEARTBEAT | Health check and latency measurement |
| REQUEST | HTTP request serialized to binary |
| RESPONSE | HTTP response serialized from binary |
| STREAM_DATA | Streaming response chunks |
| CANCEL | Request cancellation propagation |

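
To make the frame types in the table concrete, here is a minimal sketch of how a typed, length-prefixed frame could be encoded and decoded. The wire layout (1-byte type followed by a 4-byte big-endian payload length), the numeric type codes, and the `FrameCodec` name are all assumptions for illustration; the Router Architecture Specification defines the real format.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical numeric codes; the real Router protocol may assign different values.
public enum FrameType : byte
{
    Hello = 1,
    Heartbeat = 2,
    Request = 3,
    Response = 4,
    StreamData = 5,
    Cancel = 6,
}

public static class FrameCodec
{
    // Assumed header layout: 1 byte frame type + 4 byte big-endian payload length.
    public const int HeaderSize = 5;

    public static byte[] Encode(FrameType type, ReadOnlySpan<byte> payload)
    {
        var frame = new byte[HeaderSize + payload.Length];
        frame[0] = (byte)type;
        BinaryPrimitives.WriteInt32BigEndian(frame.AsSpan(1, 4), payload.Length);
        payload.CopyTo(frame.AsSpan(HeaderSize));
        return frame;
    }

    public static (FrameType Type, byte[] Payload) Decode(ReadOnlySpan<byte> frame)
    {
        if (frame.Length < HeaderSize)
            throw new ArgumentException("Frame is shorter than the header.");

        var type = (FrameType)frame[0];
        var length = BinaryPrimitives.ReadInt32BigEndian(frame.Slice(1, 4));

        // Reject frames whose declared length disagrees with the bytes received.
        if (length < 0 || frame.Length != HeaderSize + length)
            throw new ArgumentException("Declared length does not match frame size.");

        return (type, frame.Slice(HeaderSize, length).ToArray());
    }
}
```

A length prefix lets the receiver split a TCP/TLS byte stream back into discrete frames, which is why it is a common choice for protocols of this kind.
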
### Endpoint Descriptor

```csharp
public sealed class EndpointDescriptor
{
    public required string Method { get; init; }         // GET, POST, etc.
    public required string Path { get; init; }           // /api/v1/scans/{id}
    public required string ServiceName { get; init; }    // scanner
    public required string Version { get; init; }        // 1.0.0
    public TimeSpan DefaultTimeout { get; init; }        // 30s
    public bool SupportsStreaming { get; init; }         // true for large responses

    // Defaults to an empty list so an endpoint without requirements is never null.
    public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; } = Array.Empty<ClaimRequirement>();
}
```

### Routing State

```csharp
public interface IRoutingStateManager
{
    ValueTask RegisterEndpointsAsync(ConnectionState conn, HelloPayload hello);
    ValueTask<InstanceSelection?> SelectInstanceAsync(string method, string path);
    ValueTask UpdateHealthAsync(ConnectionState conn, HeartbeatPayload heartbeat);
    ValueTask DrainConnectionAsync(string connectionId);
}
```

---

## 4) REST API

Gateway exposes minimal management endpoints; all business APIs are routed to microservices.

### Health Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health/live` | None | Liveness probe |
| `GET /health/ready` | None | Readiness probe |
| `GET /health/startup` | None | Startup probe |
| `GET /metrics` | None | Prometheus metrics |

### OpenAPI Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /openapi.json` | None | Aggregated OpenAPI 3.1.0 spec |
| `GET /openapi.yaml` | None | YAML format spec |

---

## 5) Execution Flow

### Request Routing

```mermaid
sequenceDiagram
    participant C as Client
    participant G as Gateway
    participant A as Authority
    participant M as Microservice

    C->>G: HTTPS Request + DPoP Token
    G->>A: Validate Token
    A-->>G: Claims (sub, tid, scope)
    G->>G: Select Instance (Method, Path)
    G->>M: Binary REQUEST Frame
    M-->>G: Binary RESPONSE Frame
    G-->>C: HTTPS Response
```

### Microservice Registration

```mermaid
sequenceDiagram
    participant M as Microservice
    participant G as Gateway

    M->>G: TCP/TLS Connect
    M->>G: HELLO (ServiceName, Version, Endpoints)
    G->>G: Register Endpoints
    G-->>M: HELLO ACK

    loop Every 10s
        G->>M: HEARTBEAT
        M-->>G: HEARTBEAT (latency, health)
        G->>G: Update Health State
    end
```

---

## 6) Instance Selection Algorithm

```csharp
public ValueTask<InstanceSelection?> SelectInstanceAsync(string method, string path)
{
    // 1. Find all endpoints matching (method, path)
    var candidates = _endpoints
        .Where(e => e.Method == method && MatchPath(e.Path, path))
        .ToList();

    // 2. Filter by health: Degraded instances stay eligible, anything else is excluded
    candidates = candidates
        .Where(c => c.Health is InstanceHealthStatus.Healthy or InstanceHealthStatus.Degraded)
        .ToList();

    // 3. Region preference: local region first, then neighbor regions, then everything else
    var localRegion = candidates.Where(c => c.Region == _config.Region).ToList();
    var neighborRegions = candidates.Where(c => _config.NeighborRegions.Contains(c.Region)).ToList();
    var otherRegions = candidates.Except(localRegion).Except(neighborRegions).ToList();

    var preferred = localRegion.Any() ? localRegion
        : neighborRegions.Any() ? neighborRegions
        : otherRegions;

    // 4. Within tier: prefer lower latency, then most recent heartbeat
    var selected = preferred
        .OrderBy(c => c.AveragePingMs)
        .ThenByDescending(c => c.LastHeartbeatUtc)
        .FirstOrDefault();

    // Selection is synchronous, so wrap the result to satisfy the ValueTask return type.
    return ValueTask.FromResult<InstanceSelection?>(selected);
}
```

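
The selection code calls `MatchPath` without showing it. A minimal sketch of what such a matcher could look like follows, assuming templates of the form `/api/v1/scans/{id}` where a `{name}` segment matches any single path segment. The `PathMatcher` class name, the case-insensitive comparison of literal segments, and the lack of query-string handling are assumptions of this sketch, not the Router library's actual behavior.

```csharp
using System;

public static class PathMatcher
{
    // Returns true when requestPath matches the template segment by segment.
    // A template segment wrapped in braces, such as {id}, matches any single segment.
    public static bool MatchPath(string template, string requestPath)
    {
        var templateSegments = template.Split('/', StringSplitOptions.RemoveEmptyEntries);
        var pathSegments = requestPath.Split('/', StringSplitOptions.RemoveEmptyEntries);

        // Different segment counts can never match (no catch-all support in this sketch).
        if (templateSegments.Length != pathSegments.Length)
            return false;

        for (var i = 0; i < templateSegments.Length; i++)
        {
            var segment = templateSegments[i];
            var isParameter = segment.Length > 2 && segment[0] == '{' && segment[^1] == '}';
            if (isParameter)
                continue;

            // Literal segments are compared case-insensitively (an assumption).
            if (!string.Equals(segment, pathSegments[i], StringComparison.OrdinalIgnoreCase))
                return false;
        }

        return true;
    }
}
```
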
---

## 7) Configuration

```yaml
gateway:
  node:
    region: "eu1"
    nodeId: "gw-eu1-01"
    environment: "prod"

  transports:
    tcp:
      enabled: true
      port: 9100
      maxConnections: 1000
      receiveBufferSize: 65536
      sendBufferSize: 65536
    tls:
      enabled: true
      port: 9443
      certificatePath: "/certs/gateway.pfx"
      certificatePassword: "${GATEWAY_CERT_PASSWORD}"
      clientCertificateMode: "RequireCertificate"
      allowedClientCertificateThumbprints: []

  routing:
    defaultTimeout: "30s"
    maxRequestBodySize: "100MB"
    streamingEnabled: true
    streamingBufferSize: 16384
    neighborRegions: ["eu2", "us1"]

  auth:
    dpopEnabled: true
    dpopMaxClockSkew: "60s"
    mtlsEnabled: true
    rateLimiting:
      enabled: true
      requestsPerMinute: 1000
      burstSize: 100
      redisConnectionString: "${REDIS_URL}"

  openapi:
    enabled: true
    cacheTtlSeconds: 300
    title: "Stella Ops API"
    version: "1.0.0"

  health:
    heartbeatIntervalSeconds: 10
    heartbeatTimeoutSeconds: 30
    unhealthyThreshold: 3
```

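
The `health` settings can be read together: with a 10-second heartbeat interval and an unhealthy threshold of 3, three missed intervals add up to the 30-second timeout. The sketch below shows one way those values could drive an instance's health state. The `HeartbeatTracker` class, the `Unhealthy` enum member, and the rule that a single missed interval means `Degraded` are assumptions for illustration; the document does not specify the actual state machine.

```csharp
using System;

// Healthy and Degraded appear in the selection algorithm; Unhealthy is an assumed third state.
public enum InstanceHealthStatus { Healthy, Degraded, Unhealthy }

public sealed class HeartbeatTracker
{
    private readonly TimeSpan _interval;        // heartbeatIntervalSeconds
    private readonly int _unhealthyThreshold;   // unhealthyThreshold
    private DateTimeOffset _lastHeartbeatUtc;

    public HeartbeatTracker(TimeSpan interval, int unhealthyThreshold, DateTimeOffset startedUtc)
    {
        _interval = interval;
        _unhealthyThreshold = unhealthyThreshold;
        _lastHeartbeatUtc = startedUtc;
    }

    public void RecordHeartbeat(DateTimeOffset nowUtc) => _lastHeartbeatUtc = nowUtc;

    public InstanceHealthStatus Evaluate(DateTimeOffset nowUtc)
    {
        // Whole heartbeat intervals that have elapsed since the last reply.
        var missed = (int)((nowUtc - _lastHeartbeatUtc).Ticks / _interval.Ticks);

        if (missed >= _unhealthyThreshold)
            return InstanceHealthStatus.Unhealthy;

        return missed >= 1 ? InstanceHealthStatus.Degraded : InstanceHealthStatus.Healthy;
    }
}
```
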
---

## 8) Scale & Performance

| Metric | Target | Notes |
|--------|--------|-------|
| Routing latency (P50) | <2ms | Overhead only; excludes downstream |
| Routing latency (P99) | <5ms | Under normal load |
| Concurrent connections | 10,000 | Per gateway instance |
| Requests/second | 50,000 | Per gateway instance |
| Memory footprint | <512MB | Base; scales with connections |

### Scaling Strategy

- Horizontal scaling behind load balancer
- Sticky sessions NOT required (stateless)
- Regional deployment for latency optimization
- Rate limiting via distributed Valkey/Redis

---

## 9) Security Posture

### Authentication

| Method | Description |
|--------|-------------|
| DPoP | Proof-of-possession tokens from Authority |
| mTLS | Certificate-bound tokens for machine clients |

### Authorization

- Claims-based authorization per endpoint
- Required claims defined in endpoint descriptors
- Tenant isolation via `tid` claim

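
The three authorization points above can be sketched as a single check: read the tenant from the `tid` claim, then verify that every claim required by the endpoint is present. The shape of `ClaimRequirement` (a claim type plus an optional value) and the `EndpointAuthorizer` helper are hypothetical; the source names `ClaimRequirement` but does not show its members.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

// Hypothetical shape: a claim type and an optional required value.
public sealed record ClaimRequirement(string Type, string? Value = null);

public static class EndpointAuthorizer
{
    public static bool TryAuthorize(
        ClaimsPrincipal user,
        IReadOnlyList<ClaimRequirement> requiredClaims,
        out string? tenantId)
    {
        // Tenant isolation: a request without a tid claim is rejected outright.
        tenantId = user.FindFirst("tid")?.Value;
        if (string.IsNullOrEmpty(tenantId))
            return false;

        // Every requirement must be met; a null Value accepts any value of that claim type.
        return requiredClaims.All(r => r.Value is null
            ? user.HasClaim(c => c.Type == r.Type)
            : user.HasClaim(r.Type, r.Value));
    }
}
```

On success, `tenantId` holds the value that would be propagated to the downstream microservice.
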
### Transport Security

| Component | Encryption |
|-----------|------------|
| Client → Gateway | TLS 1.3 (HTTPS) |
| Gateway → Microservices | TLS (prod), TCP (dev only) |

### Rate Limiting

- Per-tenant: Configurable requests/minute
- Per-identity: Burst protection
- Global: Circuit breaker for overload

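
One common way to combine a requests-per-minute limit with a burst size is a token bucket, sketched below. This is an in-process illustration only: the `TokenBucket` class is hypothetical, the document does not say which algorithm the Gateway uses, and a distributed deployment would keep the bucket state in Valkey/Redis instead of in memory. The bucket holds at most `burstSize` tokens and refills at `requestsPerMinute / 60` tokens per second, so short bursts are allowed while the long-run rate stays capped.

```csharp
using System;

public sealed class TokenBucket
{
    private readonly object _gate = new();
    private readonly double _capacity;          // burstSize
    private readonly double _refillPerSecond;   // requestsPerMinute / 60
    private double _tokens;
    private DateTimeOffset _lastRefillUtc;

    public TokenBucket(int requestsPerMinute, int burstSize, DateTimeOffset nowUtc)
    {
        _capacity = burstSize;
        _refillPerSecond = requestsPerMinute / 60.0;
        _tokens = burstSize;                    // start full so an initial burst is allowed
        _lastRefillUtc = nowUtc;
    }

    // Returns true and consumes one token when the request is allowed.
    public bool TryAcquire(DateTimeOffset nowUtc)
    {
        lock (_gate)
        {
            var elapsedSeconds = (nowUtc - _lastRefillUtc).TotalSeconds;
            if (elapsedSeconds > 0)
            {
                // Refill proportionally to elapsed time, never above capacity.
                _tokens = Math.Min(_capacity, _tokens + elapsedSeconds * _refillPerSecond);
                _lastRefillUtc = nowUtc;
            }

            if (_tokens < 1)
                return false;

            _tokens -= 1;
            return true;
        }
    }
}
```
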
---

## 10) Observability & Audit

### Metrics (Prometheus)

```
gateway_requests_total{service,method,path,status}
gateway_request_duration_seconds{service,method,path,quantile}
gateway_active_connections{service}
gateway_transport_frames_total{type}
gateway_auth_failures_total{reason}
gateway_rate_limit_exceeded_total{tenant}
```

### Traces (OpenTelemetry)

- Span per request: `gateway.route`
- Child span: `gateway.auth.validate`
- Child span: `gateway.transport.send`

### Logs (Structured)

```json
{
  "timestamp": "2025-12-21T10:00:00Z",
  "level": "info",
  "message": "Request routed",
  "correlationId": "abc123",
  "tenantId": "tenant-1",
  "method": "GET",
  "path": "/api/v1/scans/xyz",
  "service": "scanner",
  "durationMs": 45,
  "status": 200
}
```

---

## 11) Testing Matrix

| Test Type | Scope | Coverage Target |
|-----------|-------|-----------------|
| Unit | Routing algorithm, auth validation | 90% |
| Integration | Transport + routing flow | 80% |
| E2E | Full request path with mock services | Key flows |
| Performance | Latency, throughput, connection limits | SLO targets |
| Chaos | Connection failures, microservice crashes | Resilience |

### Test Fixtures

- `StellaOps.Router.Transport.InMemory` for transport mocking
- Mock Authority for auth testing
- `WebApplicationFactory` for integration tests

---

## 12) DevOps & Operations

### Deployment

```yaml
# Kubernetes deployment excerpt
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: gateway
          image: stellaops/gateway:1.0.0
          ports:
            - containerPort: 8080  # HTTPS
            - containerPort: 9443  # TLS (microservices)
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
```

### SLOs

| SLO | Target | Measurement |
|-----|--------|-------------|
| Availability | 99.9% | Uptime over 30 days |
| Latency P99 | <50ms | Includes downstream |
| Error rate | <0.1% | 5xx responses |

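
As a worked example of what the availability target allows, a 99.9% SLO over a 30-day window leaves the following downtime budget (assuming the window is measured in wall-clock minutes):

$$
(1 - 0.999) \times 30 \times 24 \times 60\ \text{min} = 0.001 \times 43200\ \text{min} = 43.2\ \text{min}
$$
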
---

## 13) Roadmap

| Feature | Sprint | Status |
|---------|--------|--------|
| Core implementation | 3600.0001.0001 | TODO |
| WebSocket support | Future | Planned |
| gRPC passthrough | Future | Planned |
| GraphQL aggregation | Future | Exploration |

---

## 14) References

- Router Architecture: `docs/modules/router/architecture.md`
- Gateway Identity Header Policy: `docs/modules/gateway/identity-header-policy.md`
- OpenAPI Aggregation: `docs/modules/gateway/openapi.md`
- Router ASP.NET Endpoint Bridge: `docs/modules/router/aspnet-endpoint-bridge.md`
- Router Messaging (Valkey) Transport: `docs/modules/router/messaging-valkey-transport.md`
- Authority Integration: `docs/modules/authority/architecture.md`
- Reference Architecture: `docs/product-advisories/archived/2025-12-21-reference-architecture/`

---

**Last Updated**: 2025-12-21 (Sprint 3600)