# component_architecture_gateway.md — **Stella Ops Gateway** (Sprint 3600)

> Derived from Reference Architecture Advisory and Router Architecture Specification

> **Scope.** The Gateway WebService is the single HTTP ingress point for all external traffic. It authenticates requests via Authority (DPoP/mTLS), routes to microservices via the Router binary protocol, aggregates OpenAPI specifications, and enforces tenant isolation.
> **Ownership:** Platform Guild

---

## 0) Mission & Boundaries

### What Gateway Does

- **HTTP Ingress**: Single entry point for all external HTTP/HTTPS traffic
- **Authentication**: DPoP and mTLS token validation via Authority integration
- **Routing**: Routes HTTP requests to microservices via binary protocol (TCP/TLS)
- **OpenAPI Aggregation**: Combines endpoint specs from all registered microservices
- **Health Aggregation**: Provides unified health status from downstream services
- **Rate Limiting**: Per-tenant and per-identity request throttling
- **Tenant Propagation**: Extracts tenant context and propagates to microservices

### What Gateway Does NOT Do

- **Business Logic**: No domain logic; pure routing and auth
- **Data Storage**: Stateless; no persistent state beyond connection cache
- **Direct Database Access**: Never connects to PostgreSQL directly
- **SBOM/VEX Processing**: Delegates to Scanner, Excititor, etc.

---

## 1) Solution & Project Layout

```
src/Gateway/
├── StellaOps.Gateway.WebService/
│   ├── StellaOps.Gateway.WebService.csproj
│   ├── Program.cs                          # DI bootstrap, transport init
│   ├── Dockerfile
│   ├── appsettings.json
│   ├── appsettings.Development.json
│   ├── Configuration/
│   │   ├── GatewayOptions.cs               # All configuration options
│   │   └── TransportOptions.cs             # TCP/TLS transport config
│   ├── Middleware/
│   │   ├── TenantMiddleware.cs             # Tenant context extraction
│   │   ├── RequestRoutingMiddleware.cs     # HTTP → binary routing
│   │   ├── AuthenticationMiddleware.cs     # DPoP/mTLS validation
│   │   └── RateLimitingMiddleware.cs       # Per-tenant throttling
│   ├── Services/
│   │   ├── GatewayHostedService.cs         # Transport lifecycle
│   │   ├── OpenApiAggregationService.cs    # Spec aggregation
│   │   └── HealthAggregationService.cs     # Downstream health
│   └── Endpoints/
│       ├── HealthEndpoints.cs              # /health/*, /metrics
│       └── OpenApiEndpoints.cs             # /openapi.json, /openapi.yaml
```

### Dependencies

```xml
<ItemGroup>
  <ProjectReference Include="..\..\__Libraries\StellaOps.Router.Gateway\..." />
  <ProjectReference Include="..\..\__Libraries\StellaOps.Router.Transport.Tcp\..." />
  <ProjectReference Include="..\..\__Libraries\StellaOps.Router.Transport.Tls\..." />
  <ProjectReference Include="..\..\Auth\StellaOps.Auth.ServerIntegration\..." />
</ItemGroup>
```

---

## 2) External Dependencies

| Dependency | Purpose | Required |
|------------|---------|----------|
| **Authority** | OpTok validation, DPoP/mTLS | Yes |
| **Router.Gateway** | Routing state, endpoint discovery | Yes |
| **Router.Transport.Tcp** | Binary transport (dev) | Yes |
| **Router.Transport.Tls** | Binary transport (prod) | Yes |
| **Valkey/Redis** | Rate limiting state | Optional |

---

## 3) Contracts & Data Model

### Request Flow

```
┌──────────────┐     HTTPS      ┌─────────────────┐    Binary     ┌─────────────────┐
│   Client     │ ─────────────► │    Gateway      │ ────────────► │  Microservice   │
│   (CLI/UI)   │                │   WebService    │     Frame     │   (Scanner,     │
│              │ ◄───────────── │                 │ ◄──────────── │   Policy, etc)  │
└──────────────┘     HTTPS      └─────────────────┘    Binary     └─────────────────┘
```

### Binary Frame Protocol

Gateway uses the Router binary protocol for internal communication:

| Frame Type | Purpose |
|------------|---------|
| HELLO | Microservice registration with endpoints |
| HEARTBEAT | Health check and latency measurement |
| REQUEST | HTTP request serialized to binary |
| RESPONSE | HTTP response serialized from binary |
| STREAM_DATA | Streaming response chunks |
| CANCEL | Request cancellation propagation |

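This document does not specify the wire layout of a frame. Purely as an illustration of how the frame types above could be carried over a TCP/TLS stream, the sketch below assumes a hypothetical layout of a 1-byte type, a 4-byte big-endian payload length, and then the payload. The numeric enum values, `FrameCodec`, and the header layout are assumptions, not the Router protocol itself.

```csharp
using System;
using System.Buffers.Binary;

// Frame types from the table above; the numeric values are illustrative assumptions.
public enum FrameType : byte
{
    Hello = 1,
    Heartbeat = 2,
    Request = 3,
    Response = 4,
    StreamData = 5,
    Cancel = 6,
}

// Hypothetical framing: [1-byte type][4-byte big-endian payload length][payload].
public static class FrameCodec
{
    public const int HeaderSize = 5;

    public static byte[] Encode(FrameType type, ReadOnlySpan<byte> payload)
    {
        var buffer = new byte[HeaderSize + payload.Length];
        buffer[0] = (byte)type;
        BinaryPrimitives.WriteInt32BigEndian(buffer.AsSpan(1, 4), payload.Length);
        payload.CopyTo(buffer.AsSpan(HeaderSize));
        return buffer;
    }

    // Returns false when the buffer does not yet hold one complete frame.
    public static bool TryDecode(ReadOnlySpan<byte> buffer, out FrameType type, out byte[] payload)
    {
        type = default;
        payload = Array.Empty<byte>();
        if (buffer.Length < HeaderSize) return false;

        int length = BinaryPrimitives.ReadInt32BigEndian(buffer.Slice(1, 4));
        if (length < 0 || buffer.Length - HeaderSize < length) return false;

        type = (FrameType)buffer[0];
        payload = buffer.Slice(HeaderSize, length).ToArray();
        return true;
    }
}
```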
### Endpoint Descriptor

```csharp
public sealed class EndpointDescriptor
{
    public required string Method { get; init; }        // GET, POST, etc.
    public required string Path { get; init; }          // /api/v1/scans/{id}
    public required string ServiceName { get; init; }   // scanner
    public required string Version { get; init; }       // 1.0.0
    public TimeSpan DefaultTimeout { get; init; }       // 30s
    public bool SupportsStreaming { get; init; }        // true for large responses
    public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; }
}
```

### Routing State

```csharp
public interface IRoutingStateManager
{
    ValueTask RegisterEndpointsAsync(ConnectionState conn, HelloPayload hello);
    ValueTask<InstanceSelection?> SelectInstanceAsync(string method, string path);
    ValueTask UpdateHealthAsync(ConnectionState conn, HeartbeatPayload heartbeat);
    ValueTask DrainConnectionAsync(string connectionId);
}
```

---

## 4) REST API

Gateway exposes minimal management endpoints; all business APIs are routed to microservices.

### Health Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health/live` | None | Liveness probe |
| `GET /health/ready` | None | Readiness probe |
| `GET /health/startup` | None | Startup probe |
| `GET /metrics` | None | Prometheus metrics |

### OpenAPI Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /openapi.json` | None | Aggregated OpenAPI 3.1.0 spec |
| `GET /openapi.yaml` | None | YAML format spec |

---

## 5) Execution Flow

### Request Routing

```mermaid
sequenceDiagram
    participant C as Client
    participant G as Gateway
    participant A as Authority
    participant M as Microservice

    C->>G: HTTPS Request + DPoP Token
    G->>A: Validate Token
    A-->>G: Claims (sub, tid, scope)
    G->>G: Select Instance (Method, Path)
    G->>M: Binary REQUEST Frame
    M-->>G: Binary RESPONSE Frame
    G-->>C: HTTPS Response
```

### Microservice Registration

```mermaid
sequenceDiagram
    participant M as Microservice
    participant G as Gateway

    M->>G: TCP/TLS Connect
    M->>G: HELLO (ServiceName, Version, Endpoints)
    G->>G: Register Endpoints
    G-->>M: HELLO ACK

    loop Every 10s
        G->>M: HEARTBEAT
        M-->>G: HEARTBEAT (latency, health)
        G->>G: Update Health State
    end
```

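The "Update Health State" step in the heartbeat loop could be sketched roughly as follows. This is a minimal illustration under stated assumptions: `HeartbeatTracker` is a hypothetical type, the enum here adds an `Unhealthy` member for the example, and the rule "one miss per tick once the last heartbeat is older than the timeout, unhealthy after a threshold of consecutive misses" is one plausible reading of the `heartbeatTimeoutSeconds` and `unhealthyThreshold` settings, not a confirmed behavior.

```csharp
using System;

public enum InstanceHealthStatus { Healthy, Degraded, Unhealthy }

// Hypothetical per-connection tracker; not the Router library's actual type.
public sealed class HeartbeatTracker
{
    private readonly TimeSpan _timeout;
    private readonly int _unhealthyThreshold;
    private int _consecutiveMisses;
    private DateTimeOffset _lastHeartbeatUtc;

    public HeartbeatTracker(TimeSpan timeout, int unhealthyThreshold, DateTimeOffset now)
    {
        _timeout = timeout;
        _unhealthyThreshold = unhealthyThreshold;
        _lastHeartbeatUtc = now;
    }

    public InstanceHealthStatus Status =>
        _consecutiveMisses == 0 ? InstanceHealthStatus.Healthy
        : _consecutiveMisses < _unhealthyThreshold ? InstanceHealthStatus.Degraded
        : InstanceHealthStatus.Unhealthy;

    // A heartbeat reply resets the miss counter.
    public void RecordHeartbeat(DateTimeOffset now)
    {
        _lastHeartbeatUtc = now;
        _consecutiveMisses = 0;
    }

    // Called once per heartbeat interval; counts a miss when the last reply is older than the timeout.
    public void Tick(DateTimeOffset now)
    {
        if (now - _lastHeartbeatUtc > _timeout)
        {
            _consecutiveMisses++;
        }
    }
}
```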
---

## 6) Instance Selection Algorithm

```csharp
public ValueTask<InstanceSelection?> SelectInstanceAsync(string method, string path)
{
    // 1. Find all endpoints matching (method, path)
    var candidates = _endpoints
        .Where(e => e.Method == method && MatchPath(e.Path, path))
        .ToList();

    // 2. Filter by health: keep only instances that can still serve traffic
    candidates = candidates
        .Where(c => c.Health is InstanceHealthStatus.Healthy or InstanceHealthStatus.Degraded)
        .ToList();

    // 3. Region preference: local region first, then neighbor regions, then everything else
    var localRegion = candidates.Where(c => c.Region == _config.Region).ToList();
    var neighborRegions = candidates.Where(c => _config.NeighborRegions.Contains(c.Region)).ToList();
    var otherRegions = candidates.Except(localRegion).Except(neighborRegions).ToList();

    var preferred = localRegion.Any() ? localRegion
        : neighborRegions.Any() ? neighborRegions
        : otherRegions;

    // 4. Within tier: prefer lower latency, then most recent heartbeat
    var selected = preferred
        .OrderBy(c => c.AveragePingMs)
        .ThenByDescending(c => c.LastHeartbeatUtc)
        .FirstOrDefault();

    // Selection is synchronous, so wrap the result (null when nothing matches) in a completed ValueTask
    return ValueTask.FromResult<InstanceSelection?>(selected);
}
```

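The selection code calls `MatchPath` without defining it. A minimal sketch of what such a helper might look like is shown below, assuming templates use `{name}` placeholders as in `/api/v1/scans/{id}`, that each placeholder matches exactly one path segment, and that literal segments compare case-insensitively. The class name `PathMatcher` and these matching rules are assumptions; query strings and catch-all segments are not handled.

```csharp
using System;

public static class PathMatcher
{
    // Matches a template such as "/api/v1/scans/{id}" against a concrete request path.
    public static bool MatchPath(string template, string path)
    {
        var templateSegments = template.Split('/', StringSplitOptions.RemoveEmptyEntries);
        var pathSegments = path.Split('/', StringSplitOptions.RemoveEmptyEntries);

        // Every placeholder consumes exactly one segment, so the counts must agree.
        if (templateSegments.Length != pathSegments.Length) return false;

        for (var i = 0; i < templateSegments.Length; i++)
        {
            var segment = templateSegments[i];
            var isPlaceholder = segment.Length > 2 && segment[0] == '{' && segment[^1] == '}';
            if (isPlaceholder) continue;

            if (!string.Equals(segment, pathSegments[i], StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }

        return true;
    }
}
```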
---

## 7) Configuration

```yaml
gateway:
  node:
    region: "eu1"
    nodeId: "gw-eu1-01"
    environment: "prod"

  transports:
    tcp:
      enabled: true
      port: 9100
      maxConnections: 1000
      receiveBufferSize: 65536
      sendBufferSize: 65536
    tls:
      enabled: true
      port: 9443
      certificatePath: "/certs/gateway.pfx"
      certificatePassword: "${GATEWAY_CERT_PASSWORD}"
      clientCertificateMode: "RequireCertificate"
      allowedClientCertificateThumbprints: []

  routing:
    defaultTimeout: "30s"
    maxRequestBodySize: "100MB"
    streamingEnabled: true
    streamingBufferSize: 16384
    neighborRegions: ["eu2", "us1"]

  auth:
    dpopEnabled: true
    dpopMaxClockSkew: "60s"
    mtlsEnabled: true
    rateLimiting:
      enabled: true
      requestsPerMinute: 1000
      burstSize: 100
      redisConnectionString: "${REDIS_URL}"

  openapi:
    enabled: true
    cacheTtlSeconds: 300
    title: "Stella Ops API"
    version: "1.0.0"

  health:
    heartbeatIntervalSeconds: 10
    heartbeatTimeoutSeconds: 30
    unhealthyThreshold: 3
```

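Values such as `"30s"`, `"60s"`, and `"100MB"` are not formats that `TimeSpan` or `long` parse natively, so something has to translate them when options are bound. The sketch below shows one way this might be done; `ConfigValueParser` is a hypothetical helper, the accepted suffixes are assumptions, and sizes are read as binary multiples (1 MB = 1024 * 1024 bytes), which the document does not confirm.

```csharp
using System;
using System.Globalization;

public static class ConfigValueParser
{
    // Parses durations such as "250ms", "30s", "5m" or "1h".
    public static TimeSpan ParseDuration(string text)
    {
        text = text.Trim();
        // Check "ms" before "s" so that "250ms" is not read as seconds.
        if (text.EndsWith("ms", StringComparison.OrdinalIgnoreCase))
            return TimeSpan.FromMilliseconds(Number(text, 2));
        if (text.EndsWith("s", StringComparison.OrdinalIgnoreCase))
            return TimeSpan.FromSeconds(Number(text, 1));
        if (text.EndsWith("m", StringComparison.OrdinalIgnoreCase))
            return TimeSpan.FromMinutes(Number(text, 1));
        if (text.EndsWith("h", StringComparison.OrdinalIgnoreCase))
            return TimeSpan.FromHours(Number(text, 1));
        throw new FormatException($"Unrecognized duration: '{text}'");
    }

    // Parses sizes such as "64KB", "100MB" or "1GB" into bytes (binary multiples assumed).
    public static long ParseSize(string text)
    {
        text = text.Trim();
        // Check the two-letter suffixes before the bare "B" suffix.
        if (text.EndsWith("KB", StringComparison.OrdinalIgnoreCase))
            return (long)(Number(text, 2) * 1024);
        if (text.EndsWith("MB", StringComparison.OrdinalIgnoreCase))
            return (long)(Number(text, 2) * 1024 * 1024);
        if (text.EndsWith("GB", StringComparison.OrdinalIgnoreCase))
            return (long)(Number(text, 2) * 1024 * 1024 * 1024);
        if (text.EndsWith("B", StringComparison.OrdinalIgnoreCase))
            return (long)Number(text, 1);
        throw new FormatException($"Unrecognized size: '{text}'");
    }

    // Parses the numeric part that precedes a suffix of the given length.
    private static double Number(string text, int suffixLength) =>
        double.Parse(text[..^suffixLength], NumberStyles.Float, CultureInfo.InvariantCulture);
}
```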
---

## 8) Scale & Performance

| Metric | Target | Notes |
|--------|--------|-------|
| Routing latency (P50) | <2ms | Overhead only; excludes downstream |
| Routing latency (P99) | <5ms | Under normal load |
| Concurrent connections | 10,000 | Per gateway instance |
| Requests/second | 50,000 | Per gateway instance |
| Memory footprint | <512MB | Base; scales with connections |

### Scaling Strategy

- Horizontal scaling behind load balancer
- Sticky sessions NOT required (stateless)
- Regional deployment for latency optimization
- Rate limiting via distributed Valkey/Redis

---

## 9) Security Posture

### Authentication

| Method | Description |
|--------|-------------|
| DPoP | Proof-of-possession tokens from Authority |
| mTLS | Certificate-bound tokens for machine clients |

### Authorization

- Claims-based authorization per endpoint
- Required claims defined in endpoint descriptors
- Tenant isolation via `tid` claim

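The authorization rules above can be illustrated with the standard `System.Security.Claims` types. This is a sketch under assumptions: `TenantAuthorization` and `RequiredClaim` are hypothetical names (the shape of the document's `ClaimRequirement` type is not shown), and treating a requirement without a value as "any value of that claim type" is an illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

// Hypothetical stand-in for a claim requirement: a claim type and an optional exact value.
public sealed record RequiredClaim(string Type, string? Value = null);

public static class TenantAuthorization
{
    public const string TenantClaimType = "tid";

    // Reads the tenant identifier from the validated token's claims.
    public static bool TryGetTenantId(ClaimsPrincipal principal, out string tenantId)
    {
        tenantId = principal.FindFirst(TenantClaimType)?.Value ?? string.Empty;
        return tenantId.Length > 0;
    }

    // True only when the caller's tenant equals the tenant that owns the resource.
    public static bool IsSameTenant(ClaimsPrincipal principal, string resourceTenantId) =>
        TryGetTenantId(principal, out var tenantId)
        && string.Equals(tenantId, resourceTenantId, StringComparison.Ordinal);

    // True when the principal carries every claim the endpoint requires.
    public static bool Satisfies(ClaimsPrincipal principal, IEnumerable<RequiredClaim> required) =>
        required.All(r => r.Value is null
            ? principal.HasClaim(c => c.Type == r.Type)
            : principal.HasClaim(r.Type, r.Value));
}
```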
### Transport Security

| Component | Encryption |
|-----------|------------|
| Client → Gateway | TLS 1.3 (HTTPS) |
| Gateway → Microservices | TLS (prod), TCP (dev only) |

### Rate Limiting

- Per-tenant: Configurable requests/minute
- Per-identity: Burst protection
- Global: Circuit breaker for overload

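The document names `requestsPerMinute` and `burstSize` but does not say which algorithm enforces them. A token bucket is one common way to combine a sustained rate with a burst allowance, so the sketch below uses it as an assumption. `TokenBucket` is a hypothetical in-memory type keyed per tenant or identity by the caller; a multi-instance deployment would presumably keep the equivalent state in Valkey/Redis instead.

```csharp
using System;

// In-memory token bucket: refills at requestsPerMinute / 60 tokens per second, capped at burstSize.
public sealed class TokenBucket
{
    private readonly object _gate = new();
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private DateTimeOffset _lastRefill;

    public TokenBucket(int requestsPerMinute, int burstSize, DateTimeOffset now)
    {
        _capacity = burstSize;
        _refillPerSecond = requestsPerMinute / 60.0;
        _tokens = burstSize;   // start full so an initial burst is allowed
        _lastRefill = now;
    }

    // Returns true and consumes one token when the request is allowed.
    public bool TryAcquire(DateTimeOffset now)
    {
        lock (_gate)
        {
            var elapsedSeconds = (now - _lastRefill).TotalSeconds;
            if (elapsedSeconds > 0)
            {
                _tokens = Math.Min(_capacity, _tokens + elapsedSeconds * _refillPerSecond);
                _lastRefill = now;
            }

            if (_tokens < 1) return false;

            _tokens -= 1;
            return true;
        }
    }
}
```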
---

## 10) Observability & Audit

### Metrics (Prometheus)

```
gateway_requests_total{service,method,path,status}
gateway_request_duration_seconds{service,method,path,quantile}
gateway_active_connections{service}
gateway_transport_frames_total{type}
gateway_auth_failures_total{reason}
gateway_rate_limit_exceeded_total{tenant}
```

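As a rough illustration, two of these series could be recorded with the .NET `System.Diagnostics.Metrics` API as sketched below. The meter name `StellaOps.Gateway` and the `GatewayMetrics` class are assumptions, and how an exporter maps instrument names and histogram data to the Prometheus series listed above depends on the exporter configuration, which this document does not describe.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class GatewayMetrics
{
    // The meter name is a hypothetical choice for this sketch.
    private static readonly Meter GatewayMeter = new("StellaOps.Gateway");

    private static readonly Counter<long> RequestsTotal =
        GatewayMeter.CreateCounter<long>("gateway_requests_total");

    private static readonly Histogram<double> RequestDuration =
        GatewayMeter.CreateHistogram<double>("gateway_request_duration_seconds", unit: "s");

    // Records one routed request with the label set used by the series above.
    public static void RecordRequest(string service, string method, string path, int status, double seconds)
    {
        RequestsTotal.Add(1, new[]
        {
            new KeyValuePair<string, object?>("service", service),
            new KeyValuePair<string, object?>("method", method),
            new KeyValuePair<string, object?>("path", path),
            new KeyValuePair<string, object?>("status", status),
        });

        RequestDuration.Record(seconds, new[]
        {
            new KeyValuePair<string, object?>("service", service),
            new KeyValuePair<string, object?>("method", method),
            new KeyValuePair<string, object?>("path", path),
        });
    }
}
```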
### Traces (OpenTelemetry)

- Span per request: `gateway.route`
- Child span: `gateway.auth.validate`
- Child span: `gateway.transport.send`

### Logs (Structured)

```json
{
  "timestamp": "2025-12-21T10:00:00Z",
  "level": "info",
  "message": "Request routed",
  "correlationId": "abc123",
  "tenantId": "tenant-1",
  "method": "GET",
  "path": "/api/v1/scans/xyz",
  "service": "scanner",
  "durationMs": 45,
  "status": 200
}
```

---

## 11) Testing Matrix

| Test Type | Scope | Coverage Target |
|-----------|-------|-----------------|
| Unit | Routing algorithm, auth validation | 90% |
| Integration | Transport + routing flow | 80% |
| E2E | Full request path with mock services | Key flows |
| Performance | Latency, throughput, connection limits | SLO targets |
| Chaos | Connection failures, microservice crashes | Resilience |

### Test Fixtures

- `StellaOps.Router.Transport.InMemory` for transport mocking
- Mock Authority for auth testing
- `WebApplicationFactory` for integration tests

---

## 12) DevOps & Operations

### Deployment

```yaml
# Kubernetes deployment excerpt
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: gateway
          image: stellaops/gateway:1.0.0
          ports:
            - containerPort: 8080  # HTTPS
            - containerPort: 9443  # TLS (microservices)
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
```

### SLOs

| SLO | Target | Measurement |
|-----|--------|-------------|
| Availability | 99.9% | Uptime over 30 days |
| Latency P99 | <50ms | Includes downstream |
| Error rate | <0.1% | 5xx responses |

---

## 13) Roadmap

| Feature | Sprint | Status |
|---------|--------|--------|
| Core implementation | 3600.0001.0001 | TODO |
| WebSocket support | Future | Planned |
| gRPC passthrough | Future | Planned |
| GraphQL aggregation | Future | Exploration |

---

## 14) References

- Router Architecture: `docs/modules/router/architecture.md`
- OpenAPI Aggregation: `docs/modules/gateway/openapi.md`
- Authority Integration: `docs/modules/authority/architecture.md`
- Reference Architecture: `docs/product-advisories/archived/2025-12-21-reference-architecture/`

---

**Last Updated**: 2025-12-21 (Sprint 3600)