# Tile-Proxy Service Design

## Overview

The Tile-Proxy service acts as an intermediary between StellaOps clients and upstream Rekor transparency log APIs. It provides centralized tile caching, request coalescing, and offline support for air-gapped environments.

## Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   CI/CD Agents  │────►│   Tile Proxy    │────►│   Rekor API     │
│   (StellaOps)   │     │   (StellaOps)   │     │   (Upstream)    │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Tile Cache    │     │  TUF Metadata   │     │   Checkpoint    │
│   (CAS Store)   │     │  (TrustRepo)    │     │     Cache       │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```

## Core Responsibilities

1. **Tile Proxying**: Forward tile requests to upstream Rekor, caching responses locally
2. **Content-Addressed Storage**: Store tiles by hash for deduplication and immutability
3. **TUF Integration**: Optionally validate metadata using TUF trust anchors
4. **Request Coalescing**: Deduplicate concurrent requests for the same tile
5. **Checkpoint Caching**: Cache and serve recent checkpoints
6. **Offline Mode**: Serve from cache when upstream is unavailable

## API Surface

### Proxy Endpoints (Passthrough)

| Endpoint | Description |
|----------|-------------|
| `GET /tile/{level}/{index}` | Proxy tile request (cache-through) |
| `GET /tile/{level}/{index}.p/{partialWidth}` | Proxy partial tile |
| `GET /checkpoint` | Proxy checkpoint request |
| `GET /api/v1/log/entries/{uuid}` | Proxy entry lookup |
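
To illustrate how a client might choose between the full and partial tile endpoints, the following sketch derives the request path from the log's tree size. It assumes the common tiled-log layout of 256 hashes per full tile and the plain `{index}` path form shown in the table. The helper name `TileRequestPath` is hypothetical and not part of the service.

```csharp
using System;

// Hypothetical helper: builds the proxy path for the tile at (level, index)
// for a log of the given size. Assumes 256 hashes per full tile.
static string TileRequestPath(int level, long index, long treeSize)
{
    const int TileWidth = 256;
    if (level < 0 || level > 7)
        throw new ArgumentOutOfRangeException(nameof(level));

    // Complete hashes available at this tile level: treeSize / 256^level.
    long hashesAtLevel = treeSize >> (8 * level);
    long fullTiles = hashesAtLevel / TileWidth;
    int partialWidth = (int)(hashesAtLevel % TileWidth);

    if (index >= 0 && index < fullTiles)
        return $"/tile/{level}/{index}";
    if (index == fullTiles && partialWidth > 0)
        return $"/tile/{level}/{index}.p/{partialWidth}";

    throw new ArgumentOutOfRangeException(nameof(index), "No such tile at this tree size.");
}

Console.WriteLine(TileRequestPath(0, 0, 1_050_000));    // /tile/0/0
Console.WriteLine(TileRequestPath(0, 4101, 1_050_000)); // /tile/0/4101.p/144
Console.WriteLine(TileRequestPath(1, 16, 1_050_000));   // /tile/1/16.p/5
```

Only the rightmost tile at each level can be partial, which is why full tiles can be cached indefinitely while partial tiles change as the log grows.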

### Admin Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /_admin/cache/stats` | Cache statistics (hits, misses, size) |
| `POST /_admin/cache/sync` | Trigger manual sync job |
| `DELETE /_admin/cache/prune` | Prune old tiles |
| `GET /_admin/health` | Health check |
| `GET /_admin/ready` | Readiness check |

## Caching Strategy

### Content-Addressed Tile Storage

Tiles are stored using content-addressed paths based on SHA-256 hashes:

```
{cache_root}/
├── tiles/
│   ├── {origin_hash}/
│   │   ├── {level}/
│   │   │   ├── {index}.tile
│   │   │   └── {index}.meta.json
│   │   └── checkpoints/
│   │       └── {tree_size}.checkpoint
│   └── ...
└── metadata/
    └── cache_stats.json
```
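
The layout above could be mapped to file paths roughly as follows. This is a minimal sketch, not the service's implementation: it assumes `{origin_hash}` is the lowercase hex SHA-256 of the upstream origin URL, and the helper names `Sha256Hex`, `TileFilePath`, and `ContentHash` are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Lowercase hex SHA-256 of a byte sequence.
static string Sha256Hex(byte[] data) =>
    Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant();

// Assumption: {origin_hash} is the SHA-256 of the upstream origin URL.
static string TileFilePath(string cacheRoot, string origin, int level, long index)
{
    string originHash = Sha256Hex(Encoding.UTF8.GetBytes(origin));
    return Path.Combine(cacheRoot, "tiles", originHash, $"{level}", $"{index}.tile");
}

// Content hash in the "sha256:<hex>" form, suitable for a metadata file.
static string ContentHash(byte[] tileBytes) => "sha256:" + Sha256Hex(tileBytes);

string path = TileFilePath("/var/cache/stellaops/tiles", "https://rekor.sigstore.dev", 0, 42);
Console.WriteLine(path);
Console.WriteLine(ContentHash(Array.Empty<byte>()));
// sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

Keying the directory by origin keeps tiles from different logs apart, while the per-tile content hash allows integrity checks on read.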

### Tile Metadata

Each tile has associated metadata:

```json
{
  "cachedAt": "2026-01-25T10:00:00Z",
  "treeSize": 1050000,
  "isPartial": false,
  "contentHash": "sha256:abc123...",
  "upstreamUrl": "https://rekor.sigstore.dev"
}
```
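
One use of this metadata is to re-check a cached tile against its recorded `contentHash` before serving it. The sketch below assumes the hash is the lowercase hex SHA-256 of the raw tile bytes with a `sha256:` prefix. The function name `MatchesMetadata` is hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Returns true when the tile bytes match the contentHash recorded in the metadata JSON.
static bool MatchesMetadata(byte[] tileBytes, string metadataJson)
{
    using JsonDocument doc = JsonDocument.Parse(metadataJson);
    string expected = doc.RootElement.GetProperty("contentHash").GetString() ?? "";
    string actual = "sha256:" + Convert.ToHexString(SHA256.HashData(tileBytes)).ToLowerInvariant();
    return string.Equals(expected, actual, StringComparison.Ordinal);
}

// Build a metadata document for some example bytes, using the field names shown above.
byte[] tile = Encoding.UTF8.GetBytes("example tile bytes");
string metadataJson = JsonSerializer.Serialize(new
{
    cachedAt = "2026-01-25T10:00:00Z",
    treeSize = 1050000,
    isPartial = false,
    contentHash = "sha256:" + Convert.ToHexString(SHA256.HashData(tile)).ToLowerInvariant(),
    upstreamUrl = "https://rekor.sigstore.dev"
});

Console.WriteLine(MatchesMetadata(tile, metadataJson));                   // True
Console.WriteLine(MatchesMetadata(new byte[] { 1, 2, 3 }, metadataJson)); // False
```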

### Eviction Policy

1. **LRU by Access Time**: Least recently accessed tiles evicted first
2. **Max Size Limit**: Configurable maximum cache size
3. **TTL Override**: Force re-fetch after configurable time (for checkpoints)
4. **Immutability Preservation**: Full tiles (width=256) never evicted unless explicitly pruned
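
The rules above can be combined in more than one way. The sketch below shows one possible reading, in which LRU eviction applies only to partial tiles and full tiles are left for explicit pruning. The tuple shape and the function name `SelectEvictions` are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Picks cache keys to evict until the cache fits within maxBytes.
// Least recently accessed entries go first; full tiles are skipped (rule 4).
static List<string> SelectEvictions(
    IEnumerable<(string Key, long SizeBytes, DateTime LastAccess, bool IsPartial)> entries,
    long maxBytes)
{
    var all = entries.ToList();
    long total = all.Sum(e => e.SizeBytes);
    var evicted = new List<string>();

    foreach (var entry in all.Where(e => e.IsPartial).OrderBy(e => e.LastAccess))
    {
        if (total <= maxBytes) break;
        evicted.Add(entry.Key);
        total -= entry.SizeBytes;
    }
    return evicted;
}

var now = new DateTime(2026, 1, 25, 10, 0, 0, DateTimeKind.Utc);
var cache = new List<(string Key, long SizeBytes, DateTime LastAccess, bool IsPartial)>
{
    ("0/0", 8192L, now.AddHours(-5), false),         // full tile: oldest, but protected
    ("0/4101.p/144", 4608L, now.AddHours(-3), true), // oldest partial tile
    ("1/16.p/5", 160L, now.AddHours(-1), true),
};

Console.WriteLine(string.Join(", ", SelectEvictions(cache, maxBytes: 9000)));
// 0/4101.p/144
```

Under this reading, a cache filled entirely with full tiles can exceed the size limit until an explicit prune runs, so a deployment would need to size `max_size_gb` accordingly.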
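
## Request Coalescing

Concurrent requests for the same tile are coalesced so that only one upstream fetch is issued:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Request coalescing sketch (class members; FetchFromUpstream is defined elsewhere).
private readonly ConcurrentDictionary<string, Lazy<Task<byte[]>>> _inflightRequests = new();

public async Task<byte[]> GetTileAsync(string origin, int level, long index)
{
    var key = $"{origin}/{level}/{index}";

    // GetOrAdd with Lazy starts at most one upstream fetch per key,
    // even when several callers race on the same tile.
    var inflight = _inflightRequests.GetOrAdd(
        key,
        _ => new Lazy<Task<byte[]>>(() => FetchFromUpstream(origin, level, index)));

    try
    {
        return await inflight.Value; // All waiters share the result or the exception
    }
    finally
    {
        // Remove only this entry so a newer in-flight request is not dropped (.NET 5+).
        _inflightRequests.TryRemove(
            new KeyValuePair<string, Lazy<Task<byte[]>>>(key, inflight));
    }
}
```
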
## TUF Integration Point

When `TufValidationEnabled` is true:

1. Load service map from TUF to discover Rekor URL
2. Validate Rekor public key from TUF targets
3. Verify checkpoint signatures using TUF-loaded keys
4. Reject tiles if the checkpoint signature is invalid
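
Steps 3 and 4 amount to a signature check that gates tile serving. The following sketch shows only that gate, under loud assumptions: it uses an ECDSA P-256 key generated locally in place of a key loaded from TUF targets, and it signs a simplified checkpoint body rather than parsing the real signed-note format. `MayServeTiles` is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Assumption: in the real service the trusted key comes from TUF targets.
// A throwaway P-256 key pair stands in for it so the sketch runs offline.
using var rekorKey = ECDsa.Create(ECCurve.NamedCurves.nistP256);

// Simplified checkpoint body (origin, tree size, root hash placeholder).
byte[] checkpoint = Encoding.UTF8.GetBytes("example-origin\n1050000\nexample-root-hash\n");
byte[] signature = rekorKey.SignData(checkpoint, HashAlgorithmName.SHA256);

// Steps 3 and 4: verify the checkpoint signature; tiles are rejected when this is false.
static bool MayServeTiles(ECDsa trustedKey, byte[] checkpointBody, byte[] sig) =>
    trustedKey.VerifyData(checkpointBody, sig, HashAlgorithmName.SHA256);

Console.WriteLine(MayServeTiles(rekorKey, checkpoint, signature)); // True

// A single flipped byte in the checkpoint invalidates the signature.
byte[] tampered = (byte[])checkpoint.Clone();
tampered[0] ^= 0xFF;
Console.WriteLine(MayServeTiles(rekorKey, tampered, signature));   // False
```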

## Upstream Failover

Support multiple upstream sources with failover:

```yaml
tile_proxy:
  upstreams:
    - url: https://rekor.sigstore.dev
      priority: 1
      timeout: 30s
    - url: https://rekor-mirror.internal
      priority: 2
      timeout: 10s
```

Failover behavior:

1. Try primary upstream first
2. On timeout/error, try next upstream
3. Cache successful source for subsequent requests
4. Reset failover state on explicit refresh
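
The failover steps above might be implemented along these lines. This is a sketch under stated assumptions: the `fetch` delegate stands in for the real HTTP call and is expected to honor the cancellation token, "cache successful source" is read as "try the last successful upstream first", and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Upstream list mirroring the YAML example.
var upstreams = new List<(string Url, int Priority, TimeSpan Timeout)>
{
    ("https://rekor.sigstore.dev", 1, TimeSpan.FromSeconds(30)),
    ("https://rekor-mirror.internal", 2, TimeSpan.FromSeconds(10)),
};

string lastGoodUpstream = ""; // Step 3: remembered across requests

// Step 4: an explicit refresh forgets the remembered upstream.
void ResetFailover() => lastGoodUpstream = "";

// Tries the last successful upstream first, then the rest in priority order.
async Task<byte[]> FetchWithFailoverAsync(Func<string, CancellationToken, Task<byte[]>> fetch)
{
    var ordered = upstreams
        .OrderBy(u => u.Url == lastGoodUpstream ? 0 : 1)
        .ThenBy(u => u.Priority);

    var errors = new List<Exception>();
    foreach (var upstream in ordered)
    {
        // The token is cancelled once this upstream's timeout elapses.
        using var cts = new CancellationTokenSource(upstream.Timeout);
        try
        {
            byte[] body = await fetch(upstream.Url, cts.Token);
            lastGoodUpstream = upstream.Url;
            return body;
        }
        catch (Exception ex) // Step 2: timeout or error, move to the next upstream
        {
            errors.Add(ex);
        }
    }
    throw new AggregateException("All upstreams failed.", errors);
}

// Simulated fetch: the primary fails, the mirror answers.
Task<byte[]> FakeFetch(string url, CancellationToken ct) =>
    url.Contains("mirror")
        ? Task.FromResult(new byte[] { 1, 2, 3 })
        : Task.FromException<byte[]>(new TimeoutException("primary timed out"));

byte[] tile = await FetchWithFailoverAsync(FakeFetch);
Console.WriteLine($"{tile.Length} bytes from {lastGoodUpstream}");
// 3 bytes from https://rekor-mirror.internal
```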

## Deployment Model

### Standalone Service

Run as a dedicated service with a persistent volume:

```yaml
services:
  tile-proxy:
    image: stellaops/tile-proxy:latest
    ports:
      - "8090:8080"
    volumes:
      - tile-cache:/var/cache/stellaops/tiles
      - tuf-cache:/var/cache/stellaops/tuf
    environment:
      - TILE_PROXY__UPSTREAM_URL=https://rekor.sigstore.dev
      - TILE_PROXY__TUF_URL=https://trust.stella-ops.org/tuf/
```

### Sidecar Mode

Run alongside the attestor service, sharing its network namespace:

```yaml
services:
  attestor:
    image: stellaops/attestor:latest
    environment:
      - ATTESTOR__REKOR_URL=http://localhost:8090 # Use sidecar (must match the proxy's listen port)

  tile-proxy:
    image: stellaops/tile-proxy:latest
    network_mode: "service:attestor"
```

## Metrics

Prometheus metrics exposed at `/_admin/metrics`:

| Metric | Type | Description |
|--------|------|-------------|
| `tile_proxy_cache_hits_total` | Counter | Total cache hits |
| `tile_proxy_cache_misses_total` | Counter | Total cache misses |
| `tile_proxy_cache_size_bytes` | Gauge | Current cache size |
| `tile_proxy_upstream_requests_total` | Counter | Upstream requests by status |
| `tile_proxy_request_duration_seconds` | Histogram | Request latency |
| `tile_proxy_sync_last_success_timestamp` | Gauge | Last successful sync time |

## Configuration

```yaml
tile_proxy:
  # Upstream Rekor configuration
  upstream_url: https://rekor.sigstore.dev
  tile_base_url: https://rekor.sigstore.dev/tile/

  # TUF integration (optional)
  tuf:
    enabled: true
    url: https://trust.stella-ops.org/tuf/
    validate_checkpoint_signature: true

  # Cache configuration
  cache:
    base_path: /var/cache/stellaops/tiles
    max_size_gb: 10
    eviction_policy: lru
    checkpoint_ttl_minutes: 5

  # Sync job configuration
  sync:
    enabled: true
    schedule: "0 */6 * * *"
    depth: 10000

  # Request handling
  coalescing:
    enabled: true
    max_wait_ms: 5000

  # Failover
  failover:
    enabled: true
    retry_count: 2
    retry_delay_ms: 1000
```

## Security Considerations

1. **No Authentication by Default**: Designed for internal network use
2. **Optional mTLS**: Can enable client certificate validation
3. **Rate Limiting**: Optional rate limiting per client IP
4. **Audit Logging**: Log all cache operations for compliance
5. **Immutable Tiles**: Full tiles are never modified after caching

## Error Handling

| Scenario | Behavior |
|----------|----------|
| Upstream unavailable | Serve from cache if available; 503 otherwise |
| Invalid tile data | Reject, don't cache, log error |
| Cache full | Evict LRU tiles, continue serving |
| TUF validation fails | Reject request, return 502 |
| Checkpoint stale | Refresh from upstream, warn in logs |
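
The first row of the table can be sketched as a cache-first lookup that falls back to a 503 when the tile is neither cached nor fetchable. The in-memory dictionary and the `fetchUpstream` delegate are stand-ins chosen for illustration, not the service's actual types.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the tile cache.
var cache = new Dictionary<string, byte[]> { ["0/0"] = new byte[] { 1, 2, 3 } };

// Returns an HTTP status code and body for a tile request.
// `fetchUpstream` is a hypothetical delegate that throws when upstream is unreachable.
(int Status, byte[] Body) ServeTile(string key, Func<string, byte[]> fetchUpstream)
{
    // Cached tiles are served without contacting upstream.
    if (cache.TryGetValue(key, out var cached))
        return (200, cached);

    try
    {
        byte[] tile = fetchUpstream(key);
        cache[key] = tile;
        return (200, tile);
    }
    catch (Exception)
    {
        // Upstream unavailable and nothing cached: 503.
        return (503, Array.Empty<byte>());
    }
}

// Simulates an unreachable upstream.
byte[] Offline(string key) => throw new InvalidOperationException("upstream unreachable");

Console.WriteLine(ServeTile("0/0", Offline).Status); // 200
Console.WriteLine(ServeTile("0/1", Offline).Status); // 503
```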

## Future Enhancements

1. **Tile Prefetching**: Prefetch tiles for known verification patterns
2. **Multi-Log Support**: Support multiple transparency logs
3. **Replication**: Sync cache between proxy instances
4. **Compression**: Optional tile compression for storage