# Tile-Proxy Service Design

## Overview

The Tile-Proxy service acts as an intermediary between StellaOps clients and upstream Rekor transparency log APIs. It provides centralized tile caching, request coalescing, and offline support for air-gapped environments.

## Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  CI/CD Agents   │────►│   Tile Proxy    │────►│   Rekor API     │
│  (StellaOps)    │     │   (StellaOps)   │     │   (Upstream)    │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Tile Cache    │     │  TUF Metadata   │     │   Checkpoint    │
│   (CAS Store)   │     │  (TrustRepo)    │     │     Cache       │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```

## Core Responsibilities

1. **Tile Proxying**: Forward tile requests to upstream Rekor, caching responses locally
2. **Content-Addressed Storage**: Store tiles by hash for deduplication and immutability
3. **TUF Integration**: Optionally validate metadata using TUF trust anchors
4. **Request Coalescing**: Deduplicate concurrent requests for the same tile
5. **Checkpoint Caching**: Cache and serve recent checkpoints
6. **Offline Mode**: Serve from cache when upstream is unavailable

## API Surface

### Proxy Endpoints (Passthrough)

| Endpoint | Description |
|----------|-------------|
| `GET /tile/{level}/{index}` | Proxy tile request (cache-through) |
| `GET /tile/{level}/{index}.p/{partialWidth}` | Proxy partial tile |
| `GET /checkpoint` | Proxy checkpoint request |
| `GET /api/v1/log/entries/{uuid}` | Proxy entry lookup |

### Admin Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /_admin/cache/stats` | Cache statistics (hits, misses, size) |
| `POST /_admin/cache/sync` | Trigger manual sync job |
| `DELETE /_admin/cache/prune` | Prune old tiles |
| `GET /_admin/health` | Health check |
| `GET /_admin/ready` | Readiness check |

## Caching Strategy

### Content-Addressed Tile Storage

Tiles are stored using content-addressed paths based on a SHA-256 hash:

```
{cache_root}/
├── tiles/
│   ├── {origin_hash}/
│   │   ├── {level}/
│   │   │   ├── {index}.tile
│   │   │   └── {index}.meta.json
│   │   └── checkpoints/
│   │       └── {tree_size}.checkpoint
│   └── ...
└── metadata/
    └── cache_stats.json
```

### Tile Metadata

Each tile has associated metadata:

```json
{
  "cachedAt": "2026-01-25T10:00:00Z",
  "treeSize": 1050000,
  "isPartial": false,
  "contentHash": "sha256:abc123...",
  "upstreamUrl": "https://rekor.sigstore.dev"
}
```

### Eviction Policy

1. **LRU by Access Time**: Least recently accessed tiles are evicted first
2. **Max Size Limit**: Configurable maximum cache size
3. **TTL Override**: Force re-fetch after a configurable time (for checkpoints)
4. **Immutability Preservation**: Full tiles (width=256) are never evicted unless explicitly pruned

## Request Coalescing

Concurrent requests for the same tile are coalesced:

```csharp
// Sketch of request coalescing. Assumes _inflightRequests is a
// ConcurrentDictionary<string, Task<byte[]>> and that FetchFromUpstream
// returns Task<byte[]>.
var key = $"{origin}/{level}/{index}";
var tcs = new TaskCompletionSource<byte[]>(
    TaskCreationOptions.RunContinuationsAsynchronously);

// GetOrAdd stores exactly one task per key, so only one caller owns the fetch.
var inflight = _inflightRequests.GetOrAdd(key, tcs.Task);
if (inflight != tcs.Task)
{
    return await inflight; // Wait for the in-flight request
}

try
{
    var tile = await FetchFromUpstream(origin, level, index);
    tcs.SetResult(tile);
    return tile;
}
catch (Exception ex)
{
    tcs.SetException(ex); // Propagate the failure so waiters do not hang
    throw;
}
finally
{
    _inflightRequests.TryRemove(key, out _);
}
```

## TUF Integration Point

When `TufValidationEnabled` is true:

1. Load the service map from TUF to discover the Rekor URL
2. Validate the Rekor public key from TUF targets
3. Verify checkpoint signatures using TUF-loaded keys
4. Reject tiles if the checkpoint signature is invalid

## Upstream Failover

Multiple upstream sources are supported, with failover between them:

```yaml
tile_proxy:
  upstreams:
    - url: https://rekor.sigstore.dev
      priority: 1
      timeout: 30s
    - url: https://rekor-mirror.internal
      priority: 2
      timeout: 10s
```

Failover behavior:

1. Try the primary upstream first
2. On timeout or error, try the next upstream
3. Cache the successful source for subsequent requests
4. Reset failover state on explicit refresh

## Deployment Model

### Standalone Service

Run as a dedicated service with persistent volumes:

```yaml
services:
  tile-proxy:
    image: stellaops/tile-proxy:latest
    ports:
      - "8090:8080"
    volumes:
      - tile-cache:/var/cache/stellaops/tiles
      - tuf-cache:/var/cache/stellaops/tuf
    environment:
      - TILE_PROXY__UPSTREAM_URL=https://rekor.sigstore.dev
      - TILE_PROXY__TUF_URL=https://trust.stella-ops.org/tuf/

# Named volumes must be declared at the top level
volumes:
  tile-cache:
  tuf-cache:
```

### Sidecar Mode

Run alongside the attestor service:

```yaml
services:
  attestor:
    image: stellaops/attestor:latest
    environment:
      - ATTESTOR__REKOR_URL=http://localhost:8090  # Use sidecar
  tile-proxy:
    image: stellaops/tile-proxy:latest
    network_mode: "service:attestor"
```

## Metrics

Prometheus metrics are exposed at `/_admin/metrics`:

| Metric | Type | Description |
|--------|------|-------------|
| `tile_proxy_cache_hits_total` | Counter | Total cache hits |
| `tile_proxy_cache_misses_total` | Counter | Total cache misses |
| `tile_proxy_cache_size_bytes` | Gauge | Current cache size |
| `tile_proxy_upstream_requests_total` | Counter | Upstream requests by status |
| `tile_proxy_request_duration_seconds` | Histogram | Request latency |
| `tile_proxy_sync_last_success_timestamp` | Gauge | Last successful sync time |

## Configuration

```yaml
tile_proxy:
  # Upstream Rekor configuration
  upstream_url: https://rekor.sigstore.dev
  tile_base_url: https://rekor.sigstore.dev/tile/

  # TUF integration (optional)
  tuf:
    enabled: true
    url: https://trust.stella-ops.org/tuf/
    validate_checkpoint_signature: true

  # Cache configuration
  cache:
    base_path: /var/cache/stellaops/tiles
    max_size_gb: 10
    eviction_policy: lru
    checkpoint_ttl_minutes: 5

  # Sync job configuration
  sync:
    enabled: true
    schedule: "0 */6 * * *"
    depth: 10000

  # Request handling
  coalescing:
    enabled: true
    max_wait_ms: 5000

  # Failover
  failover:
    enabled: true
    retry_count: 2
    retry_delay_ms: 1000
```

## Security Considerations

1. **No Authentication by Default**: Designed for internal network use
2. **Optional mTLS**: Client certificate validation can be enabled
3. **Rate Limiting**: Optional rate limiting per client IP
4. **Audit Logging**: Log all cache operations for compliance
5. **Immutable Tiles**: Full tiles are never modified after caching

## Error Handling

| Scenario | Behavior |
|----------|----------|
| Upstream unavailable | Serve from cache if available; 503 otherwise |
| Invalid tile data | Reject, don't cache, log error |
| Cache full | Evict LRU tiles, continue serving |
| TUF validation fails | Reject request, return 502 |
| Checkpoint stale | Refresh from upstream, warn in logs |

## Future Enhancements

1. **Tile Prefetching**: Prefetch tiles for known verification patterns
2. **Multi-Log Support**: Support multiple transparency logs
3. **Replication**: Sync cache between proxy instances
4. **Compression**: Optional tile compression for storage
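
## Appendix: Failover Sketch

The four failover steps listed under Upstream Failover can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the service's implementation: `Upstream`, `FailoverFetcher`, and the injected `fetch` delegate are hypothetical names, and the delegate is assumed to enforce the per-upstream timeout and to throw on timeout or error.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical types for illustration only.
public sealed record Upstream(string Url, int Priority);

public sealed class FailoverFetcher
{
    private readonly IReadOnlyList<Upstream> _upstreams;
    private readonly Func<Upstream, string, Task<byte[]>> _fetch;
    private Upstream? _preferred; // Last upstream that answered successfully

    public FailoverFetcher(IEnumerable<Upstream> upstreams,
                           Func<Upstream, string, Task<byte[]>> fetch)
    {
        // Lower priority number means "try earlier", as in the YAML example.
        _upstreams = upstreams.OrderBy(u => u.Priority).ToList();
        _fetch = fetch;
    }

    // Explicit refresh: forget the remembered source and start from the primary.
    public void Reset() => _preferred = null;

    public async Task<byte[]> FetchAsync(string path)
    {
        // Try the remembered source first, then the rest in priority order.
        IEnumerable<Upstream> order = _preferred is null
            ? _upstreams
            : _upstreams.Where(u => u == _preferred)
                        .Concat(_upstreams.Where(u => u != _preferred))
                        .ToList();

        var errors = new List<Exception>();
        foreach (var upstream in order)
        {
            try
            {
                var body = await _fetch(upstream, path);
                _preferred = upstream; // Remember the successful source
                return body;
            }
            catch (Exception ex)
            {
                errors.Add(ex); // Timeout or error: fall through to the next upstream
            }
        }
        throw new AggregateException("All upstreams failed for " + path, errors);
    }
}
```

Injecting the fetch delegate keeps the ordering logic separate from HTTP concerns, so it can be exercised without a network. The sketch does not synchronize access to the remembered source; a real service would need to decide how concurrent requests share that state.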